dworkin / dgd

Dworkin's Game Driver, an object-oriented database management system originally used to run MUDs.
https://www.dworkin.nl/dgd/
GNU Affero General Public License v3.0
103 stars 31 forks source link

lpc_ext_spawn() failing with multiple modules #48

Closed nyankers closed 3 years ago

nyankers commented 3 years ago

I was wondering why JIT didn't run on my server, and found out it's because I'm running another extension that already calls a process.

When running dgd on linux, if multiple modules call lpc_ext_spawn(), it seems that dgd will either fail to run the processes or hang. It seems that the JIT extension will cause it to hang in this scenario, whereas other extensions will simply fail to execute (I'm honestly not sure why the deviation in behavior here).

To confirm this, I created a test extension which simply runs a given executable:

# include "lpc_ext.h"
# include <stdio.h>

int lpc_ext_init(int major, int minor, const char *config)
{
    char buf[1024];
# ifndef WIN32
    snprintf(buf, 1024, "exec %s", config);
# else
    snprintf(buf, 1024, "%s", config);
# endif
    lpc_ext_spawn(buf);
    return 1;
}

And passed it this executable which just confirms that it ran and then sleeps indefinitely (Linux-specific as I don't have a windows machine):

#include <stdio.h>
#include <unistd.h>

int main()
{
    int ch;
    fprintf(stderr, "test\n");
    while (1) {
    pause();
    }
    return 0;
}

Running this extension along with JIT will cause a hang. Running this extension twice just causes the child processes to fail.

My guess is that the Ext:::spawn() function is calling Config::modFinish() despite this function being a cleanup routine. The comment here says that it will close channels with other modules, but actually this is just a clean-up method to close channels with the same module.

I suspect the intention is for module 2's process to close module 1's file descriptors, but since it runs on the main dgd process rather than the child process, it doesn't quite have this effect. There may need to be a change in the extension API to accomplish that.

The fact that Config::modFinish() hangs on linux is a secondary but related problem in rare scenarios where dgd fails to start before calling Config::fdlist(). This doesn't happen when a fatal error causes it to fail, but only esoteric scenarios where it goes through Config::modFinish() for example if you have an executable on the list that isn't a proper extension (no ext_init()) after one which is.

nyankers commented 3 years ago

I wrote some changes to the related extension code to try and make this work, though I'm not sure it's the best solution as I pretty much just disabled waitpid() entirely.

The main change is that the extension side will have the child process call ext_spawn(NULL, NULL) and the dgd side will honor calls of ext_spawn where both parameters are NULL as meaning to call Config::modFinish(). Because waitpid() was disabled, this means that module 2's child process should close file descriptors pointing to module 1's child process.

You can check these changes out at https://github.com/nyankers/dgd/tree/lpc-ext-spawn and https://github.com/nyankers/lpc-ext/tree/lpc-ext-spawn. If you like them, I can submit a pull request with any changes you think should be made as well.

(For example, should this increase the extension minor version? From the dgd side, old extensions should be fine except that they'll potentially continue to have this problem, but if extensions with this change run on versions of dgd without it, the ext_spawn() behavior will be pretty strange...)

dworkin commented 3 years ago

Your analysis is correct. My fixes are similar to yours, but a little cleaner:

dgd: e4e0ef73c7060fc5c46a6d6d23b8ad882e65cbba lpc-ext: 503e3d3

nyankers commented 3 years ago

Haha, that was actually my first idea for resolving this, but I thought my way might not break version compatibility and my insomnia-filled mind thought that was more important for some reason. ^^;

It looks like this resolved the secondary issue too, or at least I'm unable to reproduce it anymore.

Thanks!