Add RPC timeout, fix freezing so it doesn't assume pyinstaller in the env is 32 bit, and facilities for the server to shut itself down.

fake-name commented 5 years ago

Basically the title:

82b3a43 adds a timeout to RPC calls, so you can handle contexts where the remote goes away or gets stuck forever.
0371582 Fixes the fact that freeze_server32.py assumes that invoking PyInstaller in the shell will point to the 32 bit pyinstaller binary. I literally have python32 for this project, and nothing else, and it's not on the PATH, so I wound up accidentally building a 64-bit server32 (which unsurprisingly didn't work at all)
Rather then just invoking pyinstaller, it now looks at how the script file itself was invoked, and uses that to then execute <python cmd> -m PyInstaller .... This will probably help for other oddbal environments too.
3ec1c33 adds a new exception (ServerExit), which the server can raise and will cause the server to gracefully shutdown. This is somewhat nicer then the brute-force _proc.terminate() that's currently relied upon, and should let the server do any cleanup too.
I needed this because my server class fires up some worker threads, and terminate would just orphan them.

The various commits should be cherry-pickable, if needed.

jborbely commented 5 years ago

Thanks for the pull request.

PyInstaller

This is a good catch. I do not have any further comments.

RPC timeout

Using a timeout for a request is something that I have not implemented because I was not sure of the best way to do it. Since the client-server connection can currently only be via localhost (the pickled file that is shared between the processes is on the local hard drive) you can disconnect your computer from the network while the 32-bit server calls the function in the 32-bit library and you will still get the response on the client side. In addition, not all tasks performed by the 32-bit library will have the same (or predictable) runtime. For example, if I pass a 10x10 matrix to some 32-bit function then a 1000x1000 matrix I know that the time required to compute the result will increase, but by how much? Therefore, when I create my client I would be required to think about what is the maximum runtime expected for any set of input arguments for all functions that the 32-bit library provides and make the rpc_timeout value to be large enough so that a timeout is raised for a legitimate reason.

For a long time I've been contemplating whether each request should have its own timeout option rather than using a global timeout value.

If an optional rpc_timeout parameter is made available in Client64.__init__ then I think that the default value should be None.

Could you explain what problem you are trying to solve by adding the rpc_timeout parameter (again considering host=localhost is the only hostname that works at the moment) and how this problem occurred in your implementation? In my opinion, if the 32-bit library gets stuck forever then I think that the Server32 subclass should be aware of the library's limitations and raise an exception if the library does not return in a uniquely-defined time period and I don't see why localhost would go away (has it for you?).

ServerExit and run()

I recently changed the way that the server shuts down by using the self._proc.terminate() and os.kill(self._meta32['pid'], 9) calls. I have not published a release of the code that uses this change because I have not felt comfortable with my change. I was thinking about the impacts that this would cause and, thanks to you, you have provided evidence of the impacts.

It seems to me that the only way that your run() method gets called is by the following

from msl.loadlib import Server32

class MyServer(Server32):
    def __init__(self, host, port, quiet, **kwargs):
        super(MyServer, self).__init__('whatever', 'cdll', host, port, quiet)
        self.run()

because the serve_forever() method gets called in start_server32.py immediately after the server is instantiated. This means that there would now be 2 ways to call serve_forever() and each way has different error handling.

I think that a more flexible way forward is a merging of the old way which gives the server the opportunity to shutdown itself, see 07d2ae4c86490967ef3b99866f39e38e7f984b70 where I used a 'SHUTDOWN_SERVER32' request, and if the server still won't shut down then to kill it using brute force. The brute-force way was needed because one library that I called loaded ActiveX objects that needed to be embedded within a GUI and calling server.shutdown() wasn't successful at shutting down the server and I still had a zombie server running that I just wanted to die.

My suggestions are: (1) there can only be one way that serve_forever() is called and this way should be invisible to the end user and (2) there can only be one way to properly shut down the server and it should be by calling Client64.shutdown_server32().

If you run your code against v0.5 (not from the master branch) do your worker threads stop properly? If your worker threads stop and my ActiveX library would shutdown then I think that things would be better in general.

fake-name commented 5 years ago

Could you explain what problem you are trying to solve by adding the rpc_timeout parameter (again considering host=localhost is the only hostname that works at the moment) and how this problem occurred in your implementation? In my opinion, if the 32-bit library gets stuck forever then I think that the Server32 subclass should be aware of the library's limitations and raise an exception if the library does not return in a uniquely-defined time period and I don't see why localhost would go away (has it for you?).

I'm basically dealing with a thin wrapper around a really crappy vendor DLL that talks to some custom hardware. The hardware semi-regularly wedges and the DLL calls wind up blocking forever somewhere in a libusb-based kernel driver, that becomes literally impossible to kill without physically disconnecting the USB interface (which causes the kernel driver call to finish).

A timeout is a crappy hackaround around crappier software.

Doing the timeout in my wrapper file would involve firing up a separate thread that can somehow fire timeout events into the main thread, or proxy the calls further, and I can't think of a way to implement that without a bunch more work then just letting the RPC interface have timeouts requires.

It seems to me that the only way that your run() method gets called is by the following

Gah, I patched start_server32.py to invoke run() instead of serve_forever(), but dropped it during the PR process. I can dig it up monday (when I'm back at work), or you can just cherry-pick 0371582.

Really, the only major issue here was the pyinstaller version. The others were more of a personal design preference thing, and I recognize I have certain semantic approaches that can seem oddball.

jborbely commented 5 years ago

I will merge your PyInstaller fix. Thanks!

I think that a rpc_timeout kwarg is okay but the default value should be None. I would also slightly modify the docstring and add unittests. I will make these changes.

I'd like to pursue a different way to shut down the server than I currently use and that was done in the latest release so that it solves your orphan threads and my ActiveX problem with the hope that the implementation covers more use cases by default. Once I push something to master we could do a few iterations to modify this part of the code (new PR) to improve the shutdown issues for you and me.

jborbely commented 5 years ago

I have merged your PyInstaller fix, added a rpc_timeout kwarg and allowed the server to gracefully shutdown (doing a kill as a last resort). All changes have been pushed to master.

If you have the time to see if your worker threads terminate properly you could update from master and then you'd have to re-freeze the server (I only push the 32-bit server's to github before a release, so the server32-* executables are out of date on master). You shouldn't have to raise ServerExit anymore (on my branch the ServerExit exception does not exist). Once you call client.shutdown_server32(), or the __del__ method is invoked, the server (hopefully) shuts down gracefully behind the scenes for your application.

Thanks for your contribution and suggestions. Please open a new PR if you have any further comments.

fake-name commented 5 years ago

Awesome!

One other (really minor) comment would be that noting how you re-freeze the server somewhere in the docs would be nice.

jborbely commented 5 years ago

Sounds like a good idea. I'll create a "re-freeze" section in the toctree.

MSLNZ / msl-loadlib

Add RPC timeout, fix freezing so it doesn't assume pyinstaller in the env is 32 bit, and facilities for the server to shut itself down. #18