Open cstamas opened 6 years ago
Hi, I think that in the case of kill -9 we won't be able to do anything with that process anyway. Do you have any suggestions for what to do in such a situation?
What we do in our ITs: the spawned wrapper process and the spawning process "keep in touch" via pings sent over a TCP port. Basically, if the "pong" never comes back, or the port is dead, the spawner can be sure that the spawned process is dead. STOP is implemented in the same manner: the spawner sends it to the spawned process, which then performs a clean shutdown of whatever server/app it is wrapping.
Similar logic may be used in the spawned process: if nobody sends a ping, there is no one to send pong responses to, so the spawner has died and the process itself should go away.
This matters most on CI, where accumulated dangling processes may suffocate the machine while doing nothing (they just remain active, as nothing shuts them down).
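For illustration only, here is a minimal sketch of the spawned-process side of such a ping/pong scheme. It is not the code from our ITs: the class name HeartbeatWatchdog, the line-based PING/PONG/STOP wire format and the timeout parameter are all assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that runs inside the spawned wrapper process.
final class HeartbeatWatchdog {

    // Answers "PING" with "PONG" and returns on "STOP" or when no spawner
    // has been heard from within timeoutMillis. onShutdown is always run
    // before this method exits; it should stop the wrapped server/app.
    static void serve(ServerSocket server, int timeoutMillis, Runnable onShutdown)
            throws IOException {
        // accept() throws SocketTimeoutException if no spawner connects in time
        server.setSoTimeout(timeoutMillis);
        try {
            while (true) {
                try (Socket spawner = server.accept()) {
                    // readLine() throws SocketTimeoutException if pings stop arriving
                    spawner.setSoTimeout(timeoutMillis);
                    BufferedReader in = new BufferedReader(new InputStreamReader(
                            spawner.getInputStream(), StandardCharsets.UTF_8));
                    Writer out = new OutputStreamWriter(
                            spawner.getOutputStream(), StandardCharsets.UTF_8);
                    String line;
                    while ((line = in.readLine()) != null) {
                        if ("STOP".equals(line)) {
                            return; // explicit clean shutdown requested by the spawner
                        }
                        if ("PING".equals(line)) {
                            out.write("PONG\n");
                            out.flush();
                        }
                    }
                    // Connection closed: wait for a reconnect until accept() times out.
                }
            }
        } catch (SocketTimeoutException spawnerSilent) {
            // Nobody pinged within the timeout: assume the spawner died.
        } finally {
            onShutdown.run();
        }
    }
}
```

The spawner would connect to the port, send PING periodically, and treat a missing PONG or a refused connection as "child is dead". The timeout should be a few times the ping interval so that a single delayed ping does not shut everything down.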
OK, but two questions: 1) why is your JVM killed in this way (I assume kill -9, otherwise ES would be killed by the shutdown hook)? 2) if your JVM can be killed with kill -9, then what happens when the wrapper process is killed in the same way?
This is a problem in IDEs in debug mode, as well -- usually when you hit terminate/stop (e.g. Eclipse). Many of them don't execute the JVM shutdown hooks, leaving spawned processes hanging.
One possible solution I've thought of is that Embedded ES could provide an accessor for the PID in pl.allegro.tech.embeddedelasticsearch.ElasticServer. Users could then write the PIDs to a file and clear them when the program exits cleanly. If there's an unclean exit, the PIDs would still be hanging around and we could manually kill them at the next startup.
I've used this technique for some test-related stuff I'm working on (using reflection to get the PID). It's not beautiful, but it is very simple and works effectively.
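A rough sketch of the PID-file idea, assuming Java 9+ (for ProcessHandle) and assuming the PID has already been obtained somehow, since Embedded ES has no such accessor today. PidFileReaper and its method names are made up for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: remembers child PIDs in a file so that a later run
// can kill leftovers from a run that exited uncleanly.
final class PidFileReaper {

    // Append one PID per line, right after the child process is started.
    static void record(Path pidFile, long pid) throws IOException {
        Files.write(pidFile, List.of(Long.toString(pid)),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Call on clean exit, once the children have been stopped normally.
    static void clear(Path pidFile) throws IOException {
        Files.deleteIfExists(pidFile);
    }

    // Call at startup: kills every recorded process that is still alive,
    // deletes the file, and returns how many processes were killed.
    static int reapStale(Path pidFile) throws IOException {
        if (!Files.exists(pidFile)) {
            return 0;
        }
        long self = ProcessHandle.current().pid();
        int killed = 0;
        for (String line : Files.readAllLines(pidFile)) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            long pid = Long.parseLong(trimmed);
            if (pid == self) {
                continue; // never try to kill the current JVM
            }
            Optional<ProcessHandle> handle = ProcessHandle.of(pid);
            if (handle.isPresent() && handle.get().isAlive()
                    && handle.get().destroyForcibly()) {
                killed++;
            }
        }
        Files.delete(pidFile);
        return killed;
    }
}
```

One caveat with this approach: the OS can reuse PIDs, so a stale entry may point at an unrelated process by the time the next run starts. Checking ProcessHandle.info().command() before killing would reduce that risk.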
When failsafe crashes (the JVM exits forcefully), the ES child process remains running and nothing cleans it up. This is problematic: on a CI build it would occupy resources, even if ports are randomised.