Start worker processes using spawn method instead of fork

romank0 commented 10 months ago

This PR addresses several issues:

"fork" method to spawn processes is not safe if there are more then one thread in the process. I faced this issue when the django application uses background threads.
process restart happens only when the child listener process get to the LISTEN loop. If the error happens before that (e.g. when the notifications stored in DB are processed by the child process using --recover option) the listener process finishes and the main process does not restart it. In general this master-worker handling is very limited in the current implementation and IMHO it is better to use something that does this better e.g. supervisord.
in some cases the automatic restart is not needed (e.g. when the process is run in the k8s deployment). Automatic restart creates waiting processes in this case.
when the error happens in the listener and the process is restarted not all original arguments are propagated to the process being started. This causes a race condition - in case of one process there is a short timespan when notifications are not processed (as in they are just stored into DB) and then recovery is delayed till the next restart of the main processes.

In this MR several things are implemented:

ability to use "spawn" process start method instead of "fork". "spawn" method is used as a safe default. It is possible to use "fork" via --worker-start-method fork option if the application does not start any additional threads.
ability to start only the single listener worker process (--worker option). In this case --processes cannot be used and a single process is started.
automatic process restart may be disabled via --no-restart-on-failure option.

The combination of --worker and --no-restart-on-failure allows to use the listener in k8s deployments.

PaulGilmartin commented 9 months ago

This PR addresses several issues:

"fork" method to spawn processes is not safe if there are more then one threds in the process. I faced this issue when the django application uses background threads.

process restart happens only when the child listener process get to the LISTEN loop. If the error happens before that (e.g. when the notifications stored in DB are processed by the child process using --recover option) the listener process finishes and the main process does not restart it. In general this master-worker handling is very limited in the current implementation and IMHO it is better to use something that does this better e.g. supervisord.

in some cases the automatic restart is not needed (e.g. when the process is run in the k8s deployment). Automatic restart creates waiting processes in this case.

when the error happens in the listener and the process is restarted not all original arguments are propagated to the process being started. This causes a race condition - in case of one process there is a short timespan when notifications are not processed (as in they are just stored into DB) and then recovery is delayed till the next restart of the main processes.

In this MR several things are implemented:

ability to use "spawn" process start method instead of "fork". "spawn" method is used as a safe default. It is possible to use "fork" via --worker-start-method fork option if the application does not start any additional threads.

ability to start only the single listener worker process (--worker option). In this case --processes cannot be used and a single process is started.

automatic process restart may be disabled via --no-restart-on-failure option.

The combination of --worker and --no-restart-on-failure allows to use the listener in k8s deployments.

Thanks for the PR, I will try to get through these soon. Can I ask - what are you using in terms of set up when contributing? Are you using the Opus10 recommended way (https://github.com/Opus10/django-pgtrigger/blob/3c2669bfb97500fc53edeefa981fa74da5412067/docs/contributing.md) or something else? I'm having issues with the docker setup recently.

romank0 commented 9 months ago

I'm not using the docker setup. It mounts user home to the container (-v ~/:/home/circleci option) and this not secure as I'm not sure the image that is used can be trusted and for example my ssh keys will not leak this way.

I tried to use the steps from contributing.md initially but the setup is not complete there as I've reported in https://github.com/Opus10/django-pgpubsub/issues/30.

So now for development (TDD cycle) I'm using poetry (DATABASE_URL=postgres://postgres:123@localhost:5432/postgres poetry run pytest).

Then before a push I use tox to run a subset of tests for different envs. The command is DATABASE_URL=postgres://postgres:123@localhost:5432/postgres tox but I don't have e.g. python3.7 installed so for that particular tox env tests fail for me:

  clean: commands succeeded
ERROR:  py37-django22: InterpreterNotFound: python3.7
ERROR:  py37-django32: InterpreterNotFound: python3.7
  38-django22: commands succeeded
  38-django32: commands succeeded
  38-django40: commands succeeded
  39-django22: commands succeeded
  39-django32: commands succeeded
  39-django40: commands succeeded
  310-django22: commands succeeded
  310-django32: commands succeeded
  310-django40: commands succeeded
ERROR:   report: commands failed

Also there's a separate CI that runs tests with the python and django version we use on the forked version of this library. Ideally we would like to get rid of the fork and use just the official release but many MRs with features we use are not merged yet.

PaulGilmartin commented 9 months ago

Thanks for the info! Yeah sorry about not getting to the PRs, I have a lot of stuff going on right now with moving house. I will get to them soon though :) I'm currently trying to get my env working so I can review/make PRs, but having some issues getting it going, hence why I was asking.

PaulGilmartin commented 9 months ago

@romank0 This PR allows me to run things locally as I would like (including getting docker working on Pycharm). If I committed it would it disrupt your development workflow at all?

PaulGilmartin commented 9 months ago

@romank0 PR looks great - two minor comments

PaulGilmartin / django-pgpubsub

Start worker processes using spawn method instead of fork #70