cr0hn / golismero-legacy

THIS IS A LEGACY VERSION PRESERVED FOR BACKUP, DO NOT USE
http://golismero-project.com

Migration to Celery #228

Status: Open (MarioVilas opened this issue 10 years ago)

MarioVilas commented 10 years ago

Python's multiprocessing module on Windows is a little fork-happy: it keeps launching new processes all the time. I think this may be the reason we're seeing such slow RPC calls on Windows... maybe multiprocessing spawns a new process every time we instantiate a Queue (!!!).

We have no way of knowing for sure without actually reading the source code, because the internal workings are undocumented. Python sometimes reminds me of Microsoft. We should consider ourselves lucky we don't have to fire up IDA (yet). :P
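Short of reading the source, one quick empirical check of the "new process per Queue" suspicion is to compare `multiprocessing.active_children()` before and after creating a Queue. This is only a sketch: `new_children_after` is a hypothetical helper, and `active_children()` only reports children started through `multiprocessing.Process` (a `Manager()` server process, for example, shows up), not helper processes the library may launch by other means, so treat the result as a first check rather than proof.

```python
import multiprocessing


def new_children_after(factory):
    """Call factory() and count child processes that appeared meanwhile.

    Returns (object, count). active_children() only lists children started
    through multiprocessing.Process objects of the current process.
    """
    before = {p.pid for p in multiprocessing.active_children()}
    obj = factory()
    after = {p.pid for p in multiprocessing.active_children()}
    return obj, len(after - before)


if __name__ == "__main__":
    # The guard matters on Windows, where child processes re-import this module.
    q, count = new_children_after(multiprocessing.Queue)
    print("child processes started by Queue():", count)
```

Calling the same helper with `multiprocessing.Manager` is a useful sanity check that the probe can detect a new child at all, since a Manager does start a server process.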

Probably the easiest solution is to migrate to Celery, or at least to Kombu, the messaging library Celery is built on.

MarioVilas commented 10 years ago

[Screenshot of the running processes and their memory usage]

MarioVilas commented 10 years ago

The two ~500 MB processes are the plugins running docutils to generate a large report. Apparently docutils isn't too careful with memory usage. I made sure it's docutils and not our code: memory usage stays under 30 MB until docutils is called, then climbs to 500 MB.
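A check like this can be scripted with the standard library's `tracemalloc`, which reports the peak Python-level allocation during a single call. A minimal sketch, assuming hypothetical names `peak_python_memory` and `build_report`: `tracemalloc` only sees memory allocated through Python's allocators, so its numbers will not exactly match the process size shown by the OS, and the docutils call appears only in a comment.

```python
import tracemalloc


def peak_python_memory(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, peak_bytes).

    peak_bytes is the highest amount of memory traced by tracemalloc during
    the call. Assumes tracing is not already active elsewhere, because this
    helper stops it on exit.
    """
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


if __name__ == "__main__":
    def build_report():
        # Hypothetical stand-in for the report step; the real measurement
        # would wrap something like
        # docutils.core.publish_string(rst_source, writer_name="html").
        return "\n".join("row %d" % i for i in range(100_000))

    _, peak = peak_python_memory(build_report)
    print("peak Python allocations: %.1f MB" % (peak / 1e6))
```

Wrapping only the suspect call, instead of a whole plugin run, is what makes it possible to pin the growth on one library rather than on our own code.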

MarioVilas commented 10 years ago

The feature-kombu branch contains my experiments with Kombu for communication. It turned out to be slow the way we were using it, so it's not going to be merged. Too bad, it seemed quite stable. But I'd prefer to keep it around for reference, and in case I want to tinker with it again. :)

The feature-snakemq branch was the first implementation of a new communication channel using SnakeMQ, but after it was merged a few bugs were found and fixed directly in master, so I'll likely delete that branch soon. SnakeMQ already solves our problems with the multiprocessing module, but it introduces a few of its own, which we're working around reasonably well so far.

ZeroMQ could be an interesting alternative to SnakeMQ, but it requires a binary dependency, and it's not a priority right now.

Celery is a bit trickier to integrate: it would mean dropping most of our multiprocessing code and changing the way we think about a few things. I'm still not 100% convinced, but the advantages would be significant, so I'll give it a try later on. The feature-kombu branch will probably prove useful for that.

I'm thinking we should leave SnakeMQ as the default and only switch to Celery for server deployments. All the SnakeMQ code is encapsulated in golismero/messaging/manager.py, so in principle it should be easy to apply the Strategy pattern here.
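A minimal sketch of that Strategy split, assuming a small send/receive interface. Every class name below (`MessagingBackend`, `InProcessBackend`, `CeleryBackend`, `MessageManager`) and the `BACKENDS` table are hypothetical and not taken from golismero/messaging/manager.py; the in-process backend uses `queue.Queue` as a stand-in for the SnakeMQ transport, and the Celery backend is an unimplemented placeholder.

```python
import queue
from abc import ABC, abstractmethod


class MessagingBackend(ABC):
    """Hypothetical strategy interface the manager delegates to."""

    @abstractmethod
    def send(self, message): ...

    @abstractmethod
    def receive(self, timeout=None): ...


class InProcessBackend(MessagingBackend):
    """Stand-in default strategy backed by queue.Queue (placeholder for SnakeMQ)."""

    def __init__(self):
        self._queue = queue.Queue()

    def send(self, message):
        self._queue.put(message)

    def receive(self, timeout=None):
        return self._queue.get(timeout=timeout)


class CeleryBackend(MessagingBackend):
    """Placeholder strategy for server deployments; not implemented."""

    def send(self, message):
        raise NotImplementedError("Celery backend not written yet")

    def receive(self, timeout=None):
        raise NotImplementedError("Celery backend not written yet")


# Maps a configuration value to a concrete strategy class.
BACKENDS = {"default": InProcessBackend, "celery": CeleryBackend}


class MessageManager:
    """Context object: callers talk to this, never to a concrete backend."""

    def __init__(self, backend_name="default"):
        self._backend = BACKENDS[backend_name]()

    def send(self, message):
        self._backend.send(message)

    def receive(self, timeout=None):
        return self._backend.receive(timeout)


if __name__ == "__main__":
    manager = MessageManager()  # picks the default strategy
    manager.send({"cmd": "ping"})
    print(manager.receive(timeout=1))  # {'cmd': 'ping'}
```

Because callers only ever hold a `MessageManager`, swapping the transport for a server deployment becomes a configuration choice (the `backend_name` argument here) rather than a rewrite of the plugin-facing code.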