Closed Ecno92 closed 2 years ago
I would be OK with adjusting the default number of threads in dramatiq
to equal CPUS
. Would you like to make that change?
@Bogdanp Thanks for your feedback. I'll provide a change.
@Bogdanp I've made a change. Let me know in case something is not correct. Then I'll follow up with additional changes.
@Ecno92 That looks like the opposite change to what I was thinking. django_dramatiq
shouldn't change. All that should change is these two lines in dramatiq
to default to CPUS
:
@Bogdanp I already was afraid that we were not aligned.
In Python threads do not scale well across multiple cores. In that case it makes sense to increase the amount of processes. However an individual core does not become faster when you increase the (virtual) core count. So I think the amount of threads should remain the same.
Yes, I'm aware of how threads work in Python. However, the default of 8 is arbitrary and so my view is that using CPUS
would be more consistent. Anybody running dramatiq
in production ought to be tweaking these flags anyway, so it doesn't matter in the grand scheme of things. So the options here, as I see them, are to either leave things as-is or to make the change I mentioned.
From my experience the number 8 is quite a fine number. For our workloads it leaves sufficient capacity available for the consumer threads to manage the (delayed) queues. That's why I advice most colleagues to just stick to 8 threads and to scale with processes. In practice this works pretty nice. From my experience going beyond 12 threads means trouble for some of our use cases. However this is specific to our use case. So that makes the number indeed rather arbitrary as somebody else may already hit the same issue with 8 threads on a different machine or with a different workload.
We need a safe number for all of us.
Given that scaling threads can cause issues and that going beyond 1 thread may result in weird behavior already we could consider following the convention of uwsgi, gunicorn and Celery to be single threaded by default. I don't think we can pick a more safe value than this.
Also I would like to stick to having a single process by default as cores do not always scale with memory. So I rather see a low processing of tasks and that the user can increase the amount of processes/threads to match the workload.
If a user thinks the workload scales linear with the amount of CPU then implementing something like --processes $(getconf _NPROCESSORS_ONLN)
in the command starting dramatiq is pretty straightforward.
This suggestion requires both changes to this project and dramatiq itself.
Let me know what you think @Bogdanp
No, I don't think I would want to lower the defaults. And as far as scaling goes, cores^2 (i.e., defaulting both values to CPUS
) shouldn't be a problem on modern operating systems. Depending on the workload, they may or may not cause performance degradation, but that's up to whoever's deploying dramatiq to measure and adjust for their use case and hardware.
As far as "weird behavior" goes, if you know of any particular threading bugs in dramatiq, please report them. Otherwise, that's just FUD. Despite the GIL, multithreading in Python is still useful for IO-bound work. Users who do CPU-bound work can scale the number of threads down to 1.
Then I follow your suggestion here to not make any change as I consider scaling the threads with the amount of CPU's in dramatiq itself not as an improvement.
I'll try to reproduce the issues that we faced in production in an reproducible example. If it reproduces then I'll open another issue.
When launching Dramatiq without Django it launches by default with:
processes equal to CPU cores 8 threads.
https://dramatiq.io/guide.html#workers https://github.com/Bogdanp/dramatiq/blob/master/dramatiq/worker.py#L72 https://github.com/Bogdanp/dramatiq/blob/e8bc14d4c3bd4a5105d379addb2cacebff4aba4b/dramatiq/cli.py#L158
However in case of django_dramatiq it launches by default with:
processes equal to CPU cores threads equal to CPU cores.
https://github.com/Bogdanp/django_dramatiq/blob/master/django_dramatiq/management/commands/rundramatiq.py#L41
I think it would be good to align those defaults. However there may be a good reason to have different defaults.
What would be a good way forward?