Open greschd opened 6 years ago
The minimum amount of work for 1.0.0
should be at least to see if we can add a warning when a user launches a process and the available daemon worker slots are below a certain percentage/number. It is not clear where this should be triggered. The most logical point would be in the launchers, but that means that every time one is called, some sort of query to all daemon workers has to be performed to get the number of open slots (if that is even possible), which might be quite slow.
An alternative, less accurate but more feasible, is to simply get the number of workers, multiply it by the prefetch-count global constant, and compare the result with the number of active processes. Actually, just run this only when calling verdi process list
and issue the warning at the bottom, just like when the daemon is not running, for example. Add instructions on what to do (verdi daemon incr)
but maybe also link to documentation explaining the problem. Suggested threshold: 90%.
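The check suggested above could be sketched roughly as follows. This is a minimal sketch, not the actual AiiDA implementation: the function name, the stand-in value for the prefetch constant, and the message text are all invented for illustration.

```python
SLOTS_PER_WORKER = 200   # assumed stand-in for the prefetch-count constant
WARN_THRESHOLD = 0.9     # the 90% threshold suggested above


def slot_usage_warning(num_workers, num_active_processes,
                       slots_per_worker=SLOTS_PER_WORKER,
                       threshold=WARN_THRESHOLD):
    """Return a warning string if available slots run low, else None."""
    total_slots = num_workers * slots_per_worker
    if total_slots == 0:
        return 'Warning: the daemon has no workers running.'
    usage = num_active_processes / total_slots
    if usage >= threshold:
        return ('Warning: {:.0%} of the available daemon worker slots are '
                'in use; consider running `verdi daemon incr`.'.format(usage))
    return None
```

Under these assumed numbers, one worker with 190 active processes (95% usage) would trigger the warning, while two workers with the same load (47.5%) would not.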
I pushed an implementation of this warning fix to a branch in my fork.
What we learned during this process is that the query to generate the table in verdi process list
can be quite costly (~18 s on @sphuber's 10.5-million-node DB). The problem with piggy-backing on this query to get the number of active processes is that one really needs a second query for only the active processes. This is because the user can apply any filters they want to the output of verdi process list
but we want to guarantee that the warning will be printed in all cases (except for raw output). 30+ seconds to get a response is excessive. The likely reason for this poor performance is the filtering on process state, which is stored as an attribute.
A second problem is that getting a CircusClient in order to obtain the number of workers is also quite slow (about 1 second).
A final problem is what to tell the user to do when the warning fires. Currently it suggests that they should increase the number of workers. However, we are not sure what the upper limit for processes should be and what is feasible under a typical OS and system setup. In this sense, perhaps giving this advice blindly is a bad idea.
We need to think about whether printing a warning is the right course of action, for now, and make plans to fix the cause of the problem.
Thanks @ConradJohnston for the improvement!
@sphuber Should we really close this issue though, or was that just done automatically/accidentally because the PR mentions this issue?
To add to my initial suggestions, I think one could also implement some logic where a process gets dropped (and added to the end of the queue) when it has been waiting for a specific number of "worker cycles" or a specific amount of time.
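As a toy illustration of this rotation idea (not actual AiiDA code; the data structures and the cycle cut-off are invented for the sketch): a process that has waited too many cycles gives up its slot and goes to the back of the queue, and freed slots are refilled from the front.

```python
from collections import deque

MAX_WAIT_CYCLES = 3  # assumed cut-off for "a specific number of worker cycles"


def rotate_waiting(slots, queue, max_wait=MAX_WAIT_CYCLES):
    """One rotation pass over the occupied slots.

    slots: list of dicts {'pid': ..., 'waited': int}
    queue: deque of pids waiting for a slot
    Returns the updated (slots, queue).
    """
    kept = []
    for proc in slots:
        if proc['waited'] >= max_wait:
            queue.append(proc['pid'])        # drop to the back of the queue
        else:
            proc['waited'] += 1
            kept.append(proc)
    # refill freed slots from the front of the queue
    while len(kept) < len(slots) and queue:
        kept.append({'pid': queue.popleft(), 'waited': 0})
    return kept, queue
```

For example, with two slots holding pids 1 (waited 3 cycles) and 2 (waited 0), and pid 3 queued, one pass evicts pid 1, keeps pid 2, and pulls in pid 3.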
Maybe my tasks were a bit excessive in the number of processes created, but I've found launching a large number of workers to be problematic in terms of CPU load -- for workers which end up not doing much.
Yeah, it was closed accidentally. The heavy CPU load, was that on the alphas of aiida-core?
There were some inefficiencies, now solved, that could cause them to spin too often, i.e. the polling fallback was firing all the time because the default was wrongly set to 0. The actual solution to this problem will probably be quite complicated: any typical solution might add other inefficiencies, and some things we might want to do would require functionality from RabbitMQ that it was not designed for and so won't be able to provide.
Yeah that was on alpha, glad to hear that should now be fixed.
True, the scheduling definitely isn't an easy fix. Another option (maybe as temporary fix) would also be to have a watcher thread that dynamically scales the workers (as an optional setting).
The proper solution to this will require significant changes to the way RabbitMQ is used. Therefore I am putting this in v2.0.0.
For now, the "fix" implemented in #2774 should give some breathing room, but the problem is not fixed.
I'm thinking about how to solve this in a more general way, for now I'm collecting information here:
https://github.com/aiidateam/aiida-core/wiki/Scaling-of-AiiDA-workers-and-slots
because it is easier than re-reading the full conversation each time.
Should we continue discussing here, and you'll collect it there? Or edit there directly?
The problem with this is that the parent would then not be able to receive messages and be unable to be killed, paused, etc.
I guess when the total number of processes is higher than the number of "slots" = workers * processes per worker, that is kind of unavoidable, right? Is my understanding correct that the messages would still get queued, and the processes would receive them once some other worker picks them up?
I think some sort of rotation / scheduling is needed here. Some comments:
Since this seems like a classical scheduling problem, we should definitely consult existing solutions / algorithms.
Discussing here is good, I was just thinking of putting summary stuff in the wiki.
@greschd, what you say is good, except we still would need to deal with the problem of how to make sure processes that are 'pulled off' can still deal with incoming messages. Additionally, we have to be careful that any scheduler-like implementation doesn't end up in a situation where, say, a process is waiting for a transport but is pulled off before it gets it because it looks like it's doing nothing. In the worst case, processes could never end up getting their transport because they are always pulled off before they get it.
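The starvation pitfall described above can be illustrated with a toy eviction policy (invented for the sketch; this is not AiiDA code): whatever heuristic picks an "idle-looking" process to pull off a worker must skip processes that are merely queued for a shared resource such as a transport.

```python
def pick_eviction_candidate(slots):
    """Pick a process to pull off a worker, or None if none can safely go.

    slots: list of dicts {'pid': ..., 'idle': int, 'awaiting_transport': bool}
    Processes waiting in line for a transport look idle but must not be
    evicted, or they may never reach the front of the transport queue.
    """
    candidates = [p for p in slots if not p['awaiting_transport']]
    if not candidates:
        return None
    # among safe candidates, evict the one idle the longest
    return max(candidates, key=lambda p: p['idle'])['pid']
```

A naive policy that simply evicted the longest-idle process would here repeatedly pick the process queued for a transport, reproducing exactly the starvation scenario described above.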
Summary below (this used to be in the wiki, but is best kept here):
The AiiDA daemon runs processes by launching workers which each have a finite number of slots representing the number of simultaneous (in the coroutine sense) processes that they can run. Each worker maintains a single connection to the database.
This choice of implementation was guided by the following constraints:
Database connections in Django or sqlalchemy are not shared across threads, and indeed Django uses a separate connection per thread.
There are a number of problems with the current solution, some of which have been discussed in #1397. Specifically:
When many processes are created, the workers clog up and keep checking the status of a few processes, without actually running the processes they depend on.
This is due to the prefetch limit.
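The clogging can be modelled very crudely (a simplified, assumed model, not how the daemon actually schedules work): each worker prefetches at most `prefetch` processes and holds those slots until the processes finish, so if parent processes fill every slot while their children sit unclaimed in the queue, no progress is possible.

```python
def can_make_progress(num_workers, prefetch, num_parents):
    """Crude model: each parent spawns one child and can only finish after
    that child has run, and a child can only run in a slot not held by a
    parent. Returns True if at least one child can be picked up.
    """
    total_slots = num_workers * prefetch
    slots_taken_by_parents = min(num_parents, total_slots)
    free_slots = total_slots - slots_taken_by_parents
    return free_slots > 0
```

In this model, a single worker with a prefetch of 8 deadlocks as soon as 8 parent processes are launched, while 7 parents still leave one slot for a child.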
Possible ways to solve this: