Closed unkcpz closed 8 months ago
Is there anything in verdi process report? Maybe there is a problem with the transport and it is cycling in the exponential backoff mechanism?
No, nothing there.
I guess the daemon must be busy doing something
I reached this conclusion because when I start 4 daemons and then stop them, two of them always time out while stopping.
Can you stop the daemon and then run verdi devel rabbitmq tasks list and check that the pks of the processes that seem to be stuck are listed?
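To make that comparison less error-prone with a long task list, something like the following could help. It is only a rough sketch: it assumes the command output has been captured as text with one pk per whitespace-separated token, and the function name `missing_pks` and the sample pks are made up for illustration.

```python
def missing_pks(task_list_output: str, stuck_pks: set[int]) -> set[int]:
    """Return the stuck pks that do NOT appear in the captured task list.

    Assumption: the captured output contains the pks as plain integers
    separated by whitespace; any non-numeric tokens are ignored.
    """
    listed = {int(token) for token in task_list_output.split() if token.isdigit()}
    return stuck_pks - listed


# Hypothetical data: captured output and the pks that look stuck.
sample_output = "101\n102\n105\n"
print(missing_pks(sample_output, {101, 105, 107}))
```

An empty result would mean every stuck process still has a task in the queue, which matches what is described in this thread.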
Another thing you can do: run verdi config set logging.aiida_loglevel INFO and then run verdi daemon worker. It should launch a single worker in the foreground and then you should see messages for each process that it starts running. Please make sure that the processes that seem stuck are logged there.
The processes are in the output of verdi devel rabbitmq tasks list.
I changed the log level to INFO and ran verdi daemon worker; one hour passed and nothing showed up.
That's really weird. How many processes are listed by verdi devel rabbitmq tasks list?
╰─± verdi devel rabbitmq tasks list | wc -l
206
Is this too many? These calcjobs froze about 2 days ago, around the time the CSCS token expired over the weekend. I expected that, when I came back this morning, some of the jobs running remotely would have been paused after hitting the maximum number of exponential backoff iterations. However, the stuck processes are not all remote calcjobs: calcjobs that run locally, which have no SSH connection problems, are stuck as well.
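For readers unfamiliar with the behaviour expected here, this is roughly what retrying with exponential backoff and a maximum number of attempts looks like. It is a generic sketch, not AiiDA's actual implementation; the function name `retry_with_backoff` and its parameters are hypothetical.

```python
import time


def retry_with_backoff(func, initial_interval=1.0, max_attempts=5, sleep=time.sleep):
    """Call func until it succeeds, doubling the wait after each failure.

    After max_attempts failures the last exception is re-raised, which is
    the point where a caller could decide to pause the process.
    """
    interval = initial_interval
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # give up: no attempts left
            sleep(interval)
            interval *= 2  # exponential growth of the waiting time


# Demo with an injected sleep so nothing actually waits.
waits = []


def always_fail():
    raise ConnectionError("transport unavailable")


try:
    retry_with_backoff(always_fail, sleep=waits.append)
except ConnectionError:
    print("gave up after waits of", waits)
```

With a scheme like this, a job whose transport keeps failing should stop retrying after a bounded time, which is why processes that stay stuck for days without being paused are surprising.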
Closing this for now since it is unlikely that you can reproduce this exact problem
I had calcjobs in the process list that have finished remotely, but they remain in the process list and the daemon does not proceed with any of them. I tried verdi devel rabbitmq tasks analyze --fix and it revived some processes, and it now shows "No inconsistencies detected between database and RabbitMQ". I guess the daemon must be busy doing something, but I cannot see any of it, and those finished processes are stuck in the running or waiting state forever. I also tried increasing the log level to DEBUG, and no new information shows up after the processes are loaded.