I started a regional (multi-origin) analysis with a large number of origins (about 250k). Once the first worker had started up, I launched a duplicate of it with auto-shutdown turned off on the duplicate instance.
Checking progress via the analyst-server web interface much later, I see that the job is almost complete with all but 8 tasks reported complete. The job is stuck at this point. The auto-shutdown worker has shut itself down (presumably for lack of work to do) and the long-lived worker is still up with zero CPU load. The analyst-server interface is still responsive, handles single-origin requests well, and in fact someone else is running another regional job in the background.
Oddly, the broker logs sometimes show the summary statistics twice, once from each of two different threads:
04:26:57.057 [grizzly-nio-kernel(3) SelectorRunner] INFO o.o.analyst.broker.Broker - 73024 undelivered, of which 0 high-priority
04:26:57.057 [grizzly-nio-kernel(3) SelectorRunner] INFO o.o.analyst.broker.Broker - 0 producers waiting, 0 consumers waiting
04:26:57.057 [grizzly-nio-kernel(3) SelectorRunner] INFO o.o.analyst.broker.Broker - 4 total workers
04:26:57.058 [main] INFO o.o.analyst.broker.Broker - 73024 undelivered, of which 0 high-priority
04:26:57.058 [main] INFO o.o.analyst.broker.Broker - 0 producers waiting, 1 consumers waiting
04:26:57.058 [main] INFO o.o.analyst.broker.Broker - 4 total workers
At other times the summary appears only once, from the [main] thread.
I think we should log these summary statistics less frequently, and occasionally log the contents of the task queues broken down by job, etc.
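To make the logging suggestion concrete, here is a rough sketch of what throttled, per-job queue reporting might look like. This is not the actual Broker code: the class name `QueueStatusLogger`, the method `maybeReport`, and the map of job ID to undelivered task count are all hypothetical names made up for illustration. The compare-and-set is there so that if two threads ask for a report at nearly the same moment, only one of them produces it.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper (not part of the real Broker): throttled per-job queue summary. */
final class QueueStatusLogger {
    private static final long NEVER = Long.MIN_VALUE;

    private final long minIntervalMillis;
    // Time of the last emitted report; NEVER until the first report is produced.
    private final AtomicLong lastReportMillis = new AtomicLong(NEVER);

    QueueStatusLogger(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    /**
     * Returns a report string if at least minIntervalMillis has passed since the
     * previous report, otherwise null. The caller passes the string to its logger.
     * undeliveredByJob is an assumed structure: job ID -> undelivered task count.
     */
    String maybeReport(long nowMillis, Map<String, Integer> undeliveredByJob) {
        long last = lastReportMillis.get();
        boolean due = (last == NEVER) || (nowMillis - last >= minIntervalMillis);
        // compareAndSet lets exactly one thread win when several race on the same interval.
        if (!due || !lastReportMillis.compareAndSet(last, nowMillis)) {
            return null;
        }
        int total = 0;
        StringBuilder perJob = new StringBuilder();
        // TreeMap gives a stable, sorted order so successive reports are easy to compare.
        for (Map.Entry<String, Integer> e : new TreeMap<>(undeliveredByJob).entrySet()) {
            total += e.getValue();
            perJob.append(String.format("%n  job %s: %d undelivered", e.getKey(), e.getValue()));
        }
        return total + " undelivered across " + undeliveredByJob.size() + " jobs" + perJob;
    }
}
```

With something like this, the broker could call `maybeReport(System.currentTimeMillis(), counts)` from wherever it currently logs the summary and only write to the log when the result is non-null. A per-job breakdown would also make it obvious which job the last 8 stuck tasks belong to.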