MetaCell / geppetto-scidash

Geppetto scidash extension

Investigate scheduled runs stuck on locked status #367

Closed: gidili closed this issue 5 years ago

gidili commented 5 years ago

I couldn't create the new test I was trying (see section Failed Test Creation above), so I cloned 3 tests and ran them against the one model. I expected them to finish within a few seconds, since the model is just the Izhikevich model, but they still show as running after a few minutes. When you get this, please check (on Spike) whether they are still running and, if so, why.

gidili commented 5 years ago

Those tests are still locked, so they give us a way to reproduce the issue. Let's see if we can reproduce it and use the logs to investigate what happens when it occurs.

As background, we deliberately scheduled dozens of tests at once. Some of them took a long time on Spike (it is even slower when many threads are active from Jupyter notebooks; we should not have this problem on Dendrite), but none of them ever got locked: they always either completed or failed.

gidili commented 5 years ago

This was caused by too many background workers (40, one per core) being spawned by default. They used up memory and sometimes caused Celery (the background-worker manager) to crash, leaving runs stuck in the locked or scheduled state. We are limiting the number of background workers to 5, which brings memory consumption down substantially and keeps the system stable.
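For reference, a minimal sketch of how such a cap can be expressed in Celery, whose default worker concurrency is the number of CPUs on the host. The module name `celeryconfig.py` is an assumption, not necessarily how geppetto-scidash is wired up; `worker_concurrency` is Celery's own setting name, and the same cap can be passed on the command line with the worker's `--concurrency` option.

```python
# celeryconfig.py: hypothetical settings module, assumed to be loaded with
# app.config_from_object("celeryconfig"). The file name is illustrative.

# Celery defaults to one worker process per CPU, which on a 40-core host
# means 40 processes competing for memory. Cap the pool at 5 instead.
worker_concurrency = 5
```

Keeping the cap in a settings module rather than only on the command line means every way of starting the worker picks up the same limit.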