Update: the same happens with just 3 clients instead of 30; it only takes longer, ~30 min, to reproduce.
A related issue exists in Engine (src-d/engine#196); this one is just about zombie processes in the bblfshd container.
Noticed two new errors in the logs that are not posted above:
time="2017-11-16T12:20:44Z" level=error msg="request processed content 3487 bytes, status Fatal" elapsed=43.828518ms language=python
time="2017-11-16T12:24:51Z" level=error msg="error re-scaling pool: container is not destroyed" language=python
Going to post logs with debug enabled.
Here are debug logs with 93 processes inside the container: 93-process-bblfshd.log
Yes, I can reproduce it, thanks for the steps and the (as always in your case) really awesome bug report, @bzz.
The processes are runc zombie processes that don't use any resources, so this shouldn't be a performance problem, but it's certainly not pretty having all those zombies around until the bblfshd container restarts (the bblfshd PID is the parent of the zombie herd, as you can see with `ps -l`, which shows the PPID). It's odd, because libcontainer merged a PR some time ago that reaped the zombie processes and fixed a similar issue we had. I'll try to update the dependency in bblfshd to the latest version, and if that doesn't fix the problem I'll investigate whether we're doing something wrong in our process management.
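For context on what that reaping does, here's a minimal sketch (my own illustration, not bblfshd's or libcontainer's actual code) of a Linux parent process collecting exited children on SIGCHLD so they never linger as zombies; the `reapChildren` helper and the throwaway `true` child are hypothetical:

```go
package main

import (
	"log"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
	"time"
)

// reapChildren waits for SIGCHLD and then wait4()s every exited child,
// so terminated children are removed from the process table instead of
// staying around as zombies (Linux-only sketch).
func reapChildren() {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGCHLD)
	for range sigs {
		for {
			var status syscall.WaitStatus
			// pid -1 means "any child"; WNOHANG returns immediately
			// once there is nothing left to reap.
			pid, err := syscall.Wait4(-1, &status, syscall.WNOHANG, nil)
			if pid <= 0 || err != nil {
				break
			}
			log.Printf("reaped child %d (exit status %d)", pid, status.ExitStatus())
		}
	}
}

func main() {
	go reapChildren()

	// Spawn a short-lived child just to demonstrate the reaping.
	cmd := exec.Command("true")
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}
	time.Sleep(2 * time.Second) // give the reaper time to collect it
}
```

Without a wait loop like this, every exited driver stays in the process table as a defunct entry until the parent wait()s on it or exits, which matches the ps output above.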
Looks like libcontainer from ~master avoids this problem (I've left it running for 20 minutes and there isn't a single defunct process). I'll upload an exported Docker image of this version of bblfshd for you to test, and if you confirm that it works we can close this after the PR. More details on Slack.
After more tests, I still see some defunct driver processes after leaving this test running for a while, but there are only 3-5 after 40 minutes, while previously there were hundreds, so while not totally fixed, it's a huge improvement.
Considering that the change was just updating the libcontainer dependency, the problem surely lies there.
@juanjux It will take a few days for me to get back to this and reproduce it, so we can either reopen it later or keep it open for a while and see.
Sorry, closed by mistake :/
OK, so if the maintainer @abeaumont agrees, we can merge #138 and close this. Feel free to reopen if you find the problem again.
Done
While getting UASTs and filtering for identifiers for Python files of a single project using Engine, after 30 minutes I can see 350+ driver processes inside the bblfshd container.
Logs are in the collapsed details section.
Steps to reproduce, using 30 concurrent clients:
and then run `:paste`, paste the code below, and hit Ctrl+D;
then, if you exec into the bblfshd container, you can see the number of driver processes growing.
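As a convenience for watching that growth, here's a small hypothetical helper (my own sketch, not part of bblfshd; it assumes a Linux-style /proc layout inside the container) that counts processes currently in the zombie "Z" state. Plain `ps -e -o stat,comm` piped through grep gives the same number:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// countZombies scans /proc and counts processes whose state field in
// /proc/<pid>/stat is "Z" (defunct).
func countZombies() (int, error) {
	entries, err := os.ReadDir("/proc")
	if err != nil {
		return 0, err
	}
	count := 0
	for _, e := range entries {
		// Process directories are named by their numeric PID.
		if !e.IsDir() || strings.TrimLeft(e.Name(), "0123456789") != "" {
			continue
		}
		data, err := os.ReadFile(filepath.Join("/proc", e.Name(), "stat"))
		if err != nil {
			continue // the process may have exited in the meantime
		}
		// Format is "pid (comm) state ..."; the state is the first field
		// after the closing parenthesis of the command name.
		stat := string(data)
		fields := strings.Fields(stat[strings.LastIndexByte(stat, ')')+1:])
		if len(fields) > 0 && fields[0] == "Z" {
			count++
		}
	}
	return count, nil
}

func main() {
	n, err := countZombies()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("zombie processes:", n)
}
```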