Open justhalf opened 11 years ago
After further inspection (printing out the connection status at line 128 of connection_manager.py), I found that the workers that seemed to be idle were actually waiting for activity on their connection to the server.
So this is probably a bug in gearmand. I've posted the issue there (https://bugs.launchpad.net/gearmand/+bug/1220168)
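To illustrate what "waiting for any activity in the connection" can look like from the worker's side, here is a minimal sketch using only the standard library. It is not python-gearman's code: the local socket pairs stand in for connections to two job servers, and `readable_connections` is a hypothetical helper. The point is that a connection which never receives data is indistinguishable, at this level, from a healthy but quiet one.

```python
import select
import socket


def readable_connections(conns, timeout):
    """Return the names of connections that have data pending within `timeout` seconds."""
    by_sock = {sock: name for name, sock in conns.items()}
    ready, _, _ = select.select(list(by_sock), [], [], timeout)
    return sorted(by_sock[s] for s in ready)


# Two local socket pairs stand in for connections to two job servers
# (hypothetical names "A" and "B").
worker_a, server_a = socket.socketpair()
worker_b, server_b = socket.socketpair()

# Only "server A" sends anything; "server B" stays silent.
server_a.sendall(b"wake up")

active = readable_connections({"A": worker_a, "B": worker_b}, timeout=0.1)
print(active)  # only the connection to "A" is reported as readable

for s in (worker_a, server_a, worker_b, server_b):
    s.close()
```

If a server never sends a wake-up on one connection, a worker relying on this kind of wait will keep serving the other connection and never notice the backlog.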
I want to know if this issue has been resolved. My environment is gearmand 1.1.12 with multiple job servers and MySQL as the persistent queue type. I encountered the same problem: after the program has run for about a week, some jobs remain stuck in the MySQL queue. The strange thing is that other jobs still run fine.
I encountered the same problem too: when using multiple gearmand servers and having the workers connect to both of them, some of the workers will (intermittently) accept jobs from only one of the servers.
When a gearman client sends jobs to multiple gearmand servers, it sometimes sends more jobs to one server than to the others. If some workers accept jobs from only one of the servers and the jobs run for a long time, the result is that some workers can't get any jobs while many jobs pile up in one gearmand server's queue.
I have resolved this problem by gathering some code from the branches and adding some code to grab jobs from the servers.
My fork is https://github.com/yunjianfei/python-gearman
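The backlog described above can be reproduced with a toy model. The sketch below is not taken from python-gearman or from the fork; `simulate`, the server names "A" and "B", and the job counts are all made up for illustration. Each worker takes at most one job per round from the servers it still polls, rotating which server it tries first.

```python
from collections import deque


def simulate(worker_servers, jobs_per_server, rounds):
    """Toy model: return how many jobs are left on each server.

    worker_servers  -- for each worker, the list of server names it polls
    jobs_per_server -- initial queue length for each server name
    rounds          -- number of rounds; each worker takes at most one job per round
    """
    queues = {name: deque(range(n)) for name, n in jobs_per_server.items()}
    for r in range(rounds):
        for servers in worker_servers:
            # Rotate the starting server so a worker does not favour one server.
            for i in range(len(servers)):
                name = servers[(r + i) % len(servers)]
                if queues[name]:
                    queues[name].popleft()
                    break
    return {name: len(q) for name, q in queues.items()}


jobs = {"A": 20, "B": 20}

# All four workers poll both servers.
healthy = simulate([["A", "B"]] * 4, jobs, rounds=10)

# Three workers have stopped fetching from "B"; only one still polls both.
stuck = simulate([["A"], ["A"], ["A"], ["A", "B"]], jobs, rounds=10)

print(healthy)  # {'A': 0, 'B': 0}
print(stuck)    # {'A': 0, 'B': 13}
```

In the second run, three workers sit idle once "A" is empty while "B" still holds a backlog, which matches the symptom reported here. Polling every connected server in turn is one way to avoid it.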
I found that when using multiple gearmand servers and having the workers connect to both of them, some of the workers will (intermittently) accept jobs from only one of the servers. That is, a worker seems to either lose its connection to the other server or block while waiting for jobs from one server only, and thus doesn't fetch jobs from the other server anymore. As a result, the jobs from that server are executed only by the workers that are still connected to it. I experienced this on a 16-core Ubuntu machine.
This is somewhat related to https://github.com/Yelp/python-gearman/issues/17, although a bit different.
Configuration:
Try the minimal reproducing code below to test.
test.bash
gearman_client.py
gearman_worker.py
The last lines of output in the worker's console:
As you can see, only workers 2, 4, and 5 are processing jobs from 4731, the others just don't fetch jobs from there anymore.