Open dmeziere opened 4 months ago
Jobs are assigned to workers in the order they are given to the server (FIFO). However, the task system in libgearman as used by PHP clients is an abstraction above jobs, and it sends these "tasks" as jobs. It sends them all at one time, and it happens to send them LIFO. Refer to the discussion in issue #319.
Basically, if you change how you submit the tasks/jobs in your clients (hint: use doBackground), you should get FIFO.
Alternatively, you are welcome to contribute a PR which changes the order that tasks are added in libgearman to be FIFO.
I also think you need to add more workers until the rate of jobs you can complete exceeds the rate of jobs that you add. Try doubling or tripling the number of workers you have.
Thank you for this lead to explore. Adding more workers, in my case, means adding more physical servers (I already pushed the number of processes per machine to a comfortable ratio), and the costs will explode. That said, if I can prevent a not-yet-finished import from being pushed back again and again by incoming ones, it will be a major upgrade!
When I say "35 PHP workers", I mean 7 physical servers, each hosting 5 VMs, each using 3 worker processes.
You may also be running into the round robin problem. If you have multiple functions per worker, the default behavior is to assign all the jobs in one function before sending another function. Try passing --round-robin to gearmand.
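To make the round-robin point concrete, here is a toy Python model of one worker that has registered two functions. It is only a sketch of the behaviour described above, not gearmand's actual scheduler; the function names `warmup` and `import` and the `drain` helper are made up for illustration.

```python
from collections import deque

def drain(queues, round_robin):
    """Return the order in which one worker would receive jobs.

    queues: dict mapping function name -> list of jobs (FIFO per function).
    This is a simplified model, not gearmand's real scheduling code.
    """
    pending = {name: deque(jobs) for name, jobs in queues.items()}
    names = list(pending)
    order = []
    start = 0  # index of the function to look at first
    while any(pending.values()):
        for offset in range(len(names)):
            name = names[(start + offset) % len(names)]
            if pending[name]:
                order.append(pending[name].popleft())
                if round_robin:
                    # Next time, start with the function after this one.
                    start = (start + offset + 1) % len(names)
                break
    return order

queues = {"warmup": ["w1", "w2"], "import": ["i1", "i2"]}
default_order = drain(queues, round_robin=False)  # ['w1', 'w2', 'i1', 'i2']
rr_order = drain(queues, round_robin=True)        # ['w1', 'i1', 'w2', 'i2']
```

In the default mode the second function only gets served once the first one is empty, which can look like starvation when the first function is constantly refilled.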
Does doBackground() have other behavioural differences from addTask() / runTasks()? I mean the jobs are executed, but gearadmin can't see them, and it looks like the callbacks are not executed. I use them a lot to generate a Gantt diagram showing all the jobs in real time. With doBackground(), nothing works at the monitoring level.
[edit] I can now see the jobs with gearadmin. The documentation (which is a bit light for my taste) states that callback handling only works with runTasks(). I really need this behaviour, so this is a problem for me.
Just to be clear, the PHP extension is a separate project, and we are not responsible for it (except that it uses libgearman.so under the hood and we are responsible for that). If doBackground does not fit your needs, you are welcome to submit a PR which changes the behavior of libgearman, as mentioned previously.
Can you provide some partial sample client code? If you're already using GearmanClient::do, and not addTask/runTasks, then this is not the LIFO task problem, and I'm very suspicious it's the round-robin problem.
@esabol I am not blaming anyone or anything. I love Gearman! I am just trying to understand and locate where my problem is, and to find the cheapest solution to it. Believe me, if I could provide any quality code in C / Boost, I would be proud to contribute, if it were necessary. The only thing I said is that the Gearman documentation on the PHP website (which I understand is not gearmand related) could be improved.
@SpamapS I am not using GearmanClient::do. I experimented with it this week thanks to your help on this issue, but I did not go very far because I rely heavily on the callbacks and communication provided by GearmanClient tasks to manage my jobs. I managed to run my jobs with GearmanClient::do, but without any feedback of course. I also have a second method, but it is nominative: one method per "workshop" (a group of workers handled by a PHP master process in my project). It is used to warm up an import before running the real jobs, which share the same function name across the whole farm.
The callback stuff isn't hard to do with GearmanClient::do. doBackground was a problem because you never get results from it.
I'm kind of wondering if this is actually a problem with the PHP extension after all. The implementation for the addTask method in the PHP extension has a comment that says "prepend task to list of tasks on client obj", which would seem to imply that it's the one that's setting the order to LIFO instead of FIFO.
gearman_client_add_task_handler (https://www.php.net/manual/en/gearmanclient.addtask.php):
https://github.com/php/pecl-networking-gearman/blob/a52052cdd712a95091ce926be3bcdca41c730696/php_gearman_client.c#L736
No, that's a bit of a ruse, that's just how it's managing its own data structures. It happens here:
https://github.com/gearman/gearmand/blob/master/libgearman/packet.cc#L190-L199
Tasks are stored in the universal here until run_tasks is run. For whatever reason, they decided to prepend rather than append. As we've said before, if you want to use tasks FIFO, then you have to add them in reverse order.
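As a rough illustration of why prepending yields LIFO, and why adding tasks in reverse order works around it, here is a small Python model. It does not call libgearman; `send_order` is a hypothetical helper that only mimics "prepend on add, then send from head to tail".

```python
from collections import deque

def send_order(tasks_in_add_order):
    """Model a client that prepends each new task, then sends head to tail."""
    pending = deque()
    for task in tasks_in_add_order:
        pending.appendleft(task)  # prepend, as described above
    return list(pending)          # order in which the batch would be sent

added = ["job 1", "job 2", "job 3"]
lifo = send_order(added)                  # ['job 3', 'job 2', 'job 1']
fifo = send_order(list(reversed(added)))  # ['job 1', 'job 2', 'job 3']
```

So a client that wants "job 1" to reach the server first would, under this model, add "job 3" first and "job 1" last.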
The docs don't define this order, but I don't think we could change it without most likely breaking some folks.
We could probably add a new universal option to reverse the order, and if nothing else, maybe we should document that they are LIFO.
I think undocumented behavior is subject to change, personally, and I really doubt anyone wants LIFO. Just my two cents.
If I may add weight to the FIFO behaviour, the problem is not when one adds a bunch of jobs to an empty queue. One can, as previously said, reverse the order of submission if desired. But when one adds a bunch of jobs to an already filled queue, the oldest jobs will be pushed back by the new ones. And if the same thing happens many times, the oldest jobs will never be handled. Please excuse me if I'm not clear, my English may be deficient.
That's most likely because the task system is not meant to be a long-lived queue itself. It was always meant to be used to farm out the work from a single request across multiple workers and then wait for all of that work. In its original intended use case you should be flushing this accidentally LIFO queue with run_tasks as soon as you can and then waiting for all of the tasks before sending more. If things are backing up in it, then for one, they're not safe, because that queue is in-memory only, but that's also just not the intended purpose of tasks.
I do, BTW, agree that undocumented behavior is fair game. I am just worried about how long it's been de-facto behavior. It may actually be beneficial in some cases.
> But when one adds a bunch of jobs to an already filled queue, the oldest jobs will be pushed back by the new ones. And if the same thing happens many times, the oldest jobs will never be handled. Please excuse me if I'm not clear, my English may be deficient.
Just to be clear, we don't believe that is true. Once the jobs are in gearmand's queue, all jobs are processed in FIFO order. It's addTask/runTasks that submits the tasks to gearmand in LIFO order. If you submit each task to gearmand as separate jobs using PHP's doBackground or doNormal, I think you would see that.
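The distinction can be sketched with a toy Python model, assuming a single function and a server queue that is strictly FIFO. The `submit_batch` helper and the job labels are invented for illustration; nothing here talks to a real gearmand.

```python
from collections import deque

def submit_batch(server_queue, batch, via_tasks):
    """Append a batch to the server queue.

    via_tasks=True models the addTask/runTasks path, which sends the
    batch in reverse (LIFO) order; False models one submission per job.
    """
    ordered = list(reversed(batch)) if via_tasks else list(batch)
    server_queue.extend(ordered)

queue = deque()
submit_batch(queue, ["A1", "A2", "A3"], via_tasks=True)  # first hour
submit_batch(queue, ["B1", "B2", "B3"], via_tasks=True)  # second hour
processed = [queue.popleft() for _ in range(len(queue))]
# ['A3', 'A2', 'A1', 'B3', 'B2', 'B1']
```

In this model the order is reversed only within each batch: every job of the first batch is still handed out before any job of the second batch.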
If your experience is different, please provide a simple reproducible test case that submits a bunch of jobs with simple payloads like "job N" and have the workers return the job payload appended with timestamps of when they are processed by the workers.
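One possible way to check the output of such a test is sketched below in Python. The `job N @ timestamp` result format and the sample values are assumptions made up for this example; a real test would use whatever the workers actually return.

```python
def processing_order(results):
    """Return job numbers sorted by the timestamp each worker appended.

    Each result is assumed to look like 'job N @ <epoch seconds>'.
    """
    parsed = []
    for line in results:
        payload, stamp = line.rsplit("@", 1)
        job_number = int(payload.split()[1])
        parsed.append((float(stamp), job_number))
    return [job for _, job in sorted(parsed)]

# Invented sample data, for illustration only.
sample = [
    "job 3 @ 1720000000.10",
    "job 1 @ 1720000000.35",
    "job 2 @ 1720000000.20",
]
order = processing_order(sample)   # [3, 2, 1]
is_fifo = order == sorted(order)   # False for this sample
```

If jobs were submitted as "job 1", "job 2", ... in that order, a FIFO run would give an ascending list here.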
It's complicated. I am alone on the project, totally overloaded, and my usage of Gearman is far from simple. Aside from development, I also handle all the server infrastructure (75 hosts). And summer is the only period when I can migrate all the solutions we use to their latest versions without disturbing our customers. I will try to find the time, but it is a tough period for me.
Please give --round-robin a try on your gearmand. If that doesn't fix it, then yes, if you can extract just the gearman bits of your PHP out and paste here, we can confirm if it is the library doing LIFO with tasks as we've been talking about, or something else.
Hello,
I use Gearman to drive an ETL farm. I think it is the perfect solution, and you did a really great job, but I've got one need that is not covered.
We are lacking processing power, and therefore workers, so there are frequent traffic jams. Each hour, our client submits a thousand jobs, but the queue is not always fully processed before the next batch arrives. Gearman seems to work as a LIFO stack, so we always have a few jobs that are delayed again and again by newer jobs being submitted. And that number grows from hour to hour until a low-traffic hour or a crash (not on the Gearman side, it is rock-solid).
Is there a way to use Gearman as a FIFO queue, or to reprioritize existing jobs before adding new ones? Here is what I mean:
By FIFO queue, I mean that newly added jobs will only be handled after existing ones.
By reprioritizing existing jobs, I mean that each hour, when our client starts, it could first mark all existing jobs as high priority, then add new jobs as normal priority, emulating a FIFO queue.
Or anything else that could solve my problem.