gearman / gearmand

http://gearman.org/

Is there a way to implement FIFO stack / job re-prioritization? #395

Open dmeziere opened 4 months ago

dmeziere commented 4 months ago

Hello,

I use Gearman to drive an ETL farm. I think it is the perfect solution, and you did a really great job, but I have one need that is not covered.

We are short on processing power, and therefore on workers, so there are frequent traffic jams. Each hour, our client declares a thousand jobs, but the queue is not always drained before the next batch arrives. Gearman seems to work as a LIFO stack, so we always have a few jobs that are delayed again and again by newer jobs being declared. That number grows from hour to hour until a low-traffic hour or a crash (not on the Gearman side, it is rock-solid).

Is there a way to use Gearman as a FIFO queue, or to reprioritize existing jobs before adding new ones? Here is what I mean:

esabol commented 4 months ago

Jobs are assigned to workers in the order they are given to the server (FIFO). However, the task system in libgearman as used by PHP clients is an abstraction above jobs, and it sends these "tasks" as jobs. It sends them all at one time, and it happens to send them LIFO. Refer to the discussion in issue #319.

Basically, if you change how you submit the tasks/jobs in your clients (hint: use doBackground), you should get FIFO.

Alternatively, you are welcome to contribute a PR which changes the order that tasks are added in libgearman to be FIFO.

I also think you need to add more workers until the rate of jobs you can complete exceeds the rate of jobs that you add. Try doubling or tripling the number of workers you have.
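To make the rate argument concrete, here is a minimal sketch of the backlog arithmetic. The figure of 1000 jobs per hour comes from the report above; the two capacities are made-up numbers for illustration, and the model assumes one batch per hour.

```python
def backlog_over_time(arrivals_per_hour, capacity_per_hour, hours):
    """Return the queue length at the end of each hour.

    Jobs arrive in one batch per hour; workers complete at most
    capacity_per_hour jobs before the next batch arrives.
    """
    backlog = 0
    history = []
    for _ in range(hours):
        backlog += arrivals_per_hour                # new batch is declared
        backlog -= min(backlog, capacity_per_hour)  # workers drain what they can
        history.append(backlog)
    return history


# 1000 jobs/hour is from the report; 900 and 1100 are hypothetical capacities.
under_provisioned = backlog_over_time(1000, 900, hours=5)
over_provisioned = backlog_over_time(1000, 1100, hours=5)
print(under_provisioned)  # [100, 200, 300, 400, 500]
print(over_provisioned)   # [0, 0, 0, 0, 0]
```

As long as capacity stays below the arrival rate, the backlog grows every hour regardless of the order in which jobs are served.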

dmeziere commented 4 months ago

Thank you for this lead. Adding more workers, in my case, means adding more physical servers (I have already pushed the number of processes per machine to a comfortable ratio), and the costs would explode. That said, if I can prevent a not-yet-finished import from being pushed back again and again by incoming ones, it will be a major improvement!

When I say "35 PHP workers", I mean 7 physical servers, each hosting 5 VMs, each running 3 worker processes.

SpamapS commented 4 months ago

You may also be running into the round robin problem. If you have multiple functions per worker, the default behavior is to assign all the jobs in one function before sending another function. Try passing --round-robin to gearmand.
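A toy model of the difference, based only on the description above. It is not gearmand's scheduler, and the function names and job ids are hypothetical.

```python
from collections import deque


def drain_in_function_order(queues):
    """Toy model of the default described above: empty one function's
    queue completely before moving on to the next function."""
    order = []
    for name, jobs in queues.items():
        order.extend(f"{name}:{job}" for job in jobs)
    return order


def drain_round_robin(queues):
    """Toy model of --round-robin: take one job from each function in turn."""
    pending = {name: deque(jobs) for name, jobs in queues.items()}
    order = []
    while any(pending.values()):  # an empty deque is falsy
        for name, jobs in pending.items():
            if jobs:
                order.append(f"{name}:{jobs.popleft()}")
    return order


# Hypothetical function names and job ids, for illustration only.
queues = {"import": [1, 2, 3], "warmup": [1, 2]}
default_order = drain_in_function_order(queues)
round_robin_order = drain_round_robin(queues)
print(default_order)      # ['import:1', 'import:2', 'import:3', 'warmup:1', 'warmup:2']
print(round_robin_order)  # ['import:1', 'warmup:1', 'import:2', 'warmup:2', 'import:3']
```

In the first ordering, jobs for the second function wait until the first function's queue is empty, which can look like old jobs being starved even though each function's own queue is FIFO.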

dmeziere commented 4 months ago

Does doBackground() have other behavioural differences from addTask() / runTasks()? The jobs are executed, but gearadmin can't see them, and it looks like the callbacks are not executed. I rely on them heavily to generate a Gantt diagram showing all the jobs in real time. With doBackground(), nothing works at the monitoring level.

[edit] I can now see the jobs with gearadmin. The documentation (which is a bit light for my taste) states that callback handling only works with runTasks(). I really need this behaviour, so this is a problem for me.

esabol commented 4 months ago

Just to be clear, the PHP extension is a separate project, and we are not responsible for it (except that it uses libgearman.so under the hood and we are responsible for that). If doBackground does not fit your needs, you are welcome to submit a PR which changes the behavior of libgearman, as mentioned previously.

SpamapS commented 4 months ago

Can you provide some partial sample client code? If you're already using GearmanClient::do, and not addTask/runTasks, then this is not the LIFO task problem, and I'm very suspicious it's the round-robin problem.

dmeziere commented 4 months ago

@esabol I am not blaming anyone or anything. I love Gearman! I am just trying to understand and locate where my problem is, and to find the cheapest solution to it. Believe me, if I could provide any quality code in C / Boost, I would be proud to contribute, if it were necessary. The only thing I said is that the Gearman documentation on the PHP website (which I understand is not related to gearmand) could be improved.

dmeziere commented 4 months ago

@SpamapS I am not using GearmanClient::do. I experimented with it this week thanks to your help on this issue, but I did not get very far because I rely heavily on the callbacks and communication provided by GearmanClient tasks to manage my jobs. I managed to run my jobs with GearmanClient::do, but without any feedback, of course. I also have a second method, but it is named individually: one method per "workshop" (a group of workers handled by a PHP master process in my project). It is used to warm up an import before running the real jobs, which have the same function name across the whole farm.

SpamapS commented 4 months ago

The callback stuff isn't hard to do with GearmanClient::do. doBackground was a problem because you never get results from it.

esabol commented 4 months ago

I'm kind of wondering if this is actually a problem with the PHP extension after all. The implementation for the addTask method in the PHP extension has a comment that says "prepend task to list of tasks on client obj", which would seem to imply that it's the one that's setting the order to LIFO instead of FIFO.

gearman_client_add_task_handler (https://www.php.net/manual/en/gearmanclient.addtask.php): https://github.com/php/pecl-networking-gearman/blob/a52052cdd712a95091ce926be3bcdca41c730696/php_gearman_client.c#L736

SpamapS commented 3 months ago

No, that's a bit of a ruse, that's just how it's managing its own data structures. It happens here:

https://github.com/gearman/gearmand/blob/master/libgearman/packet.cc#L190-L199

Tasks are stored in the universal here until run_tasks is run. For whatever reason, they decided to prepend rather than append. As we've said before, if you want to use tasks FIFO, then you have to add them in reverse order.

The docs don't define this order, but I don't think we could change it without most likely breaking some folks.

We could probably add a new universal option to reverse the order, and if nothing else, maybe we should document that they are LIFO.
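A minimal sketch of the prepend effect and the reverse-order workaround described above. This is a plain Python list standing in for the client-side task list, not libgearman's actual data structure, and `queue_tasks` is a hypothetical helper.

```python
def queue_tasks(payloads):
    """Mimic a client-side task list that prepends each new task."""
    task_list = []
    for payload in payloads:
        task_list.insert(0, payload)  # prepend: the newest task ends up first
    return task_list  # order in which the tasks would be sent in this model


added = ["job 1", "job 2", "job 3"]
sent_naive = queue_tasks(added)
sent_reversed = queue_tasks(list(reversed(added)))
print(sent_naive)     # ['job 3', 'job 2', 'job 1']
print(sent_reversed)  # ['job 1', 'job 2', 'job 3']
```

Adding the tasks in reverse order cancels out the prepend, so they are sent in the intended order.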

esabol commented 3 months ago

I think undocumented behavior is subject to change, personally, and I really doubt anyone wants LIFO. Just my two cents.

dmeziere commented 3 months ago

If I may add weight to the FIFO behaviour, the problem is not when one adds a bunch of jobs to an empty queue. In that case one can, as previously said, reverse the order of submission if desired. But when one adds a bunch of jobs to an already filled queue, the oldest jobs will be pushed back by the new ones. And if the same thing happens many times, the oldest jobs will never be handled. Please excuse me if I'm not clear, my English may be deficient.
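A toy simulation of the scenario described in this comment. It only illustrates the concern and makes no claim about how gearmand or libgearman actually order jobs; batch sizes and capacity are made up.

```python
from collections import deque


def simulate(batches, capacity, newest_first):
    """Process up to `capacity` jobs after each batch arrives.

    Returns (processed_order, still_waiting).
    """
    waiting = deque()
    processed = []
    for batch in batches:
        waiting.extend(batch)
        for _ in range(min(capacity, len(waiting))):
            # newest_first=True models a stack, False models a queue
            job = waiting.pop() if newest_first else waiting.popleft()
            processed.append(job)
    return processed, list(waiting)


# Three hypothetical batches of 4 jobs, with capacity for only 3 per round.
batches = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
lifo_done, lifo_left = simulate(batches, capacity=3, newest_first=True)
fifo_done, fifo_left = simulate(batches, capacity=3, newest_first=False)
print(lifo_left)  # [1, 5, 9]
print(fifo_left)  # [10, 11, 12]
```

With newest-first service, job 1 is still waiting after every round, while with oldest-first service only the most recent jobs are left over.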

SpamapS commented 3 months ago

That's most likely because the task system is not meant to be a long lived queue itself. It was always meant to be used to farm out the work from a single request across multiple workers and then wait for all of that work. In its original intended use case you should be flushing this accidentally LIFO queue with run_tasks as soon as you can and then waiting for all of them before sending more. If things are backing up in it, one, they're not safe, that's in-memory, but that's also just not the intended purpose of tasks.

SpamapS commented 3 months ago

I do, BTW, agree that undocumented behavior is fair game. I am just worried about how long it's been de-facto behavior. It may actually be beneficial in some cases.

esabol commented 3 months ago

> But when one adds a bunch of jobs to an already filled queue, the oldest jobs will be pushed back by the new ones. And if the same thing happens many times, the oldest jobs will never be handled. Please excuse me if I'm not clear, my English may be deficient.

Just to be clear, we don't believe that is true. Once the jobs are in gearmand's queue, all jobs are processed in FIFO order. It's addTask/runTasks that submits the tasks to gearmand in LIFO order. If you submit each task to gearmand as a separate job using PHP's doBackground or doNormal, I think you would see that.

If your experience is different, please provide a simple reproducible test case that submits a bunch of jobs with simple payloads like "job N" and have the workers return the job payload appended with timestamps of when they are processed by the workers.
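For checking the output of such a test case, something like the following sketch might help. The result format `job N @ timestamp` and both helper names are assumptions made for illustration; the sample data is made up.

```python
import re


def parse_result(line):
    """Split a worker result like 'job 7 @ 1722240000.25' into (n, timestamp)."""
    match = re.fullmatch(r"job (\d+) @ ([0-9.]+)", line.strip())
    if match is None:
        raise ValueError(f"unexpected result format: {line!r}")
    return int(match.group(1)), float(match.group(2))


def processed_in_fifo_order(results):
    """True if jobs were processed in the same order as their numbers."""
    parsed = sorted((parse_result(r) for r in results), key=lambda p: p[1])
    numbers = [n for n, _ in parsed]
    return numbers == sorted(numbers)


# Made-up sample data; real timestamps would come from the workers.
fifo_sample = ["job 1 @ 10.0", "job 2 @ 11.5", "job 3 @ 12.0"]
lifo_sample = ["job 1 @ 12.0", "job 2 @ 11.5", "job 3 @ 10.0"]
print(processed_in_fifo_order(fifo_sample))  # True
print(processed_in_fifo_order(lifo_sample))  # False
```

With several workers running concurrently, timestamps of neighbouring jobs can interleave, so a run with a single worker is probably the easiest to interpret.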

dmeziere commented 3 months ago

It's complicated. I am alone on the project, totally overloaded, and my usage of Gearman is far from simple. Aside from the development, I also handle all the server infrastructure (75 hosts). And summer is the only period when I can migrate all the solutions we use to their latest versions without disturbing our customers. I will try to find the time, but it is a tough period for me.

SpamapS commented 3 months ago

Please give --round-robin a try on your gearmand. If that doesn't fix it, then yes, if you can extract just the gearman bits of your PHP out and paste here, we can confirm if it is the library doing LIFO with tasks as we've been talking about, or something else.