Open j3nsch opened 2 years ago
@kaustabhbarman I need a description of this for a meeting next week, so this has the highest priority right now. What needs to be done to make background processing with Gearman available, and how can new types of background processes be added?
Let's consider we want to perform only one job, let's say extracting the text from a PDF using Gearman. The main components of Gearman are:
Some useful facts to know beforehand:
In order to implement the text extraction with Gearman:
I think the important thing to keep in mind here is that we have to start the workers manually at some point (usually before the job request is made from the client API, but there are also options to do it after the request has been made). And I think we cannot depend on pipelining the kick-off of a worker with a user request, because the worker runs an infinite loop, so the request would never return.
Thanks, unfortunately that does not sound too promising after all. If we have to run workers (scripts) listening to Gearman, we are not much better off than before. Also, PHP is not a good language for long-running processes. The workers would be like daemon processes, and with PHP those tend to accumulate memory usage.
So how can we make this easy and robust for our example with the background extraction? Isn't there any solution for running scripts in the background with PHP?
In the end I am not committed to Gearman. First, we need a solution for the background extraction. Second, it should be a solution where we can add another background script for a different purpose without it having to be set up separately.
The second part could be done by having a generic worker that then picks the proper class for a job. We could then add more job classes without having to set up new workers. The type of job would be part of the information transmitted.
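To make the generic-worker idea concrete, here is a minimal sketch of the dispatch step only. It is written in Python for brevity and is not OPUS 4 or Gearman code: the registry, the job type names (`extract_text`, `index_document`), the JSON message layout and the class names are all assumptions made up for illustration.

```python
import json

# Hypothetical registry mapping a job type name to the class that handles it.
JOB_CLASSES = {}


def register(job_type):
    """Class decorator that adds a job class to the registry."""
    def decorator(cls):
        JOB_CLASSES[job_type] = cls
        return cls
    return decorator


@register("extract_text")
class ExtractTextJob:
    def run(self, payload):
        # Placeholder for the real PDF text extraction.
        return f"extracted text from {payload['file']}"


@register("index_document")
class IndexDocumentJob:
    def run(self, payload):
        # Placeholder for the real indexing step.
        return f"indexed document {payload['doc_id']}"


def handle_message(message):
    """What the single generic worker would do for each incoming job."""
    job = json.loads(message)
    job_class = JOB_CLASSES.get(job["type"])
    if job_class is None:
        raise ValueError(f"unknown job type: {job['type']}")
    return job_class().run(job.get("payload", {}))


result = handle_message(
    json.dumps({"type": "extract_text", "payload": {"file": "a.pdf"}})
)
```

With this shape, adding a new kind of background task means adding one registered class; the worker itself, and whatever starts it, stays unchanged.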
I think I should also mention that the Gearman client does have a function that won't wait for a worker to complete a job. It can return something instantly and send the job to a queue, to be executed when the server finds an idle worker. This is what I meant when I said that a worker can be started after the request. But the fact remains that a worker needs to be running like a daemon process to listen for job requests. I can look for other solutions for running scripts in the background starting next week, but I think that would be too late for your meeting next week.
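The behaviour described above can be illustrated with a small in-process stand-in, again in Python and again only a sketch. The `queue.Queue` plays the role of the Gearman job server and the thread plays the role of a worker; `submit_background`, `worker_loop` and the ticket string are hypothetical names, not the Gearman API.

```python
import queue
import threading

jobs = queue.Queue()   # stand-in for the job server's queue
results = []           # collected by the worker, for demonstration only


def submit_background(job):
    """Enqueue the job and return immediately, without waiting for a worker."""
    jobs.put(job)
    return f"queued:{job}"


def worker_loop():
    """Runs until it receives the None sentinel; a real worker loops forever."""
    while True:
        job = jobs.get()
        if job is None:
            break
        results.append(f"done:{job}")


# The job is submitted before any worker exists; it simply waits in the queue.
ticket = submit_background("extract a.pdf")

# Only once a worker is started does the queued job actually get processed.
worker = threading.Thread(target=worker_loop)
worker.start()
jobs.put(None)  # sentinel so this demonstration terminates
worker.join()
```

The submit call returns right away, but nothing happens to the job until some worker process is running, which is exactly the remaining operational burden discussed in this thread.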
I have got enough for the meeting. Thank you for the summary. Yes, you should look for another solution. Gearman seems like a good solution for parallelizing tasks or distributing them across systems, but that isn't our current goal.
What would the setup process for Gearman look like? Right now we have Cron scripts that need to be set up separately. Every time a new script is added, it needs to be set up. What would be necessary if we use Gearman? This issue is not about every little implementation detail, but the bigger picture, the concepts, not the code.
We have Gearman, we have an OPUS 4 application, a database and scripts that perform tasks, like extracting the text from a PDF and indexing a document. What is necessary to run a job like that with Gearman? What about adding a second job script?