esmero / strawberryfield

A Field of strawberries
GNU Lesser General Public License v3.0
10 stars 5 forks source link

Hydroponics process logic, timing, multiprocess and a lot more #133

Open giancarlobi opened 3 years ago

giancarlobi commented 3 years ago

@DiegoPino Following the variables and conditions I analyzed to manage hydroponics process. I'll try to edit code (and made a PR if I'll reach some good point) based on:

Variables:

  1. intval(\Drupal::state()->get('hydroponics.heartbeat')); every 3s heartbeat timer updates stored value delta = ($currentTime - $lastRunTime) Values: delta < 3s : process alive delta > 3s : process stopped or hang

  2. (int) \Drupal::state()->get('hydroponics.queurunner_last_pid', 0); PID of hydroponics process. Values: 0 : never run N>0 : run or running process pid (not imply process is running) N<0 : idle timer executed (not imply process stopped, i.e. software hangs) was running with pid=-N

  3. posix_kill($queuerunner_pid, 0); Values: TRUE : process with $queuerunner_pid pid is running FALSE : no process with $queuerunner_pid pid is running

Timers:

Conditions:

  [1]      [2]         [3]         STATUS      Notes
 delta     PID    posix_kill(0)

  <3s       0         TRUE           N/A       If PID=0 we cannot call posix_kill

  <3s       0         FALSE          N/A       If PID=0 we cannot call posix_kill

  <3s       >0        TRUE         running     The normal condition when all is ok

  <3s       >0        FALSE         error      posix_kill failure? to check again?

  <3s       <0        TRUE          error      idle timer reached but heartbeat and process running (stop?)

  <3s       <0        FALSE         error      process alive and no entry in process table: really strange

  >3s       0         TRUE           N/A       If PID=0 we cannot call posix_kill

  >3s       0         FALSE      N/A or boot   If PID=0 we cannot call posix_kill

  >3s       >0        TRUE          error      Heartbeat died while in process table: hang (stop?)

  >3s       >0        FALSE         error      process stopped not by idle timer (i.e. killed from command line)

  >3s       <0        TRUE          error      idle timer reached but process running (stop?)

  >3s       <0        FALSE        stopped     The normal condition when all is ok
DiegoPino commented 3 years ago

Hi, I have no internet but a few things I can see from my phone: delta > 3s : process stopped or hang This is not accurate. There won't be any pings while an HOCR is processing. If the process is an image it can take at least 60 seconds. Loops try to run on the intervals we give them but will depend (when not doing a fork/child process) on how much each process takes on returning.

So you will have to be more conservative about the time to have a reliable chart of we will end killing things we need. yes, we should remove the 720 seconds and maybe give each queue more than 60 seconds (per queue worker) to deal with things.

giancarlobi commented 3 years ago

@DiegoPino I agree and I have to work on that. In the meantime I made some changes and added more logic to run, checkrunning and form of hydroponics process. All in a branch on my git here: https://github.com/giancarlobi/strawberryfield/tree/ISSUE-133. This is working and can reset from a condition when you kill process from command line. A lot of more checks and debug I need, probably tomorrow. No rush, so take a look when free and .. connected.

giancarlobi commented 3 years ago

@DiegoPino I think I have to analyze the way to use react PHP child process as you already point to me. In my first test about react php I already tested it here: https://github.com/giancarlobi/strawberry_runners/blob/1ce8855d792b030535fae0cca5158298ebd94817/src/Scripts/mainLoop.php#L213. Probably this will be for a next release of archipelago, not sure I'll be able to solve before RC1.

giancarlobi commented 3 years ago

So you will have to be more conservative about the time to have a reliable chart

@DiegoPino I changed code to be more conservative. I assume heartbeat alive if less or equal to 60s + 5s . Also updated error messages and stop/reset to be more conservative. Also idle timer changed from 60s to 80s. The full code on the same branch as above: https://github.com/giancarlobi/strawberryfield/tree/ISSUE-133 Feel free to check when you have sometime, no rush.

giancarlobi commented 3 years ago

@DiegoPino this is related to this PR https://github.com/esmero/strawberryfield/pull/138

giancarlobi commented 3 years ago

@DiegoPino about this: "So this one archipelago:hydroponics and one named archipelago:hydroponics-multiprocess..." is more simple a single command with default 1 process x queue and if we want multi process we have to add a parameter with number of processes ?

DiegoPino commented 3 years ago

@giancarlobi the reason I want to allow a few options there is because I see in the future a need for different ways of processing and having alternatives would allow other modules to override it. A single Big service is good but a small one, simple and a bigger one feels more sustainable since we will have for sure many small institutions using this that may want to have less moving pieces (or even invisible ones).

DiegoPino commented 3 years ago

1 process x queue == is good for your new service. The old one is one process for all the queue, and to be honest is also good for small repositories because its very simple to debug and manage

DiegoPino commented 3 years ago

1 process x queue == is good for your new service. The old one is one process for all the queue, and to be honest is also good for small repositories because its very simple to debug and manage

giancarlobi commented 3 years ago

@DiegoPino I understand. So the idea could be: the current archipelago:hydroponics is fixed to 1 process x queue while we then create a new one archipelago:hydroponics-multiprocess where we pass number of proc x queue by parameter or something like this. Right? When you wrote "is good for your new service" do you refer to current hydroponics implemented in my PR ?

giancarlobi commented 3 years ago

Or do you prefer that archipelago:hydroponics will be 1 process for all queue? sorry, it's not clear to me.

giancarlobi commented 3 years ago

@DiegoPino to make (mainly for me) ideas more clear along all chain process called, in this PR the chain is: 1) main loop DRUSH archipelago:hydroponics, started from here https://github.com/esmero/strawberryfield/blob/8ac5cca8f28dce19722f2742bc91ab06455d5b04/src/StrawberryfieldHydroponicsService.php#L211 and executes this code https://github.com/esmero/strawberryfield/blob/8ac5cca8f28dce19722f2742bc91ab06455d5b04/src/Commands/HydroponicsDrushCommands.php#L39 2) For every queue and for each element (one at time) call a child process (in background) by DRUSH archipelago:hydroqueue here https://github.com/esmero/strawberryfield/blob/8ac5cca8f28dce19722f2742bc91ab06455d5b04/src/Commands/HydroponicsDrushCommands.php#L129 and executes this https://github.com/esmero/strawberryfield/blob/8ac5cca8f28dce19722f2742bc91ab06455d5b04/src/Commands/HydroponicsQueueProcessDrushCommands.php#L33 3) The above drush command calls this function https://github.com/esmero/strawberryfield/blob/8ac5cca8f28dce19722f2742bc91ab06455d5b04/src/StrawberryfieldHydroponicsService.php#L107 4) The function above calls the correct queue worker here https://github.com/esmero/strawberryfield/blob/8ac5cca8f28dce19722f2742bc91ab06455d5b04/src/StrawberryfieldHydroponicsService.php#L124 5) Finally the queue worker processes data and return

In case of multiprocessing, multiple (2) DRUSH archipelago:hydroqueue are called.

DiegoPino commented 3 years ago

@giancarlobi I think the current (RC1) hydroponics drush is OK for the small institutions (with small corrections). No Childs and simply devoting partial time to each queue. For the new one (yours) your code is perfect, but I want to check/ask you something first...

  1. For every queue and for each element (one at time) call a child process (in background) by DRUSH archipelago:hydroqueue here

Is this correct? Each queue can run during its Process more than a single Item. Can we leave the queue to decide how many it runs and only make the Process be one per queue? For what I see you are already making it like that: https://github.com/esmero/strawberryfield/blob/8ac5cca8f28dce19722f2742bc91ab06455d5b04/src/Commands/HydroponicsQueueProcessDrushCommands.php#L43-L45. You are letting the queue decide how much it can do in 60 seconds right? So its about time, not number of elements and in this case you need to change the message because we know how many items are left in the return.

https://github.com/esmero/strawberryfield/blob/8ac5cca8f28dce19722f2742bc91ab06455d5b04/src/StrawberryfieldHydroponicsService.php#L177 And we can ask how many are before too.

  1. 4 and 5.

Perfect. But I still think the process needs to be clearly stated as 1 per queue (the code does that) for 60 seconds each time. And that number (60) can be a setting

giancarlobi commented 3 years ago

You are letting the queue decide how much it can do in 60 seconds right? So its about time, not number of elements and in this case you need to change the message because we know how many items are left in the return.

@DiegoPino Not sure about the question, but I think I process only one item at time because I'm calling processQueue($queue, 60, TRUE); where the last parameters (I added to function) is telling single element so in the function we go here https://github.com/esmero/strawberryfield/blob/8ac5cca8f28dce19722f2742bc91ab06455d5b04/src/StrawberryfieldHydroponicsService.php#L146 where we claim only one element and time is use as lease time, if I correctly understood, it means if we don't delete that item from the queue before lease time the item is requeued. Sorry if I'm not clear or I didn't understand your question ... here near sleeping time so ... a really few neurons alive .. take care friend

giancarlobi commented 3 years ago

@DiegoPino What about a setting in hydroponics conf page where user is able to select the type of processing:

DiegoPino commented 3 years ago

@giancarlobi yes! That is what I would like. For security reasons the last one needs to be set in the settings.php or provided by a module. If not people (bad admins) could make a mess. About multi processor. It conflicts a little bit the fact that is processing a single item per process. Do you think we could better deal with this as time based? Why? Because starting and closing a drush script as a child takes time and resources (a full bootstrap).

So let me now test your work. I have only 3 live archipelagos left to update (did 4 last night).

giancarlobi commented 3 years ago

@DiegoPino to help my memory, as discussed in the call, next steps will be:

After this, we can check deeper exceptions and timeout and more.

Feel free friend to edit anything here, please.

DiegoPino commented 3 years ago

Perfect friend! Exactly that.

giancarlobi commented 3 years ago

@DiegoPino some notes about pcntl. Reading this https://stackoverflow.com/questions/17050799/how-to-check-pcntl-module-exists is what happens with my server. From code extension_loaded('pcntl') return false but from CLI function_exists("pcntl_fork") return true. In addition I saw this https://github.com/reactphp/event-loop/pull/195 . Also, on my archipelago react-php child processes seems to work so ... not clear to me what we need or not and how we check

giancarlobi commented 3 years ago

@DiegoPino About your post "To think about: some queues need to run in sequence. Some queues may allow parallel processing." I was thinking about this. The starting point is that in the queue the processes are inserted in order of execution/dependency. What if we assign to each process (child?) a level starting from 0 then 1, 2, ...

So the idea is:

What do you think? Is this something you thinking about?

DiegoPino commented 3 years ago

@giancarlobi I like the idea, but I feel its too complex. The service is already quite complex. I feel its easier we simply mark "queues" completely as "able to run in parallel" or sequential. We already do a lot of logic on Queue assignment and calculating which element needs which one. Actually sequential workers are the ones in charge of creating their own children. Bringing all that logic into the process manager feels like doing it twice and adding too much complexity

What I would suggest is to add another queue/queue worker, one for "sequential heavy processing".

Basically leaving ONLY the one that keeps OCR right now as "parallel-izeable" all the rest needs to run in order one item at the time.

How? I will think about it. Probably an annotation of a public method like ->allowsParallel() and return false for each one except the one we have for the OCR

giancarlobi commented 3 years ago

@giancarlobi I like the idea, but I feel its too complex. The service is already quite complex. I feel its easier we simply mark "queues" completely as "able to run in parallel" or sequential. We already do a lot of logic on Queue assignment and calculating which element needs which one. Actually sequential workers are the ones in charge of creating their own children. Bringing all that logic into the process manager feels like doing it twice and adding too much complexity

What I would suggest is to add another queue/queue worker, one for "sequential heavy processing".

Basically leaving ONLY the one that keeps OCR right now as "parallel-izeable" all the rest needs to run in order one item at the time.

How? I will think about it. Probably an annotation of a public method like ->allowsParallel() and return false for each one except the one we have for the OCR

@DiegoPino I agree to avoid complex code. The only difference between a parallel-izeable queue and the other ones is that in the first case we can have more than 1 process running per queue while for second one we limit a 1 process per queue, is this right?

DiegoPino commented 3 years ago

Yes. Only difference i guess. If an item fails we put it back 2 times and give up (queue worker does that)

El El dom, 17 de ene. de 2021 a la(s) 12:04, Giancarlo < notifications@github.com> escribió:

@giancarlobi https://github.com/giancarlobi I like the idea, but I feel its too complex. The service is already quite complex. I feel its easier we simply mark "queues" completely as "able to run in parallel" or sequential. We already do a lot of logic on Queue assignment and calculating which element needs which one. Actually sequential workers are the ones in charge of creating their own children. Bringing all that logic into the process manager feels like doing it twice and adding too much complexity

What I would suggest is to add another queue/queue worker, one for "sequential heavy processing".

Basically leaving ONLY the one that keeps OCR right now as "parallel-izeable" all the rest needs to run in order one item at the time.

How? I will think about it. Probably an annotation of a public method like ->allowsParallel() and return false for each one except the one we have for the OCR

@DiegoPino https://github.com/DiegoPino I agree to avoid complex code. The only difference between a parallel-izeable queue and the other ones is that in the first case we can have more than 1 process running per queue while for second one we limit a 1 process per queue, is this right?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/esmero/strawberryfield/issues/133#issuecomment-761844563, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABU7ZZ5T4YRBXBLDCEVTGRLS2MKANANCNFSM4VZEN7CQ .

-- Diego Pino Navarro Digital Repositories Developer Metropolitan New York Library Council (METRO)