Using Advanced Queue instead of ActiveMQ/Karaf/Alpaca

seth-shaw-unlv commented 3 years ago

During last week's Tech Call Kyle Huynh and Nat Kanthan demonstrated their new Triplestore Indexer.

Instead of relying on a message emitted to ActiveMQ and consumed by Karaf and Alpaca, it uses the Advanced Queue module to schedule a job and then uses an indexing action to push the JSON-LD into the triplestore. One of the big advantages of this approach is that the Advanced Queue module provides a user interface for tracking which jobs are pending, failed, or completed.

During the call it was suggested that this can be used as a pattern to replace other events we emit to ActiveMQ (e.g. Fedora indexing and derivatives). This would require:

Porting Alpaca to PHP actions (in place of the existing EmitEvent-based actions; we can probably copy/pasta some of the existing actions' form code to help with this)
Creating a new Context reaction that can populate the ported Alpaca actions into an Advanced Queue queue.

This way we can keep all of our Context condition logic in place; we are simply swapping out the Context reaction.

One of the limitations of this proposal is the 'turn-around time' between performing an action (creating a node, etc.) and the resulting actions (e.g. indexing). Currently, Karaf is constantly polling ActiveMQ, so triggered actions can seem instantaneous (unless the system is under significant load or running intensive actions such as large image derivatives), whereas Advanced Queue needs either a cron run or a drush command to perform its work. That said, there are ways to treat drush commands like a daemon to get the same effect we currently have with Karaf. For example, with Triplestore Indexer, you can configure it to run for a certain amount of time, e.g. just under a minute, and then configure cron to run drush for that particular queue just as often, i.e. every minute. There are probably better ways to daemonize drush, but this would probably work well enough.
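To illustrate the "run for just under a minute, restart every minute" idea, here is a minimal Python sketch of a time-bounded worker. It is not the Triplestore Indexer or Advanced Queue code; `run_for`, the in-memory `deque`, and the `idle_sleep` parameter are hypothetical names chosen for this example.

```python
import time
from collections import deque
from typing import Callable, Deque


def run_for(queue: Deque[str], handle: Callable[[str], None],
            time_limit: float, idle_sleep: float = 0.5) -> int:
    """Process jobs until time_limit seconds have elapsed.

    Returns the number of jobs processed. A scheduler such as cron is
    assumed to start a fresh run as soon as this one ends.
    """
    deadline = time.monotonic() + time_limit
    processed = 0
    while time.monotonic() < deadline:
        if queue:
            handle(queue.popleft())
            processed += 1
        else:
            # Nothing queued: wait briefly, but never sleep past the deadline.
            remaining = deadline - time.monotonic()
            time.sleep(max(0.0, min(idle_sleep, remaining)))
    return processed
```

With a limit of roughly 55 seconds and a one-minute schedule, the worst-case wait before a new job is picked up is the short gap between runs plus one `idle_sleep`, rather than a full cron interval.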

Did I miss anything? Other thoughts or corrections?

whikloj commented 3 years ago

I will still want to use ActiveMQ and Camel (though I'm working on dumping Karaf), so I'd like whatever solution we choose not to close off that possibility. As long as we can configure an external broker OR use Advanced Queue, I think that would be robust enough.

seth-shaw-unlv commented 3 years ago

We should be able to manage that, @whikloj. We would create new actions and a new Context reaction for those that want them while keeping the existing ones. The question would then be which we provide by default, which would involve getting feedback from several stakeholders, but we can cross that bridge when we get to it.
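As a rough illustration of keeping both options open, the sketch below puts a broker-style and a queue-style backend behind one interface, so the code that reacts to a Context condition does not care which is configured. This is Python rather than Drupal PHP, and every name in it (`Dispatcher`, `BrokerDispatcher`, `QueueDispatcher`, `make_dispatcher`, `react`) is hypothetical; neither class talks to a real broker or to Advanced Queue.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class Dispatcher(Protocol):
    def dispatch(self, action: str, payload: Dict[str, str]) -> None: ...


@dataclass
class BrokerDispatcher:
    """Stands in for emitting an event to an external broker (e.g. ActiveMQ)."""
    sent: List[dict] = field(default_factory=list)

    def dispatch(self, action: str, payload: Dict[str, str]) -> None:
        self.sent.append({"destination": action, "body": payload})


@dataclass
class QueueDispatcher:
    """Stands in for adding a job to a local queue for later processing."""
    jobs: List[dict] = field(default_factory=list)

    def dispatch(self, action: str, payload: Dict[str, str]) -> None:
        self.jobs.append({"type": action, "payload": payload, "state": "queued"})


def make_dispatcher(backend: str) -> Dispatcher:
    """Pick a backend from a site-level setting (hypothetical values)."""
    if backend == "broker":
        return BrokerDispatcher()
    if backend == "queue":
        return QueueDispatcher()
    raise ValueError(f"unknown backend: {backend}")


def react(dispatcher: Dispatcher, action: str, payload: Dict[str, str]) -> None:
    """What a reaction would do once its conditions have matched."""
    dispatcher.dispatch(action, payload)
```

The condition logic stays the same in both cases; only the object passed to `react` changes, which is what makes the default a configuration decision rather than a code change.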

seth-shaw-unlv commented 3 years ago

Should have tagged @Natkeeran in the first post. I don't appear to have a GitHub handle for Kyle.

whikloj commented 3 years ago

I'm fine with defaulting to Advanced Queue and leaving ActiveMQ and Alpaca as an alternative option for those that want it.

DiegoPino commented 3 years ago

@seth-shaw-unlv just in case you have time to look at another project's code, we have "daemonized" drush for our background processors in Archipelago (HOCR, any binary that runs on metadata conditionals/files/input, file transmutations, etc.) using queue workers and a hierarchical post processor plugin system provided by Strawberry Runners. It was on our roadmap for a long time, and the approach has been working well in 1.0.0-RC1 since it went public. We even have an open pull request now for Multi Child processing using reactphp, written by @giancarlobi (we have been using it for a few months already). The approach is quite efficient and works perfectly. We decided not to go for the Advanced Queue module because core was enough for all these needs; adding an extra dependency just made everything more complex for us to maintain.

Just wanted to put this here in case you want to look at our Drush (10) approach/code and our Background Service supervisor. Good luck!

seth-shaw-unlv commented 3 years ago

Thanks for the tip, @DiegoPino. I don't know if I'll be the first one to tackle this issue as my stakeholder's to-do list is already quite long.

kylehuynh205 commented 3 years ago

Thanks Seth, mine is @kylehuynh205 (https://github.com/kylehuynh205)

kylehuynh205 commented 3 years ago

Thanks to the great suggestions from @seth-shaw-unlv and @DiegoPino, we have developed a prototype module that works as a daemonized ReactPHP event loop. This runner can be configured to run at an interval, check whether the queues have any queued jobs, and then run the Advanced Queue queue(s). This lets the queues run automatically, without manually running a Drush command or relying on a cron job. Please find the module at: https://www.drupal.org/project/advancedqueue_runner
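The general shape of an interval-based runner on an event loop can be sketched as follows. This uses Python's `asyncio` in place of ReactPHP and is not the code of the advancedqueue_runner module; `run_periodically`, the in-memory queues, and the `max_ticks` bound (added so the example terminates) are all hypothetical.

```python
import asyncio
from collections import deque
from typing import Callable, Deque, Dict


async def run_periodically(queues: Dict[str, Deque[str]],
                           handle: Callable[[str, str], None],
                           interval: float, max_ticks: int) -> int:
    """Every `interval` seconds, drain any queue that has jobs waiting.

    A real daemon would loop indefinitely; max_ticks bounds the loop
    here so the sketch can finish. Returns the number of jobs handled.
    """
    processed = 0
    for _ in range(max_ticks):
        for name, queue in queues.items():
            while queue:
                handle(name, queue.popleft())
                processed += 1
        # Yield to the event loop until the next check is due.
        await asyncio.sleep(interval)
    return processed
```

A call such as `asyncio.run(run_periodically(queues, handler, interval=5, max_ticks=12))` would check the queues every five seconds for one minute.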

kylehuynh205 commented 2 years ago

A few enhancements to our approach for the Blazegraph microservice with Advanced Queue and the Runner:

We added a feature to re-run a job if it fails, with options for how many times to retry and how long to wait between retries.
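A retry policy of this kind can be sketched in a few lines. The example below is Python and only illustrates the idea; the `Job` fields, the state names, and the `process` function are assumptions made for this sketch, not the module's actual data model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Job:
    payload: str
    state: str = "queued"        # "queued", "success", or "failure"
    num_retries: int = 0
    available_at: float = 0.0    # earliest time the job may run again


def process(job: Job, handler: Callable[[str], None],
            max_retries: int, retry_delay: float, now: float) -> Job:
    """Run the job once; on failure, re-queue it until max_retries is used up."""
    try:
        handler(job.payload)
    except Exception:
        if job.num_retries < max_retries:
            job.num_retries += 1
            job.state = "queued"
            job.available_at = now + retry_delay
        else:
            job.state = "failure"
    else:
        job.state = "success"
    return job
```

With `max_retries=2` a job that keeps failing is attempted three times in total (the first run plus two retries) before it is marked as failed.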

After monitoring the Advanced Queue Runner, we found that the runner can sometimes be interrupted, e.g. if the server is rebooted. We added a feature that checks for this and restarts the runner when cron runs. If cron is set up to run frequently on a Drupal site, the runner does not need to be restarted manually.
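One way such a cron-time check could work is a heartbeat: the runner records a timestamp while it is alive, and the cron task restarts it when that timestamp is missing or too old. The thread does not say how the module detects an interrupted runner, so the heartbeat mechanism and the names below (`ensure_runner`, `max_age`, `start_runner`) are assumptions for this Python sketch.

```python
from typing import Callable, Optional


def ensure_runner(last_heartbeat: Optional[float], now: float,
                  max_age: float, start_runner: Callable[[], None]) -> bool:
    """Restart the runner if it never reported or its heartbeat is stale.

    Intended to be called from a periodic task such as cron.
    Returns True when a restart was triggered.
    """
    stale = last_heartbeat is None or (now - last_heartbeat) > max_age
    if stale:
        start_runner()
    return stale
```

The more often the periodic task runs, the shorter the window in which an interrupted runner goes unnoticed.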

These are available in the latest versions: https://www.drupal.org/project/triplestore_indexer/releases/8.x-1.5-beta1 and https://www.drupal.org/project/advancedqueue_runner/releases/8.x-1.1-alpha2

Islandora / documentation

Using Advanced Queue instead of ActiveMQ/Karaf/Alpaca #1746