Islandora / documentation

Contains Islandora's documentation and main issue queue.
MIT License

Investigate Views Bulk Operations for 'Re-Indexing' #682

Open dannylamb opened 7 years ago

dannylamb commented 7 years ago

Now that everything has been pushed async, we need a way to rebroadcast an event that triggers a 're-index' of things in Fedora or the triple store, or that regenerates derivatives.

Install the Views Bulk Operations module and see if you can create a bulk operation based on a Rules action. Eventually we'll have a view for resources that failed during processing, but for now, just mess with the collection view or create your own to test it out.

We may need to change how we're doing our Rules actions to broadcast events into the system, so if you run into that, don't fret. The scope is just to see if VBO has Rules integration and works out of the box with any arbitrary Rules action.

kimpham54 commented 7 years ago

@dannylamb These are the actions that you can perform with VBO:

[Screenshot, 2017-07-21: the list of actions available in VBO]

Are you hoping that VBO would integrate with Rules so that you could have a list of nodes, select them, then run a rule such as Broadcast Content Update Event?

dannylamb commented 7 years ago

@kimpham54++ Yes, I was hoping to see our 'Rules Actions' on that list instead of plain old 'Actions'. It's a shame it's not to that level of integration yet :( Thanks so much for looking into this for me.

DiegoPino commented 7 years ago

@kimpham54 @dannylamb This should be simple to integrate. We can follow the basic examples in core/modules/node/src/Plugin/Action/, or, in our (more advanced) case, base it on https://www.drupal.org/docs/8/modules/views-bulk-operations-vbo/getting-started and our existing RulesActions; it could be little more than an annotation-based wrapper. Extend the ViewsBulkOperationsActionBase class and implement the actual heavy lifting inside the ::execute and ::access methods by wrapping things like https://github.com/Islandora-CLAW/islandora/blob/8.x-1.x/src/Plugin/RulesAction/UpdateEventGenerator.php

Ideas?

kimpham54 commented 7 years ago

@DiegoPino I can look into this!

dannylamb commented 7 years ago

@kimpham54 @DiegoPino This was a D7 feature that hasn't been implemented in D8 yet, and I honestly planned on waiting for it. Wrapping a RulesAction with an Action is a good temporary measure, but I don't want you to sink a ton of time into something that will eventually be moot. So don't be afraid to peel away if it turns into a lot of work.

There's also the possibility of seeing what it would take to port this sort of integration to D8, but that's probably even more work and would require interacting with the Drupal VBO maintainers.

Either way we're pretty well out of scope for this issue.

kimpham54 commented 7 years ago

@DiegoPino @dannylamb Creating a custom action was not difficult using the template provided here: https://www.drupal.org/docs/8/modules/views-bulk-operations-vbo/advanced. I'll spend some (not too much) time seeing how to execute the RulesAction.

DiegoPino commented 7 years ago

@kimpham54 excellent!!

kimpham54 commented 7 years ago

@DiegoPino @dannylamb Looks like you can execute a custom rule using rules_invoke_SOMETHING(); see http://www.drupalcontrib.org/api/drupal/contributions%21rules%21rules.module/function/rules_invoke_event/8. Now I'm wondering whether to invoke_event "After updating content (rules_entity_update:node)" or just pass the ID of a custom rule, such as https://github.com/Islandora-CLAW/islandora/blob/8.x-1.x/src/Plugin/RulesAction/UpdateEventGenerator.php#L13.

Not sure if I'm really on the right track given my limited knowledge of Drupal 8... and programming in general.

Anyways, before I try a few things out, I realized I'm not actually sure what to expect when that rule is triggered. @dannylamb can you tell me how to actually test that rule to confirm that it's working?

dannylamb commented 7 years ago

@kimpham54 Yeah, our custom Rules actions are problematic to test because they're only meant to generate the JSON-LD and publish it to a queue. If rules_invoke_event triggers all the actions we have, then you can tail the logs in Karaf to see if the message gets published and consumed. If you vagrant ssh into the box, you can see the logs by opening the Karaf console and issuing the log:tail command:

```
$ /opt/karaf/bin/client
> log:tail
```

If a bunch of gibberish flies by on the screen when you trigger the operation, then it's working.

If that doesn't work out, maybe you can try an action that ships with the Rules module, like sending an email? Then at least you'll have something a bit more tangible to confirm that it works.

dannylamb commented 6 years ago

We've done as @DiegoPino suggested and made everything Actions that then get wrapped downstream, just with the Context module instead of Rules. Now we can bulk re-index through the Drupal UI! It's like having Bookmark baked into core.

mjordan commented 6 years ago

Maybe regenerate derivatives via VBO?

dannylamb commented 6 years ago

Anything written as an Action can be applied VBO style, so yeah, totally! Derivatives will be done this way too.

kayakr commented 4 years ago

I've been exploring VBO on Islandora and have run into a problem (this is on a playbook VM built a month or so ago, but otherwise fairly current). I'm also using VBO 8.x-2.6, because 3.6 doesn't yet honour Views filters.

I get different behaviour when trying to index an object via the standard Admin > Content view versus a VBO view: the first indexes into fcrepo, the VBO approach doesn't.

In VBO view:

Now try via Admin > Content

So, what is the difference in these two approaches? Any hints as to where I should be looking are welcome.

This is particularly relevant because the standard Admin > Content view only shows 50 items, whereas VBO can use a batch to process 100s or 1000s of items.

seth-shaw-unlv commented 4 years ago

> EmitEvent execute() fires, with AFAICT the correct entity, token, user id etc.

It is this bit that gives me pause. Can you verify it by reviewing the Milliner and Gemini logs? That index event should show up in both, along with the JWT used. Throw that JWT into a debugger to ensure all the appropriate parts are there. If you aren't seeing the indexing actions trigger events there, I would step back to the Karaf logs, or even the ActiveMQ admin interface, to make sure those messages are queuing and dequeuing appropriately.
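If you'd rather not paste a token into an online debugger, the payload segment can be decoded locally. This is a minimal sketch assuming bash and GNU coreutils `base64`; `jwt_payload` is a hypothetical helper name. It only decodes the claims and does not verify the signature, which is enough to eyeball whether the user and roles made it into the token.

```shell
#!/usr/bin/env bash
# jwt_payload (hypothetical helper): print the decoded payload of a JWT.
# A JWT is header.payload.signature, each part base64url-encoded.
# This only DECODES the claims; it does not verify the signature.
jwt_payload() {
  local payload
  # Take the middle segment and map the base64url alphabet back to base64.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops '=' padding; restore it so base64 -d accepts the input.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}
```

Usage: `jwt_payload "$TOKEN"`, optionally piped through `jq .` for pretty-printing if jq is installed.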

Flying blind (I haven't spun up a test to reproduce the issue as described), my guesses would be either 1) something odd is causing the current user (and thus the associated roles) to be dropped from the event (we've seen that happen before, most recently to @ajstanley with Solr, as far as I remember), which should show up as errors in the Milliner and Gemini logs, OR 2) the message isn't getting to the queue at all (so Karaf never processes the events).
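To check the second guess without clicking through the ActiveMQ admin interface, the web console's XML queue listing can be summarized from the command line. This is a minimal sketch under several assumptions: an ActiveMQ 5.x web console at localhost:8161 with the default admin:admin credentials, and an `admin/xml/queues.jsp` output in which each queue's `<stats .../>` element sits on its own line after its `<queue name="...">` line. `summarize_queues` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# summarize_queues (hypothetical helper): read ActiveMQ's XML queue listing on
# stdin and print "name size enqueueCount dequeueCount consumerCount" for every
# queue whose name contains "islandora".
# ASSUMPTION: each <stats .../> element follows its <queue name="..."> line.
summarize_queues() {
  awk '
    /<queue name="/ {
      match($0, /name="[^"]*"/)
      name = substr($0, RSTART + 6, RLENGTH - 7)   # strip name=" and the closing quote
    }
    /<stats / && name ~ /islandora/ {
      line = name
      n = split("size enqueueCount dequeueCount consumerCount", keys, " ")
      for (i = 1; i <= n; i++) {
        # pull the digits out of key="123"
        if (match($0, keys[i] "=\"[0-9]*\"")) {
          k = length(keys[i])
          line = line " " substr($0, RSTART + k + 2, RLENGTH - k - 3)
        }
      }
      print line
    }
  '
}

# Usage against a live broker (URL and credentials are assumed defaults):
#   curl -s -u admin:admin http://localhost:8161/admin/xml/queues.jsp | summarize_queues
```

Run it before and after triggering the VBO operation. If enqueueCount grows but dequeueCount doesn't, messages are reaching the queue but not being consumed; if enqueueCount doesn't move at all, the message never got to the queue.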

kayakr commented 4 years ago

@seth-shaw-unlv Thanks for the pointers. I'll dig deeper...

dannylamb commented 3 years ago

I've successfully pulled this off in ISLE and am working through the particulars. I can offer examples/documentation shortly.

dannylamb commented 3 years ago

....so....

It looks like if we want to reindex in Fedora, we've got to ditch Gemini. Attempting to reindex exposes all kinds of spots where this use case just wasn't taken into consideration, and the index itself gets messy real fast. I'm pushing through that work, but for now I can at least drop this nugget as an example of how to reindex things with VBO:

```shell
# Re-index RDF in Fedora
# Each command runs one VBO action (emit_*_event) over every item in one view,
# sending an Update event to the named queue.
drush --root /var/www/drupal/web -l localhost:8000 vbo-exec non_fedora_files emit_file_event --configuration="queue=islandora-indexing-fcrepo-external&event=Update"
drush --root /var/www/drupal/web -l localhost:8000 vbo-exec all_taxonomy_terms emit_term_event --configuration="queue=islandora-indexing-fcrepo-content&event=Update"
drush --root /var/www/drupal/web -l localhost:8000 vbo-exec content emit_node_event --configuration="queue=islandora-indexing-fcrepo-content&event=Update"
drush --root /var/www/drupal/web -l localhost:8000 vbo-exec media emit_media_event --configuration="queue=islandora-indexing-fcrepo-media&event=Update"
```

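The four reindex commands differ only in view, action, and queue, so they can be driven from a small table. This is a minimal sketch assuming bash and the same view IDs, action IDs, and queue names used in those commands; `REINDEX_JOBS`, `reindex_fedora`, `DRUPAL_ROOT`, `SITE_URI`, and `DRY_RUN` are hypothetical names introduced here, not part of Islandora or VBO.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the four vbo-exec calls in this thread.
# DRUPAL_ROOT and SITE_URI default to the values used in those commands.
DRUPAL_ROOT="${DRUPAL_ROOT:-/var/www/drupal/web}"
SITE_URI="${SITE_URI:-localhost:8000}"

# One entry per VBO run: "<view id> <action id> <queue>"
REINDEX_JOBS=(
  "non_fedora_files emit_file_event islandora-indexing-fcrepo-external"
  "all_taxonomy_terms emit_term_event islandora-indexing-fcrepo-content"
  "content emit_node_event islandora-indexing-fcrepo-content"
  "media emit_media_event islandora-indexing-fcrepo-media"
)

# Emit an Update event for every item in each view.
# With DRY_RUN=1, print the drush commands instead of running them.
reindex_fedora() {
  local job view action queue
  local -a cmd
  for job in "${REINDEX_JOBS[@]}"; do
    read -r view action queue <<< "$job"
    cmd=(drush --root "$DRUPAL_ROOT" -l "$SITE_URI" vbo-exec "$view" "$action"
         --configuration="queue=${queue}&event=Update")
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      printf '%s\n' "${cmd[*]}"
    else
      "${cmd[@]}" || return 1   # stop at the first failing run
    fi
  done
}
```

After sourcing the file, `DRY_RUN=1 reindex_fedora` previews the commands and `reindex_fedora` runs them in order, stopping at the first failure so a broken queue doesn't silently swallow the later runs.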
elizoller commented 3 years ago

@dannylamb if we document this in the docs, can we close this? :D

dannylamb commented 2 years ago

The reindexing commands are documented for ISLE here: https://islandora.github.io/documentation/installation/docker-available-commands/#reindex-fedora-metadata

They don't go into any detail; it's just 'here's the command and what happens when you run it'. If that's not sufficient, we'll have to make a separate page just for that, or maybe add it to the cookbook? But this feels like a pretty core thing, IMO.