fcrepo-exts / fcrepo-camel-toolbox

A collection of ready-to-use messaging applications with fcrepo-camel
Apache License 2.0

Updated Solr and Triplestore routes to reduce redundant fcrepo requests. #134

Closed. mohideen closed this pull request 7 years ago.

acoburn commented 7 years ago

@mohideen thanks!

Broadly speaking, there are two directions to go with this type of feature: add filters to the indexing modules themselves (as you have done here), or move those filters into entirely separate modules.

Personally, I like the latter approach for the following reasons:

  1. Everyone has different ideas about how messages should be filtered and/or aggregated, and no single component will ever cover all use cases. E.g. maybe I want to include only skos:Concept resources, or maybe I want to exclude any dcmitype:Collection.

  2. For these kinds of specialized filters to work within a single module (such as fcrepo-indexing-triplestore, etc.), you need to start adding lots and lots of configuration switches, which adds to the complexity of the module itself.
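To make the examples in point 1 concrete, here is a minimal plain-Java sketch (not the Camel or fcrepo API) of treating each filtering rule as a standalone, composable predicate over a resource's rdf:types rather than as a configuration switch inside an indexing module. The class and method names (`TypeFilters`, `includeOnly`, `exclude`) are hypothetical; only the skos:Concept and dcmitype:Collection IRIs are real vocabulary terms.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/**
 * Hypothetical sketch: each filtering rule is an independent Predicate over the
 * rdf:types of a changed resource, so new rules do not require new switches
 * in the indexing module itself.
 */
public class TypeFilters {

    // Full IRIs for the two example types mentioned in the discussion.
    static final String SKOS_CONCEPT = "http://www.w3.org/2004/02/skos/core#Concept";
    static final String DCMI_COLLECTION = "http://purl.org/dc/dcmitype/Collection";

    /** Keep only resources that carry the given rdf:type. */
    static Predicate<Set<String>> includeOnly(String type) {
        return types -> types.contains(type);
    }

    /** Drop any resource that carries the given rdf:type. */
    static Predicate<Set<String>> exclude(String type) {
        return types -> !types.contains(type);
    }

    public static void main(String[] args) {
        Predicate<Set<String>> conceptsOnly = includeOnly(SKOS_CONCEPT);
        Predicate<Set<String>> noCollections = exclude(DCMI_COLLECTION);

        List<Set<String>> resources = List.of(
                Set.of(SKOS_CONCEPT),
                Set.of(DCMI_COLLECTION),
                Set.of(SKOS_CONCEPT, DCMI_COLLECTION));

        // Prints whether each resource passes each rule independently.
        for (Set<String> types : resources) {
            System.out.println(conceptsOnly.test(types) + " " + noCollections.test(types));
        }
    }
}
```

Because each rule is just a `Predicate`, rules can be combined with `and`/`or`/`negate` by whoever deploys them, which is the flexibility point 1 argues a single built-in filter cannot provide.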

The other approach is, basically, to add a custom shim to the messaging workflow. For instance, if fcrepo-indexing-solr is reading from the queue at queue:fedora, you add a new component (by which I mean a few lines of Blueprint XML) that listens to that queue and places the messages that pass its filter on a new queue, say queue:filtered. Then you update the configuration of fcrepo-indexing-solr to read from queue:filtered. This method supports a very wide range of use cases and avoids putting too much complexity in the fcrepo-indexing-solr module itself.
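The shim pattern can be simulated in plain Java to show the data flow. This is a sketch under stated assumptions, not a Camel route: in-memory `BlockingQueue`s stand in for the JMS queues queue:fedora and queue:filtered, and the `Event` record, `FilterShim` class, and resource URIs are all hypothetical. A real deployment would express the same "consume, filter, republish" step as a small Camel route, as described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Predicate;

/**
 * Plain-Java simulation of the shim: a standalone component drains the
 * upstream queue and republishes only the messages that pass its filter.
 */
public class FilterShim {

    /** Hypothetical event shape: the resource URI plus its rdf:types. */
    record Event(String uri, Set<String> types) {}

    private final BlockingQueue<Event> source;
    private final BlockingQueue<Event> target;
    private final Predicate<Event> accept;

    FilterShim(BlockingQueue<Event> source, BlockingQueue<Event> target, Predicate<Event> accept) {
        this.source = source;
        this.target = target;
        this.accept = accept;
    }

    /** Drain everything currently in the source queue; forward only accepted events. */
    int drainOnce() {
        List<Event> batch = new ArrayList<>();
        source.drainTo(batch);
        int forwarded = 0;
        for (Event e : batch) {
            if (accept.test(e)) {
                target.add(e); // unbounded LinkedBlockingQueue, so add() never fails
                forwarded++;
            }
        }
        return forwarded;
    }

    public static void main(String[] args) {
        BlockingQueue<Event> fedora = new LinkedBlockingQueue<>();   // stands in for queue:fedora
        BlockingQueue<Event> filtered = new LinkedBlockingQueue<>(); // stands in for queue:filtered

        // Hypothetical resources; only the rdf:type IRIs are real vocabulary terms.
        String collection = "http://purl.org/dc/dcmitype/Collection";
        fedora.add(new Event("http://localhost:8080/rest/a", Set.of("http://www.w3.org/ns/ldp#Container")));
        fedora.add(new Event("http://localhost:8080/rest/b", Set.of(collection)));

        // Example rule: exclude dcmitype:Collection resources.
        FilterShim shim = new FilterShim(fedora, filtered, e -> !e.types().contains(collection));
        shim.drainOnce();

        // The indexer would now be configured to consume from "filtered" instead of "fedora".
        System.out.println(filtered.size() + " " + filtered.peek().uri());
    }
}
```

The design point is that the indexer never changes: only its input queue name does, and the filtering rule lives entirely in the shim.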

To tell you the truth, while there is already some code in the solr and triplestore routes that filters messages, I really wish that code wasn't there (though removing it would mean a breaking change for users). It adds complexity and establishes a pattern that (IMO) is best handled by creating a separate handler whose only job is to filter messages.

Thoughts?

mohideen commented 7 years ago

@acoburn That sounds good! Do you have any pointers on how that could be done?

acoburn commented 7 years ago

@mohideen I am looking more closely at what you're doing in this PR, and I don't think a simple shim is going to solve the problem this PR solves. That is to say, I think the best approach will be to move forward with this PR. In the long term, I'd like to remove all of the indexing:Indexable and indexing:hasIndexingTransformation stuff in favor of a simpler workflow, but for now, I think the best way forward is to follow the path you are suggesting with this PR. I'll give it a more thorough review next week. Thanks again!

acoburn commented 7 years ago

Thanks @mohideen!

mohideen commented 7 years ago

Thanks @acoburn! As you suggested, we are getting started with our own routes and filters for our custom needs. https://github.com/umd-lib/umd-fcrepo-camel-extensions