Provide configuration for links that extend the indexable triples

ajs6f commented 7 years ago

Provide configuration that allows integrators to specify one or more link URIs. When resources are indexed (to any destination), if a Link header is provided for their representations with such a URI as its type, the resource on the other end of that link will be retrieved on the assumption that the response will contain RDF. The triples in the response for that request will be added to the context for the indexing transform for the original resource, for each such link.
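To make the selection step concrete, here is a minimal sketch in plain Java of picking which Link header targets to follow. It assumes that the "type" of a link means its relation type (the rel parameter of the Link header); the class name LinkHeaderFilter and the method targetsToFollow are hypothetical, not part of fcrepo-camel or fcrepo-camel-toolbox. The parsing is deliberately simplified and does not handle every corner of the Link header grammar (for example, angle brackets inside quoted parameter values).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: selects the targets of links whose rel matches a configured URI.
public class LinkHeaderFilter {

    // One link-value: <target> followed by its parameters, up to the next '<'.
    private static final Pattern LINK = Pattern.compile("<([^>]*)>([^<]*)");

    // A rel parameter, either quoted or unquoted.
    private static final Pattern REL = Pattern.compile(
            ";\\s*rel\\s*=\\s*(?:\"([^\"]*)\"|([^;,\\s]+))", Pattern.CASE_INSENSITIVE);

    public static List<String> targetsToFollow(String linkHeader, Set<String> configuredRels) {
        List<String> targets = new ArrayList<>();
        Matcher link = LINK.matcher(linkHeader);
        while (link.find()) {
            Matcher rel = REL.matcher(link.group(2));
            if (!rel.find()) {
                continue; // no rel parameter on this link, nothing to compare
            }
            String value = rel.group(1) != null ? rel.group(1) : rel.group(2);
            // A rel value may list several space-separated relation types.
            for (String type : value.trim().split("\\s+")) {
                if (configuredRels.contains(type)) {
                    targets.add(link.group(1));
                    break;
                }
            }
        }
        return targets;
    }
}
```

Each returned target would then be dereferenced, and the RDF in the response added to the indexing context of the original resource.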

@acoburn @birkland

ajs6f commented 7 years ago

Would it be reasonable to start this issue by factoring between fcrepo-indexing-solr and fcrepo-indexing-triplestore? The general pattern is:

1. Retrieve context
2. Execute indexing transform from context to payload
3. Send payload to index

The first of those should be the same among those two indexing recipes and any future recipes. The last is simply "use the configured URI that is the index's endpoint at which to accept new/replacement records". The middle one is the one that varies across recipes.
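As a rough illustration of that factoring, the three steps can be written as three pluggable functions composed in a fixed order. This is only a sketch in plain Java, not existing toolbox code; IndexingRecipe and its field names are hypothetical, and the type parameters C (context) and P (payload) stand in for whatever a given recipe uses.

```java
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical skeleton of an indexing recipe: retrieve, transform, send.
public class IndexingRecipe<C, P> {

    private final Function<String, C> retrieveContext; // step 1: resource URI -> context
    private final Function<C, P> transform;            // step 2: context -> index payload
    private final Consumer<P> sendToIndex;             // step 3: deliver payload to the index

    public IndexingRecipe(Function<String, C> retrieveContext,
                          Function<C, P> transform,
                          Consumer<P> sendToIndex) {
        this.retrieveContext = retrieveContext;
        this.transform = transform;
        this.sendToIndex = sendToIndex;
    }

    // Runs the three steps in order for one resource.
    public void index(String resourceUri) {
        C context = retrieveContext.apply(resourceUri);
        P payload = transform.apply(context);
        sendToIndex.accept(payload);
    }
}
```

Under this shape, two recipes that share the same retrieval and delivery steps would differ only in the transform function they are constructed with.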

acoburn commented 7 years ago

@ajs6f: there is also a difference in step 1 "Retrieve context" -- fcrepo-indexing-solr retrieves the content via an LDPath service while fcrepo-indexing-triplestore retrieves the content directly from Fedora. This is not to say that making that configurable is particularly hard.

birkland commented 7 years ago

The imagery that initially pops into my mind as far as following link headers as a part of (2) involves creating a Processor impl that looks for the presence of a particular header, and replaces the message body with the resource at the other end of the link. For example, for service doc indexing:

from("direct:_wherever_message_came_from_")
  .filter(has_link_header) // Discard messages that are irrelevant, e.g. don't have desired link header
  .process(GET_LINK) // Follow the desired link header, set exchange message body to its contents
  .process(any_other_transform) // Any transform for indexing purposes
  .to("http://my_index_endpoint"); // Send

One artifact of this work, then, could be that GET_LINK processor?

ajs6f commented 7 years ago

Do we not want to augment the message body with the resource on the other end of that link header?

ajs6f commented 7 years ago

I was thinking of the LDPath service as being part of the transform, but is that not the case? Is there a caching question tangled up here?

acoburn commented 7 years ago

@birkland I really like the pattern you are suggesting.

Let me suggest something that pursues this line of thinking even further. Basically, when thinking about how to update fcrepo-camel to bring it in line with the eventual Fedora specification, I would like to re-evaluate the scope of fcrepo-camel. In particular, much of what fcrepo-camel currently does is what camel-http4 already does. The value proposition of fcrepo-camel really consists in its ability to process link headers and act on other common LDP-related headers (Prefer, Accept, etc).

The line of thought I currently have is to actually remove much of the code of fcrepo-camel and replace it with a collection of Processors that parse and/or generate HTTP headers or RDF content. That way, implementations can use camel-http4 for all of the HTTP-transport and fcrepo-camel (or, even better, a new camel-ldp project) for all of the transformation/processing.

For my part, I can say that the Camel-based services we're writing at Amherst are using less and less of fcrepo-camel and more of camel-http4, and I actually see that as a very good development.

ajs6f commented 7 years ago

@acoburn That would also help make the repo-client-end of things less concerned with Fedora specifically and more concerned with patterns of generally-interesting interaction with "rich LDP".

birkland commented 7 years ago

@ajs6f The way the service doc indexer in API-X works, it wants to index only the service doc. You're right, though - augmentation is a more generally useful notion. Being able to do that in a clear and idiomatic way with fcrepo-camel providing one or more Processor and/or AggregationStrategy impls would be nice.

ajs6f commented 7 years ago

@birkland Ok, gotcha. Maybe the meta-pattern here is "cookbook". When I look at one of our two dog-eared and well-stained copies of Joy of Cooking I see a section of just recipes for stocks.

We already have a book of recipes that is -camel-toolbox and within it, we are going to write a chapter of recipes for "getting your context together". This might include such old classics as "follow a link header" or "enhance from a cache" or "replace URIs with their representations from a source of authority like VIAF or the like". The full recipes are complete, deployable components like fcrepo-indexing-triplestore. The "stocks" recipes are smaller and don't make sense to deploy by themselves, so we use abstractions like Processor.

ajs6f commented 7 years ago

https://camel.apache.org/content-enricher.html

ajs6f commented 7 years ago

If you are able to flow N-Triples, the AggregationStrategy is trivial! :smiley:
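A minimal sketch of why that is, in plain Java rather than Camel: N-Triples puts one complete triple on each line, so merging two documents amounts to a union of their lines. The class NTriplesMerge is hypothetical; in a Camel route the same logic would sit inside an AggregationStrategy's aggregate(oldExchange, newExchange) method, operating on the two message bodies. One assumption to flag: blank node labels are not renamed, so two documents that happen to reuse the same label would have those nodes conflated.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: merges two N-Triples documents line by line.
public class NTriplesMerge {

    public static String merge(String original, String linked) {
        // LinkedHashSet keeps first-seen order and drops exact duplicate lines.
        Set<String> triples = new LinkedHashSet<>();
        for (String doc : new String[] {original, linked}) {
            for (String line : doc.split("\\R")) {
                String triple = line.trim();
                // Skip blank lines and N-Triples comments.
                if (!triple.isEmpty() && !triple.startsWith("#")) {
                    triples.add(triple);
                }
            }
        }
        return triples.isEmpty() ? "" : String.join("\n", triples) + "\n";
    }
}
```

Duplicate detection here is purely textual, so the same triple serialized with different whitespace or escaping would be kept twice; for indexing purposes that is usually harmless.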

birkland commented 7 years ago

Yeah, with enricher something like

.enrich("direct:follow_link_header", MERGE_RDF_GRAPHS)

.. where MERGE_RDF_GRAPHS is an instance of an AggregationStrategy

fcrepo-exts / fcrepo-camel-toolbox

Provide configuration for links that extend the indexable triples #138