ajs6f closed this issue 3 years ago.
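Would it be reasonable to start this issue by factoring out what is common to fcrepo-indexing-solr and fcrepo-indexing-triplestore? The general pattern is:

1. Retrieve context
2. Transform the context into a record for the index
3. Send the record to the index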
The first of those should be the same for both of those indexing recipes and for any future recipes. The last is simply "use the configured URI that is the index's endpoint at which to accept new/replacement records". The middle one is the one that varies across recipes.
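The general pattern described here (a shared context-retrieval step, a recipe-specific transform, and a shared send to the index endpoint) can be sketched in plain Java, without Camel, as a recipe assembled from three functions. This is a hypothetical illustration that assumes each step can be modelled as a simple function; IndexingRecipe and its parameters are invented names, not part of any fcrepo module.

```java
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch: an indexing recipe as three composable steps.
// Step 1 (retrieve context) and step 3 (send to the index endpoint) are meant
// to be shared across recipes; only step 2 (the transform) varies.
final class IndexingRecipe<C, R> {
    private final Function<String, C> retrieveContext; // step 1: resource URI -> context
    private final Function<C, R> transform;            // step 2: context -> index record
    private final Consumer<R> sendToIndex;             // step 3: record -> index endpoint

    IndexingRecipe(Function<String, C> retrieveContext,
                   Function<C, R> transform,
                   Consumer<R> sendToIndex) {
        this.retrieveContext = retrieveContext;
        this.transform = transform;
        this.sendToIndex = sendToIndex;
    }

    // Runs all three steps for one resource and returns the record that was sent.
    R index(String resourceUri) {
        R record = transform.apply(retrieveContext.apply(resourceUri));
        sendToIndex.accept(record);
        return record;
    }
}
```

A Solr recipe and a triplestore recipe would then differ only in the transform function they pass in.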
@ajs6f: there is also a difference in step 1 ("Retrieve context"): fcrepo-indexing-solr retrieves the content via an LDPath service, while fcrepo-indexing-triplestore retrieves the content directly from Fedora. This is not to say that making that configurable would be particularly hard.
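Making step 1 configurable could look something like the following minimal sketch, which picks a retrieval strategy from a configuration value. The mode names "ldpath" and "direct" and the class ContextRetrieval are hypothetical; the two functions passed in stand in for an LDPath service call and a direct request to Fedora.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: select how step 1 retrieves context from a config value.
// "ldpath" and "direct" are illustrative option names, not real fcrepo settings.
final class ContextRetrieval {
    static Function<String, String> fromConfig(String mode,
                                               Function<String, String> viaLdPath,
                                               Function<String, String> directFromFedora) {
        Map<String, Function<String, String>> options =
                Map.of("ldpath", viaLdPath, "direct", directFromFedora);
        Function<String, String> chosen = options.get(mode);
        if (chosen == null) {
            // Fail fast on a misconfigured recipe rather than silently picking a default.
            throw new IllegalArgumentException("Unknown retrieval mode: " + mode);
        }
        return chosen;
    }
}
```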
As far as following link headers as part of (2) goes, the image that first pops into my mind involves creating a Processor impl that looks for the presence of a particular header and replaces the message body with the resource at the other end of the link. For example, for service doc indexing:
// has_link_header is a Predicate; GET_LINK and any_other_transform are Processors (placeholders)
from("direct:wherever_message_came_from")
    .filter(has_link_header)          // Discard irrelevant messages, e.g. those without the desired Link header
    .process(GET_LINK)                // Follow the desired Link header; set the exchange's message body to its contents
    .process(any_other_transform)     // Any transform for indexing purposes
    .to("http://my_index_endpoint");  // Send to the index
One artifact of this work, then, could be that GET_LINK processor?
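Before a GET_LINK processor can fetch anything, it has to find the target of the desired link in the Link header. A minimal, stdlib-only sketch of that lookup follows; LinkHeaders and targetsWithRel are hypothetical names, the "service" relation used in the checks is illustrative, and the parser is deliberately simplified rather than a complete RFC 8288 implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified Link header lookup: assumes each target is enclosed in <...>
// and that parameter values never contain '<'. Not a full RFC 8288 parser.
final class LinkHeaders {
    // One link: the target URI in angle brackets, then its parameters up to the next '<'.
    private static final Pattern LINK = Pattern.compile("<([^>]*)>([^<]*)");
    // The rel parameter, quoted or unquoted; parameter names are case-insensitive.
    private static final Pattern REL =
            Pattern.compile("(?i)(?:^|;)\\s*rel\\s*=\\s*(?:\"([^\"]*)\"|([^\\s;,]+))");

    // Returns the targets of all links whose rel includes wantedRel, in header order.
    static List<String> targetsWithRel(String linkHeader, String wantedRel) {
        List<String> targets = new ArrayList<>();
        Matcher link = LINK.matcher(linkHeader);
        while (link.find()) {
            Matcher rel = REL.matcher(link.group(2));
            if (rel.find()) {
                String value = rel.group(1) != null ? rel.group(1) : rel.group(2);
                // A rel value may list several space-separated relation types.
                if (Arrays.asList(value.trim().split("\\s+")).contains(wantedRel)) {
                    targets.add(link.group(1));
                }
            }
        }
        return targets;
    }
}
```

Inside a Camel Processor, the header value would come from the incoming message (for example via exchange.getIn().getHeader("Link", String.class)), and the body would then be replaced with the response from a matching target.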
Do we not want to augment the message body with the resource at the other end of that link header, rather than replace it?
I was thinking of the LDPath service as being part of the transform, but is that not the case? Is there a caching question tangled up here?
@birkland I really like the pattern you are suggesting.
Let me suggest something that pursues this line of thinking even further. When thinking about how to update fcrepo-camel to bring it in line with the eventual Fedora specification, I would like to re-evaluate the scope of fcrepo-camel. In particular, much of what fcrepo-camel currently does duplicates what camel-http4 already does. The real value proposition of fcrepo-camel lies in its ability to process link headers and act on other common LDP-related headers (Prefer, Accept, etc.).
My current line of thought is to remove much of the code in fcrepo-camel and replace it with a collection of Processors that parse and/or generate HTTP headers or RDF content. That way, implementations can use camel-http4 for all of the HTTP transport and fcrepo-camel (or, even better, a new camel-ldp project) for all of the transformation/processing.
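As one example of the header-generating Processors described above, the value of an LDP Prefer header (LDP 1.0, section 7.2) can be built from lists of preference URIs to include and omit. This is a sketch showing only the string-building logic; LdpPrefer and representation are hypothetical names.

```java
import java.util.List;

// Hypothetical helper: builds an LDP Prefer header value such as
//   return=representation; omit="http://www.w3.org/ns/ldp#PreferContainment"
// Multiple preference URIs are space-separated inside one quoted value.
final class LdpPrefer {
    static final String LDP = "http://www.w3.org/ns/ldp#";

    static String representation(List<String> include, List<String> omit) {
        StringBuilder value = new StringBuilder("return=representation");
        if (!include.isEmpty()) {
            value.append("; include=\"").append(String.join(" ", include)).append('"');
        }
        if (!omit.isEmpty()) {
            value.append("; omit=\"").append(String.join(" ", omit)).append('"');
        }
        return value.toString();
    }
}
```

In a Camel route, the resulting string could be attached with something like .setHeader("Prefer", constant(value)) before the camel-http4 endpoint, leaving all HTTP transport to camel-http4.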
For my part, I can say that the Camel-based services we're writing at Amherst are using less and less of fcrepo-camel and more of camel-http4, and I actually see that as a very good development.
@acoburn That would also help make the repository-client end of things less concerned with Fedora specifically and more concerned with generally interesting patterns of interaction with "rich LDP".
@ajs6f As the service doc indexer in API-X works now, it indexes only the service doc. You're right, though: augmentation is a more generally useful notion. It would be nice to be able to do that in a clear and idiomatic way, with fcrepo-camel providing one or more Processor and/or AggregationStrategy impls.
@birkland OK, gotcha. Maybe the meta-pattern here is "cookbook". When I look at one of our two dog-eared, well-stained copies of Joy of Cooking, I see a section of nothing but recipes for stocks.
We already have a book of recipes, -camel-toolbox, and within it we are going to write a chapter of recipes for "getting your context together". This might include such old classics as "follow a link header", "enhance from a cache", or "replace URIs with their representations from a source of authority like VIAF". The full recipes are complete, deployable components like fcrepo-indexing-triplestore. The "stocks" recipes are smaller and don't make sense to deploy by themselves, so for those we use abstractions like Processor.
If you are able to flow N-Triples, the AggregationStrategy is trivial! :smiley:
Yeah, with the enricher, something like
.enrich("direct:follow_link_header", MERGE_RDF_GRAPHS)
where MERGE_RDF_GRAPHS is an instance of an AggregationStrategy.
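If both the original body and the linked resource arrive as N-Triples, the merge that MERGE_RDF_GRAPHS would perform reduces to a line-level set union, which is why the AggregationStrategy can be close to trivial. A minimal sketch of that merge, assuming one triple per line with consistent whitespace and using the hypothetical names NTriplesMerge and merge:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch of the merge step, assuming both inputs are N-Triples strings.
// N-Triples puts one triple per line, so merging is a set union of lines.
// Caveat: blank node labels are NOT renamed apart, so "_:b0" appearing in
// both documents would be treated as the same node.
final class NTriplesMerge {
    static String merge(String original, String linked) {
        Set<String> triples = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        for (String doc : new String[] {original, linked}) {
            for (String line : doc.split("\\R")) {
                String t = line.trim();
                if (!t.isEmpty() && !t.startsWith("#")) { // skip blank and comment lines
                    triples.add(t);
                }
            }
        }
        return triples.isEmpty() ? "" : String.join("\n", triples) + "\n";
    }
}
```

An AggregationStrategy's aggregate(oldExchange, newExchange) method could then set the original exchange's body to the merge of the two bodies. A fully correct RDF graph merge would also rename blank nodes apart, which this line-based version does not do.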
Provide configuration that allows integrators to specify one or more link URIs. When resources are indexed (to any destination), if a Link header is provided for their representations with such a URI as its relation type, the resource at the other end of that link will be retrieved on the assumption that the response will contain RDF. For each such link, the triples in the response will be added to the context for the indexing transform of the original resource.
@acoburn @birkland
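The behaviour proposed above might be sketched as follows, under several assumptions: the configured link URIs are compared against each link's relation type, the Link header has already been parsed into (relation, target) pairs, and fetchRdf is a hypothetical function that retrieves a target and returns its triples as N-Triples lines. LinkedContext and build are likewise invented names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of building the indexing context from configured links.
// links: already-parsed (relation type, target URI) pairs from the Link header.
// fetchRdf: retrieves a target and returns its RDF as N-Triples lines (assumed).
final class LinkedContext {
    static List<String> build(List<String> resourceTriples,
                              List<Map.Entry<String, String>> links,
                              Set<String> configuredRels,
                              Function<String, List<String>> fetchRdf) {
        List<String> context = new ArrayList<>(resourceTriples);
        for (Map.Entry<String, String> link : links) {
            if (configuredRels.contains(link.getKey())) {
                // Every matching link contributes its triples to the context.
                context.addAll(fetchRdf.apply(link.getValue()));
            }
        }
        return context;
    }
}
```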