bf_InstanceOf vs bf_hasInstance and bf_itemOf vs bf_hasItem

mstabile75 commented 6 years ago

Unless there is a hard reason not to, it is better for indexing to use the "has" versions of these properties (bf:hasInstance, bf:hasItem), since in the ideal indexing scenario the data flows from Work down to Instance down to Item within a single Elasticsearch document.
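A minimal sketch of the nesting the "has" direction enables, assuming the loaded data is available as plain (subject, predicate, object) tuples using prefixed names. It reads bf:instanceOf and bf:itemOf backwards so that one Work document contains its Instances and their Items. The tuple representation and the function `build_work_doc` are hypothetical illustrations, not rdfframework's API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Triples as (subject, predicate, object), in the direction marc2bibframe
# emits them: the granular entity points at the more abstract one.
Triple = Tuple[str, str, str]


def build_work_doc(work_uri: str, triples: List[Triple]) -> Dict:
    """Nest Instances and Items under one Work by reading instanceOf/itemOf backwards."""
    instances_of: Dict[str, List[str]] = defaultdict(list)  # work -> instances
    items_of: Dict[str, List[str]] = defaultdict(list)      # instance -> items
    for s, p, o in triples:
        if p == "bf:instanceOf":
            instances_of[o].append(s)
        elif p == "bf:itemOf":
            items_of[o].append(s)
    return {
        "uri": work_uri,
        "bf:hasInstance": [
            {"uri": inst, "bf:hasItem": [{"uri": item} for item in items_of[inst]]}
            for inst in instances_of[work_uri]
        ],
    }
```

In this sketch the incoming triples keep their itemOf/instanceOf direction; only the document builder reads them in reverse.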

Jeremy, can you provide feedback on the challenges of switching these owl:inverseOf usages?

jermnelson commented 6 years ago

The biggest challenges of reversing the relationships between BF Works, Instances, and Items for Elasticsearch have more to do with the incoming data from LOC marc2bibframe, which links the most granular entity class to its corresponding more abstract entity (i.e. BF Item itemOf BF Instance), and with changing the current RML mappings that convert LOC BF to Lean BF. Also, I wonder whether, in a multi-institution setting (i.e. the Alliance BIBCAT Goldrush project) where some BF Works may have hundreds of Instances and corresponding Items stored in a single document, there are performance implications compared with the smaller ES documents a different indexing strategy would produce. This may not matter in the short run, but we should do some stress testing on sample ES docs with hundreds of Instances and Items.
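As a possible starting point for that stress testing, the sketch below builds a synthetic Work document with a given number of Instances, each holding a fixed number of Items, and reports its serialized JSON size. The URIs and field layout are made up for illustration; an actual test would also post these documents to Elasticsearch and time indexing and queries.

```python
import json


def synthetic_work_doc(n_instances: int, items_per_instance: int) -> dict:
    """Build a hypothetical Work document with nested Instances and Items."""
    return {
        "uri": "http://example.org/work/1",
        "bf:hasInstance": [
            {
                "uri": f"http://example.org/instance/{i}",
                "bf:hasItem": [
                    {"uri": f"http://example.org/item/{i}-{j}"}
                    for j in range(items_per_instance)
                ],
            }
            for i in range(n_instances)
        ],
    }


def doc_size_bytes(doc: dict) -> int:
    """Size of the document as UTF-8 JSON, roughly what a request body would carry."""
    return len(json.dumps(doc).encode("utf-8"))


if __name__ == "__main__":
    # Print document size as the number of Instances grows.
    for n in (1, 10, 100, 500):
        print(n, doc_size_bytes(synthetic_work_doc(n, items_per_instance=3)))
```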

mstabile75 commented 6 years ago

Fixed in commit 5323221: es_json can now handle the owl:inverseOf relationship. The handling must be done in the definitions file, like this:

bf:hasInstance kds:rangeDef [ kds:appliesToClass kdr:AllClasses ; kds:esLookup owl:inverseOf ] .

bf:instanceOf kds:rangeDef [ kds:appliesToClass kdr:AllClasses ; kds:esIndexType es:Ignored ] .

kds:rangeDef defines how the subject property should handle its object values
kds:appliesToClass specifies the class to which the definition applies
kds:esLookup specifies the property class attribute to use. In this case, the owl:inverseOf of bf:hasInstance is defined in the core BIBFRAME vocabulary as bf:instanceOf. The class then searches the dataset and returns, as the values of bf:hasInstance, every resource whose bf:instanceOf object is the bound_class's subject
es:Ignored tells the converter to return just the URIs of the bf:instanceOf values
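A rough sketch of the search that kds:esLookup owl:inverseOf describes, assuming triples as (subject, predicate, object) tuples and a hypothetical `INVERSE_OF` table standing in for the owl:inverseOf statements in the core BIBFRAME vocabulary. This illustrates the idea only; it is not es_json's actual code.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical stand-in for owl:inverseOf declarations in the core vocabulary.
INVERSE_OF: Dict[str, str] = {
    "bf:hasInstance": "bf:instanceOf",
    "bf:hasItem": "bf:itemOf",
}


def inverse_lookup(subject: str, prop: str, triples: List[Triple]) -> List[str]:
    """Values of `prop` for `subject`, found by scanning for the inverse property.

    For bf:hasInstance this returns every resource X with (X, bf:instanceOf, subject).
    """
    inverse = INVERSE_OF[prop]
    return [s for s, p, o in triples if p == inverse and o == subject]
```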

es:Ignored and kds:esLookup are used in conjunction to avoid recursive nesting of the inverse properties.
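To illustrate why the two settings are paired, here is a sketch of a recursive converter under the same tuple-triple assumption. Properties listed in the hypothetical `LOOKUP` table are expanded by inverse search and nested; properties in `IGNORED` are emitted as bare URIs. If bf:instanceOf were nested instead, each nested Instance would embed its Work again, which would embed the Instance again, without end. The rule tables and `to_es_doc` are illustrative names, not the library's.

```python
from typing import Any, Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical per-property rules mirroring the kds definitions above.
LOOKUP = {"bf:hasInstance": "bf:instanceOf"}  # property -> its owl:inverseOf
IGNORED = {"bf:instanceOf"}                    # emit URIs only, never nest


def to_es_doc(subject: str, triples: List[Triple]) -> Dict[str, Any]:
    doc: Dict[str, Any] = {"uri": subject}
    # Forward properties stated directly on the subject.
    for s, p, o in triples:
        if s == subject and p in IGNORED:
            doc.setdefault(p, []).append(o)  # bare URI: this breaks the cycle
        # other forward properties are omitted from this sketch
    # Inverse properties: find resources pointing back at the subject and nest them.
    for prop, inverse in LOOKUP.items():
        children = [s for s, p, o in triples if p == inverse and o == subject]
        if children:
            doc[prop] = [to_es_doc(child, triples) for child in children]
    return doc
```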

jermnelson commented 6 years ago

Where would these triples reside? I'm worried we're diverging too much from the RML spec by introducing our own vocabulary triples to the RML map.

mstabile75 commented 6 years ago

These currently reside in the bibcat/rdfw-definitions/bc_core_links.ttl file. They are not connected to RML in any way. I can't use RML for the Elasticsearch conversion, since the conversion process needs to be tightly woven with the core RDF vocabularies: the Elasticsearch conversion makes assumptions based on those core vocabularies. When those assumptions fail, as in this case, an override can be added to the active_defs triplestore. Envision the Elasticsearch index as an "as close as possible" representation of the data in the triplestore; the interaction between the two should be transparent outside of the core system. The RML processor conversions should be a translation between external and core (i.e. KnowledgeLinks BIBCAT) datasets.

Where RML and Elasticsearch will intersect is caching. Example:

I convert an Instance to an Elasticsearch object. All Elasticsearch querying will be done against the fields indexed by this conversion.
I also want to cache one of the RML mapping conversions.
A non-indexed field is added to the Elasticsearch document containing a straight text dump of the conversion.
When that specific conversion is needed, just pull the Elasticsearch field for that mapping.
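A sketch of how such a document and its mapping might look, written as Python dicts and assuming Elasticsearch 7-style typeless mappings. The field name `cache_marcxml` and the other fields are hypothetical. Setting `"index": false` on a text field keeps it out of the inverted index, so it cannot be searched, while its value stays in `_source` and comes back with each hit; `"enabled": false` plays the same role for an object field.

```python
from typing import Any, Dict

# Hypothetical mapping: normal searchable fields plus one cache field that is
# kept in _source but never indexed, so it plays no part in query matching.
MAPPING: Dict[str, Any] = {
    "mappings": {
        "properties": {
            "uri": {"type": "keyword"},
            "title": {"type": "text"},
            "cache_marcxml": {"type": "text", "index": False},
        }
    }
}


def add_cached_conversion(doc: Dict[str, Any], field: str, dump: str) -> Dict[str, Any]:
    """Return a copy of `doc` with a text dump of a conversion stored under `field`."""
    out = dict(doc)
    out[field] = dump
    return out


def cached_conversion(hit: Dict[str, Any], field: str) -> str:
    """Pull the cached dump back out of a search hit's _source."""
    return hit["_source"][field]
```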

Caching process:

Query the triplestore for an item and its associated data.
Load the data into an RdfDataset.
The mapping and conversion to an rdfframework Elasticsearch document are embedded in the dataset.
Run any RML processors against the dataset for caching purposes and add their text dumps to the Elasticsearch document. With the json_qry options in the RML, we should not need to re-query the triplestore at this point.
Post the document to Elasticsearch.
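The caching process above can be sketched as a pipeline with each external step passed in as a callable. All parameter names (`query_triplestore`, `to_es_doc`, `rml_processors`, `post_doc`) are hypothetical placeholders, not rdfframework functions, and the real RdfDataset API is not modeled. What the sketch shows is the ordering: one triplestore query, every conversion run off the already-loaded data, then a single post.

```python
from typing import Any, Callable, Dict


def cache_and_index(
    item_uri: str,
    query_triplestore: Callable[[str], Any],
    to_es_doc: Callable[[Any], Dict[str, Any]],
    rml_processors: Dict[str, Callable[[Any], str]],
    post_doc: Callable[[Dict[str, Any]], None],
) -> Dict[str, Any]:
    # Query the triplestore once for the item and its associated data.
    dataset = query_triplestore(item_uri)
    # Convert the loaded dataset to the Elasticsearch document.
    doc = to_es_doc(dataset)
    # Run each RML processor against the same dataset (no re-query) and
    # store its text dump under its own cache field.
    for field, processor in rml_processors.items():
        doc[field] = processor(dataset)
    # Post the finished document.
    post_doc(doc)
    return doc
```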

KnowledgeLinks/rdfframework issue #15