Open amoeba opened 2 years ago
This is going well and I'm nearly done. The query we run against every valueURI
in every semantic annotation is now:
SELECT ?annotation_value_uri
WHERE
{
  {
    <$CONCEPT_URI> rdfs:subClassOf* ?annotation_value_uri .
  }
  UNION
  {
    <$CONCEPT_URI> rdf:type/rdfs:subClassOf* ?annotation_value_uri .
  }
  UNION
  {
    <$CONCEPT_URI> owl:sameAs/rdfs:subClassOf* ?annotation_value_uri .
  }
  UNION
  {
    <$CONCEPT_URI> owl:equivalentClass/rdfs:subClassOf* ?annotation_value_uri .
  }
}
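To make the four UNION branches concrete, here is a rough Python sketch that evaluates the same expansion over a tiny in-memory set of triples. It is not how the index processor runs the query; the `ex:` terms, the `TRIPLES` set, and the helper names are all made up for illustration.

```python
from collections import deque

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SAME_AS = "owl:sameAs"
EQUIV_CLASS = "owl:equivalentClass"


def objects(triples, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in triples."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subclass_star(triples, start):
    """Mimic rdfs:subClassOf*: the start term plus all of its ancestors."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in objects(triples, node, SUBCLASS):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def expand(triples, concept):
    """Union of the four branches of the query for one concept URI."""
    results = subclass_star(triples, concept)  # branch 1: subClassOf*
    for predicate in (RDF_TYPE, SAME_AS, EQUIV_CLASS):  # branches 2-4
        for target in objects(triples, concept, predicate):
            results |= subclass_star(triples, target)
    return results


# Hypothetical toy graph: a NamedIndividual with a class and one alignment.
TRIPLES = {
    ("ex:SnowDepth", RDF_TYPE, "ex:KeyVariable"),
    ("ex:KeyVariable", SUBCLASS, "ex:Variable"),
    ("ex:SnowDepth", SAME_AS, "ex:SnowThickness"),
}

expanded = expand(TRIPLES, "ex:SnowDepth")
```

In this toy graph the individual picks up its class, that class's parent, and its sameAs'd term, on top of itself.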
This means we get some cool stuff we didn't have before. For example, if a person searches for datasets annotated with the NamedIndividual from ARCRC for "Snow Depth" (a key variable, shown in orange), they get all of this back, and a search for any of these terms returns the dataset with this annotation:
Before, we would've returned the dataset only if we searched directly for the NamedIndividual, because we weren't expanding them at all.
The last thing I want to do is get n-degree sameAs working. For example, if A sameAs B and B sameAs C, we want searches for any of A, B, or C to return A, B, and C.
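One way to think about n-degree sameAs is as finding the connected component of a term in an undirected graph of sameAs links. The sketch below does that with a breadth-first search; it only illustrates the idea and is not the actual implementation. The function name and the example pairs are hypothetical.

```python
from collections import defaultdict, deque


def same_as_group(pairs, term):
    """Return every term reachable from `term` over sameAs links,
    following links in both directions and across any number of hops."""
    neighbors = defaultdict(set)
    for a, b in pairs:
        neighbors[a].add(b)  # sameAs is symmetric, so store both directions
        neighbors[b].add(a)

    group = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for other in neighbors[current] - group:
            group.add(other)
            queue.append(other)
    return group


# A sameAs B and B sameAs C: any of the three expands to all three.
pairs = [("A", "B"), ("B", "C")]
group_for_c = same_as_group(pairs, "C")
```

A term with no sameAs links at all simply expands to itself.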
This is excellent, @amoeba. Thanks!
Did you and @mpsaloha discuss skos:exactMatch and skos:closeMatch as candidates for this type of alignment as well? We can always add them later, but if you're making changes and reindexing things, then it might make sense. Details from SKOS:
The property skos:closeMatch is used to link two concepts that are sufficiently similar that they can be used interchangeably in some information retrieval applications. In order to avoid the possibility of "compound errors" when combining mappings across more than two concept schemes, skos:closeMatch is not declared to be a transitive property.
The property skos:exactMatch is used to link two concepts, indicating a high degree of confidence that the concepts can be used interchangeably across a wide range of information retrieval applications. skos:exactMatch is a transitive property, and is a sub-property of skos:closeMatch.
Thanks @mbjones, we hadn't. I'll run it by him, but now that you mention it, it seems like we oughta look at adding it in now. I don't think it'll have any practical impact at the moment because I don't think the ontologies we do query expansion on use those terms, but it's probably better to put the change in now to save ourselves some time.
Added support for skos:exactMatch and skos:closeMatch in https://github.com/DataONEorg/d1_cn_index_processor/commit/24abd6cb775e509b9ec00964460083d59ad36828.
For skos:exactMatch, we treat it as transitive and symmetric. For skos:closeMatch, we don't treat it as transitive but do treat it as symmetric. This was based on a suggestion from @mpsaloha; the logic makes sense and matches the information @mbjones quoted above.
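The difference between the two properties can be sketched in Python as follows. This is only an illustration of "symmetric and transitive" versus "symmetric only", not the code in the commit. The function names and example pairs are hypothetical, and the choice to follow closeMatch only from the searched term itself (not from its exact matches) is an assumption.

```python
from collections import defaultdict, deque


def symmetric_neighbors(pairs):
    """Index pairs in both directions, since both properties are symmetric."""
    neighbors = defaultdict(set)
    for a, b in pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors


def expand_matches(exact_pairs, close_pairs, term):
    """Expand `term` over exactMatch (any number of hops) and
    closeMatch (a single hop from `term` only)."""
    exact = symmetric_neighbors(exact_pairs)
    close = symmetric_neighbors(close_pairs)

    # exactMatch is transitive: walk the whole connected component.
    result = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for other in exact[current] - result:
            result.add(other)
            queue.append(other)

    # closeMatch is not transitive: add direct neighbors and stop there.
    return result | close[term]


exact_pairs = [("A", "B"), ("B", "C")]
close_pairs = [("A", "X"), ("X", "Y")]
from_a = expand_matches(exact_pairs, close_pairs, "A")
from_y = expand_matches(exact_pairs, close_pairs, "Y")
```

Here a search on A reaches C through the exactMatch chain and X through one closeMatch hop, but not Y, which is two closeMatch hops away. That is the "compound errors" case the SKOS text warns about.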
LGTM.
@mbjones asked on Slack whether we index URIs for terms that are sameAs'd. I answered no but maybe we actually do and I just haven't seen it happen because we don't use sameAs much in our ontologies.
In our use of semantic search so far, we haven't made much use of alignment axioms (sameAs, equivalentClass, equivalentProperty) and have so far only used subClassOf (to materialize parent classes). I think the MOSAiC ontology is the first ontology we've enabled in search that uses things like NamedIndividuals and sameAs in a way we care about and I think our search isn't quite as good for MOSAiC as it is for others (like ECSO).
I think we could make a few changes to improve that and improve things for the future.
Change one: Include the class of any value URIs
For example, for the MOSAiC datasets we have in the Arctic Data Center, we inserted dataset-level annotations like
Each of those annotations is to a NamedIndividual, rather than to a Class as we've been doing. How do we drive good searches for these? The only way right now is to search exactly for the term. But PS122/2, for example, is of class "_MOSAiC Specific Term" and "Campaign". I think it'd be nice if a person could search for PS122/2 directly but also by either of its classes. This same reasoning applies to the other two annotations in the screenshot above.
Change two: Include sameAs and equivalentClass/equivalentProperty relationships for terms
As another example, MOSAiC has a set of Research Location named individuals, like "Arctic Ocean". We didn't align this term, but if we did in the future (say to http://purl.obolibrary.org/obo/GAZ_00000323), it'd be best if searches for either MOSAiC's term or GAZ's term returned documents annotated either way. This would benefit any future alignment work we do and improve searches.
Summary
Both changes, in addition to our current indexing rules, would be additive. That is, any search that works now also works with these changes. The current logic is:
With the changes above, we'd get:
equivalentProperty
owl:sameAs
equivalentClass
I think this can be done by adjusting the SPARQL query we already use and I don't expect we'll have any performance issues or Solr index growth issues.