unresolved IDs

tmushayahama commented 2 years ago

That issue of unresolved IDs is coming back

tagging @balhoff @vanaukenk

kltm commented 2 years ago

@vanaukenk has volunteered to test once this gets to a server.

kltm commented 1 year ago

apk3 is currently expanding to "http://arabidopsis.org/servlets/TairObject?type=locus&name=AT3G03900", and not a CURIE as other identifiers--so successfully switched, but same problem?

balhoff commented 1 year ago

@kltm which bit of software should be doing the shortening? Is Minerva sending it out uncompacted? It looks to be correctly in one of the built in context files: https://github.com/geneontology/minerva/blob/14f8f0cdaae608d561ac103ef2847743ac491c4e/minerva-core/src/main/resources/go_context.jsonld#L145
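For reference, a minimal sketch of what "shortening" against a context file amounts to. The prefix map below is a hypothetical stand-in for go_context.jsonld, and the `AGI_LocusCode` entry is an assumption rather than something copied from that file; `compact` and `expand` are illustrative names, not Minerva functions.

```python
# Hypothetical prefix map standing in for go_context.jsonld.
CONTEXT = {
    "GO": "http://purl.obolibrary.org/obo/GO_",
    # Assumed mapping for the TAIR locus case; not copied from the real file.
    "AGI_LocusCode": "http://arabidopsis.org/servlets/TairObject?type=locus&name=",
}


def compact(iri, context):
    """Return a CURIE if some namespace in context prefixes iri, else iri unchanged."""
    best = None
    for prefix, namespace in context.items():
        # Prefer the longest matching namespace when several match.
        if iri.startswith(namespace):
            if best is None or len(namespace) > len(context[best]):
                best = prefix
    if best is None:
        return iri
    return best + ":" + iri[len(context[best]):]


def expand(curie, context):
    """Return the full IRI for a CURIE whose prefix is known, else the input unchanged."""
    prefix, sep, local = curie.partition(":")
    if sep and prefix in context:
        return context[prefix] + local
    return curie
```

If the tool doing the compaction lacks the relevant entry, or fails to match it, the full IRI passes through untouched, which would look like what autocomplete shows for apk3.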

kltm commented 1 year ago

@balhoff Good questions! This is autocomplete, so the issue is on the NEO side of the fence. In the "neo.obo" file, these identifiers are still uncompressed (unCURIEed?), meaning that it's in the build somewhere. I'd note that owltools gets involved in there. So the order is something like:

get upstream tair: https://www.arabidopsis.org/download_files/GO_and_PO_Annotations/Gene_Ontology_Annotations/gene_association.tair.gz (seems correct)
run through gaf2obo.pl to produce neo-tair.obo (seems correct)
use owltools to make neo.obo from the sources like neo-tair.obo (incorrectly CURIEed)

I'm not sure if there is something "off" about the neo-tair.obo already or if this is an owltools or other type of problem. I assume there's a way to do this w/o owltools too?
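One way to narrow down which step leaves the identifiers uncompacted would be to scan each intermediate file for stanza ids that are still full IRIs. A rough sketch follows; the function name and the sample stanzas are made up for illustration and are not taken from the actual neo-tair.obo or neo.obo.

```python
def uncompacted_ids(obo_text):
    """Return stanza ids that look like full IRIs rather than CURIEs."""
    hits = []
    for line in obo_text.splitlines():
        line = line.strip()
        # Only the 'id:' tag is checked; other tags are ignored in this sketch.
        if line.startswith("id:"):
            value = line[len("id:"):].strip()
            if value.startswith(("http://", "https://")):
                hits.append(value)
    return hits


# Invented sample, only to show the shape of the check.
SAMPLE = """\
[Term]
id: GO:0008150
name: biological_process

[Term]
id: http://arabidopsis.org/servlets/TairObject?type=locus&name=AT3G03900
name: apk3
"""

print(uncompacted_ids(SAMPLE))
```

Running the same check on neo-tair.obo and then on neo.obo should show whether the long form appears before or after the owltools step.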

balhoff commented 1 year ago

The identifiers are always incorrect in neo.obo. owltools converts the obo to owl, and then the perl script does text substitution to fix the IRIs. I do see what you mean that they aren't compacted in the obo file, but I'm not sure that's a problem. Once it gets to the owl file, the IRIs look correct (I just downloaded and looked at the owl). I think that a downstream tool which is reading the owl file may be missing the CURIE definition. Is that possible?

kltm commented 1 year ago

@balhoff That would unfortunately point back to owltools, which is in charge of loading the ontology for Solr here. I'm not sure why TAIR/AGI_LocusCode is the /one/ case here however; moreover in that it's not wrong, but uncompressed. Weird.

A couple of greps through the owltools code haven't turned up anything yet.

I would note that the same things do work in "normal" Solr loads with owltools, so it would have to be particular to the ontology handling if it was there.

balhoff commented 1 year ago

@kltm I did some more testing; I think the core problem is the prefix contains an underscore, although I thought that used to be fine in OBO format. I better understand now that I guess the Solr load really does use the obo file, not the owl file. So we can either change the prefix to be one without an underscore, or try to make a fix in OWL API (long process).

kltm commented 1 year ago

@balhoff To clarify, unless I'm missing something (always possible), owltools is using OWL files to load Solr: https://github.com/geneontology/pipeline/blob/issue-35-neo-test/Jenkinsfile#L84-L85 . I agree that the underscore seems likely. It is in the spec though, right? https://www.w3.org/TR/1999/REC-xml-names-19990114/#NT-NCName
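As a quick sanity check on the spec question, here is an ASCII-only approximation of the NCName production from that document (the real production also admits non-ASCII letters, combining characters and extenders). It assumes the prefix in question is `AGI_LocusCode`; the helper name is illustrative.

```python
import re

# ASCII-only approximation of NCName:
# first character is a letter or underscore; later characters may also be
# digits, '.', or '-'. Colons are never allowed.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ascii_ncname(s):
    """True if s matches the ASCII subset of the NCName production."""
    return NCNAME_ASCII.fullmatch(s) is not None


for candidate in ["AGI_LocusCode", "TAIR", "1abc", "a:b"]:
    print(candidate, is_ascii_ncname(candidate))
```

Under this reading an underscore inside a prefix is legal, so a tool that rejects it is being stricter than the XML names spec requires.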

balhoff commented 1 year ago

Thanks for clarifying; I have a potential fix in #113.

kltm commented 1 year ago

Compaction now seems to be working on production.

geneontology / neo

unresolved IDs #106