Closed balhoff closed 1 month ago
@balhoff I want to clarify what's going on here. Is this that the intermediate is ofn, then gets converted to obo or owl at the end. This seems to be true, as I still see the "neo.obo neo.owl" targets, but want to confirm.
@kltm that is correct. I ran it locally works like a charm (or as much as perl can ever be charming)
@kltm I think this is good to go but diffing the two outputs (previous and new) is challenging. Some known changes:
PseudoCAP terms (clear fixed problem):
<http://purl.obolibrary.org/obo/PseudoCAP_PA0001>
<http://identifiers.org/pseudomonas/PA0001>
RefSeq terms (clear fixed problem):
<http://purl.obolibrary.org/obo/RefSeq#_NM_001001130>
<http://identifiers.org/refseq/NM_001001130>
TAIR locus terms (change due to using only simple prefix substitution rather than regex):
<http://identifiers.org/tair.locus/1005028051>
<http://identifiers.org/tair/locus:1005028051>
My understanding is that @cmungall says go ahead with the TAIR change and continue working with them on IRI structures.
@balhoff Just to clarify, would a change in IDs(!) require a scan and transition of all models that already have the old ones? Or has this been done and these have not been (widely) used?
@kltm I did some grepping:
PseudoCAP
does not appear in Noctua modelsRefSeq
appears only in string CURIE form like <http://model.geneontology.org/MGI_MGI_1921620/f20b4303-9057-44e9-946d-be9fd4a6d60c> <http://geneontology.org/lego/evidence-with> "RefSeq:NP_065812"
tair.locus
is found, e.g., <http://identifiers.org/tair.locus/2157383>
@kltm I fixed the TAIR identifier problem. I also regenerated Makefile-gafs using a new datasets.json, which fixed my wb download issues. Additionally, I added a prefix for EcoCyc, which was a new issue uncovered after updating datasets.json. EcoCyc
does not seem to appear in Noctua models. I used the URL expansion from Bioregistry, as EcoCyc does not appear to have an entry in identifiers.org.
Okay, just to tag on @vanaukenk to this conversation:
@balhoff if that is correct and @vanaukenk you are okay with this from our conversation yesterday, I will merge (and then do an internal test run).
Thanks @balhoff and @kltm Okay with me to merge and do an internal test run.
This PR is an alternative to #119, and would fix https://github.com/geneontology/noctua/issues/902.
In this repo we have had a few different hacks related to prefixes and OBO format. In #119 I updated things to use the new OBO support for prefixes, but still ran into a few corner cases. Here I just updated the text output from the perl scripts to write OFN instead of OBO. This gives us much more control over the exact OWL we end up with, and has reliable handling of prefixes. Also, now the OBO file is a terminal output and doesn't need to round-tripped within this build.
I think the only remaining issues are those with TAIR IDs mentioned in #119. But I do still need to do a careful diff. @cmungall suggests taking a simple prefix-based approach to the TAIR IDs (as done here) since the IRIs aren't used currently in Noctua models anyway.