Closed: mwalzer closed this issue 2 years ago
Previously, when pronto processed the CV with the import declaration for NCIT, did it also download and parse all of NCIT and add it to the term graph? We can remove those lines, but then we start running into implementation details of whatever tools people use to consume the CV.
Yeah, I was wondering whether this would become a problem. Stripping the ancestors from the terms would be one solution, although it feels a bit like we're changing the terms inappropriately.
Another possibility is to include the ancestors all the way up to the top. NCIT isn't very deep, so adding a handful of additional terms from NCIT might let us retain the terms as they are and make everything complete.
@mobiusklein yes it did. Now it is of course choking on the missing ancestor definitions. @edeutsch maybe if we replaced the ancestry with a relation or property that signifies "Hey, this is a placeholder/symlink/dummy; if you want the whole context, look over there", it would feel less bad?
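For concreteness, a placeholder re-declaration along those lines might look like the stanza below. The term id, name, and the `redeclared_from` relation are all made up for illustration; this is not an actual line from psi-ms.obo:

```
[Term]
id: NCIT:C12345 ! illustrative id only
name: example NCIT concept
comment: Placeholder re-declaration; see NCIT for the authoritative term and its ancestry.
property_value: redeclared_from http://purl.obolibrary.org/obo/ncit.owl ! hypothetical relation
```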
ah, hmm, I don't think that really helps. It is still altering the terms in a way that conflicts with the primary definitions.
@mwalzer If we were to restore the import and keep the term re-declaration (I don't know enough about how pronto works, or whether it can merge multiple term graphs outside of the parsing phase): would the stripped-down, re-declared term be the one you see when you look it up, the original term, or the union of the two? E.g., suppose we tagged the re-declared term with a "has_property: redeclared_for_science" line; is the term in memory then both complete and marked with this property?
If it is the union, then we're fine stripping out all the lines that reference anything beyond PSI-MS from the re-declared terms, because we can assume the consumer will unify them. If not, we can't use re-declaration as a way of side-stepping a dependency on NCIT.
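To pin down what "the union of the two" would mean in practice, here is a toy sketch in Python. Dicts stand in for OBO stanzas, the term ids and the marker property are made up, and this is explicitly not how pronto actually merges term graphs; it only illustrates the semantics we'd be hoping for:

```python
# Toy illustration (NOT pronto's actual behaviour) of union-merging a
# full term definition with a stripped-down re-declaration of the same id.
# Stanzas are modelled as dicts mapping an OBO tag to a set of values.

def merge_stanzas(a, b):
    """Union-merge two stanzas that share the same term id."""
    merged = {}
    for tag in set(a) | set(b):
        merged[tag] = a.get(tag, set()) | b.get(tag, set())
    return merged

original = {
    "id": {"NCIT:C12345"},            # made-up id
    "name": {"example NCIT concept"},
    "is_a": {"NCIT:C99999"},          # ancestor outside PSI-MS
}
redeclared = {
    "id": {"NCIT:C12345"},
    "name": {"example NCIT concept"},
    "property_value": {"redeclared_for_science true"},  # hypothetical marker
}

merged = merge_stanzas(original, redeclared)
# If a consumer unified terms like this, the merged term would be both
# complete (it keeps the is_a ancestry) and marked with the extra property.
print("is_a" in merged and "property_value" in merged)  # expect: True
```

If consumers behave this way, the re-declared stanza can safely omit everything beyond PSI-MS; if they instead replace one definition with the other, whichever stanza "wins" determines what the consumer sees.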
Admittedly this may still be an implementation detail, but we can check with other common tools (ROBOT, Protege) too. We know ROBOT didn't error out, because it accepted the current OBO file when generating an OWL file for us, but it may not have been validating the ontology during conversion.
Bit late to the party here, but I just tried to parse the CV with these NCIT terms added using our code, and I'm stuck on the fact that the NCIT terms are not declared as namespaces in the header the way MS and PEFF are. Is that intentional or just an oversight?
It's intentional. It is a re-declaration to avoid importing NCIT completely; it passes validation in two OBO implementations, and was recommended here: https://stackoverflow.com/questions/47695745/referring-to-a-concept-in-a-not-imported-ontology
The discussion hashing this out started in #105.
My apologies, I didn't have the opportunity to test the changes from #109 (using pronto). This was partly due to a unit.obo issue that breaks loading of psi-ms.obo in pronto and required creating a custom debug OBO file. With that successfully circumvented, I realised that the re-declarations contain `is_a` statements that themselves reference unknown NCIT terms: https://github.com/HUPO-PSI/psi-ms-CV/blob/d4d873eb049b7af007a4a2cdf02af97e216b07c9/psi-ms.obo#L89
Everything works fine without them, so can we ditch the `is_a` in these cases, or would that make the concept re-declaration somehow invalid?
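To make the question concrete, dropping the `is_a` would mean going from something like the first stanza below to the second. The term id, name, and ancestor here are illustrative stand-ins, not the actual content at the linked line:

```
! With the dangling is_a (breaks consumers that resolve ancestry):
[Term]
id: NCIT:C12345 ! illustrative id only
name: example NCIT concept
is_a: NCIT:C99999 ! ancestor never declared anywhere in psi-ms.obo

! Without it (parses cleanly, but loses the ancestry information):
[Term]
id: NCIT:C12345 ! illustrative id only
name: example NCIT concept
```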