HUPO-PSI / psi-ms-CV

HUPO-PSI mass spectrometry CV
Other
26 stars 36 forks source link

NCIT re-declaration (#109) works fine - almost #120

Closed mwalzer closed 2 years ago

mwalzer commented 2 years ago

My apologies, I didn't have the opportunity to test the changes from #109 (using pronto). This was partly due to a unit.obo issue that breaks loading of psi-ms.obo in pronto and needed a custom debug obo created. With that successfully circumvented, I realised that the re-declarations contain is_a statements that themselves contain NCIT terms unknown. https://github.com/HUPO-PSI/psi-ms-CV/blob/d4d873eb049b7af007a4a2cdf02af97e216b07c9/psi-ms.obo#L89

Works fine without them, so can we ditch the is_a in these cases or would that make the concept re-declaration somehow invalid?

mobiusklein commented 2 years ago

Previously, when pronto processed the CV with the import declaration for NCIT, did it also download and parse all of NCIT and add it to the term graph? We can remove those lines, but it will start to run into implementation details of whatever tools people use to consume the CV.

edeutsch commented 2 years ago

Yeah, I was wondering whether this would become a problem. Stripping the ancestors from the terms would be one solution, although it feels a bit like we're changing the terms inappropriately.

Another possibility is to include the ancestors up to the top, NCIT isn't very deep, so maybe adding a handful of additional terms from NCIT could retain the terms as they were and make everything complete?

mwalzer commented 2 years ago

@mobiusklein yes it did. Now it is of course choking on the missing ancestor definitions. @edeutsch maybe if we replaced the ancestry with a relation or property that signifies "Hey, this is a placeholder/symlink/dummy, if you want the whole context look there." it would feel less bad?

edeutsch commented 2 years ago

ah, hmm, I don't think that really helps. It is still altering the terms in a way that conflicts with the primary definitions.

mobiusklein commented 2 years ago

@mwalzer If we were to restore the import and have the term re-declaration (because I don't know enough about how pronto works and whether it can merge multiple term graphs outside of the parsing phase), would the stripped down, re-declared term be the one you saw when you looked it up, the original term, or the union of the two (e.g. suppose we tagged the re-declared term with a "has_property: redeclared_for_science" line, is the term in memory both complete and marked with this property).

If it is the union, then we're fine stripping out all the lines that reference beyond PSI-MS for the re-declared terms because we can assume the consumer will unify them. If not, we can't use re-declaration as a method of side-stepping depending upon NCIT.

Admittedly this may be an implementation detail still, but we can check with other common tools (ROBOT, Protege) too. We know ROBOT didn't error out because it accepted the current OBO file when generating an OWL file for us, but it may not have been trying to validate the ontology during conversion.

chambm commented 2 years ago

Bit late to the party here, but I just tried to parse the CV with our code with these NCITs added and I'm stuck on the fact that the NCIT terms are not declared as namespaces in the header like MS and PEFF are. Is that intentional or just an oversight?

mobiusklein commented 2 years ago

It's intentional. It is a re-declaration to avoid importing NCIT completely, and it passes validation in two OBO implementations, and was recommended here: https://stackoverflow.com/questions/47695745/referring-to-a-concept-in-a-not-imported-ontology,

The discussion hashing this out started in #105