OBOFoundry / OBOFoundry.github.io

Metadata and website for the Open Bio Ontologies Foundry Ontology Registry
http://obofoundry.org

Principle #2 common format - automated validation #1018

Open beckyjackson opened 5 years ago

beckyjackson commented 5 years ago

FP 2 - Common Format

Automated checks:

  1. The OWL PURL must resolve to RDF/XML

Mechanism:

We can ensure that the ontology properly loads in ROBOT, but this does not confirm that the format is RDF/XML. Unfortunately, it seems like the format data is lost after the ontology is loaded with the OWLAPI OWLOntologyManager. We can check the first line of the file to see if it starts with <?xml version=. I'm open to other suggestions here.

balhoff commented 5 years ago

> Unfortunately, it seems like the format data is lost after the ontology is loaded with the OWLAPI OWLOntologyManager.

@beckyjackson have you tried this method?

http://owlcs.github.io/owlapi/apidocs_4/org/semanticweb/owlapi/model/OWLOntologyManager.html#getOntologyFormat-org.semanticweb.owlapi.model.OWLOntology-

beckyjackson commented 5 years ago

Yes, unfortunately it returned null after loading with the ROBOT IOHelper 🙁 Do you know a way to keep that information @balhoff ?

balhoff commented 5 years ago

No, sorry, I thought that would work! It's not something I have used before though.

beckyjackson commented 5 years ago

The root node should be rdf:RDF - @jamesaoverton
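A minimal sketch of the root-node check, using only the Python standard library rather than ROBOT or the OWLAPI. The function name `root_is_rdf` is hypothetical. Unlike a first-line check for `<?xml version=`, this also accepts files that omit the XML declaration, which XML 1.0 allows. It only inspects the root element, so it does not prove that the rest of the file is valid RDF/XML.

```python
import xml.etree.ElementTree as ET
from io import BytesIO

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def root_is_rdf(data: bytes) -> bool:
    """Return True if the first element of the XML in `data` is rdf:RDF.

    Hypothetical helper: it stops at the root element and does not
    validate the remainder of the document.
    """
    try:
        # The first "start" event belongs to the root element, so we can
        # return immediately without building the whole tree.
        for _event, elem in ET.iterparse(BytesIO(data), events=("start",)):
            return elem.tag == f"{{{RDF_NS}}}RDF"
    except ET.ParseError:
        # Not XML at all (for example OBO format or Turtle), or empty input.
        return False
    return False
```

An OWL/XML file, whose root is `Ontology` in the OWL namespace, is well-formed XML but would be rejected by this check, which matches the strict reading of the principle.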

cmungall commented 5 years ago

Jena?

I think at a minimum it should parse using Jena, which gives us a guarantee that it's in some RDF format. IMHO that is a more pragmatic requirement than strictly requiring RDF/XML.


jamesaoverton commented 5 years ago

Even though RDF/XML is not my preferred format, I like the predictability of the strict rule.

cmungall commented 5 years ago

Wait, I just thought of something: surely we can use the OWLAPI with only the RDF/XML parser registered? If parsing fails, then the file is not valid RDF/XML.


cmungall commented 4 years ago

This check should be expanded: the base ontology should be rdf/xml

However, we should also mandate that any imported ontology be in at least some RDF format (Turtle or RDF/XML).

Some ontologies have an .obo format file in their imports, and this causes problems with OWL-based toolchains like Owlready2 (reported by SciBite, cc @simonjupp).

For more on this particular instance of the issue: https://github.com/HUPO-PSI/psi-ms-CV/issues/26

I am not sure how best to implement this in robot/owlapi
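As a rough illustration outside robot/owlapi, the sketch below lists the `owl:imports` targets of an RDF/XML ontology and guesses the serialization of a downloaded import from its content. The function names and the returned labels are hypothetical, fetching the imports is left out, and the detection is only a heuristic: it assumes OBO files begin with `format-version:` and that Turtle files begin with a prefix or base declaration, so a Turtle file that opens with a comment would be reported as unknown.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"


def list_imports(rdfxml: bytes) -> list:
    """Return the IRIs given as rdf:resource on owl:imports elements."""
    root = ET.fromstring(rdfxml)
    resource = f"{{{RDF_NS}}}resource"
    return [
        el.attrib[resource]
        for el in root.iter(f"{{{OWL_NS}}}imports")
        if resource in el.attrib
    ]


def sniff_format(data: bytes) -> str:
    """Heuristically label content as rdfxml, other-xml, obo, turtle or unknown."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        root = None
    if root is not None:
        return "rdfxml" if root.tag == f"{{{RDF_NS}}}RDF" else "other-xml"
    # Not XML: look at the start of the text for format-specific markers.
    head = data[:4096].decode("utf-8", errors="replace").lstrip()
    if head.startswith("format-version:"):
        return "obo"  # assumption: OBO headers start with this tag
    if head.startswith(("@prefix", "@base", "PREFIX", "BASE")):
        return "turtle"  # assumption: file opens with a declaration
    return "unknown"
```

A check built on this would flag any import whose label is not `rdfxml` or `turtle`. A real implementation would more likely rely on an RDF parser than on content sniffing.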

matentzn commented 4 years ago

> This check should be expanded: the base ontology should be rdf/xml

I never knew we were going that way, but it makes sense with Jena in mind. Is there any hope of proposing that release ontologies be merged in general? Having imports makes proper versioning really difficult to manage.

wdduncan commented 4 years ago

Update: @cmungall's requirement seems necessary. Is there another way to validate imports that are in a different format?
We now have a dependency on how to verify the format of imports.

cc @bpeters42

ramonawalls commented 4 years ago

Does the file really need to be RDF/XML? Would not other OWL RDF formats be acceptable? (Maybe this should be a separate issue.)

balhoff commented 4 years ago

@ramonawalls related discussion: #360