HumanCellAtlas / ontology


Ensure the HCA Ontology is kept up to date with current ontologies #50

Closed: mshadbolt closed this issue 4 years ago

mshadbolt commented 4 years ago

Recent work by @jahilton and the @HumanCellAtlas/data-ops team has uncovered multiple errors in our HCA ontology that have made it inconsistent with the official ontologies.

We need input from @simonjupp and @zoependlington to work out how to keep our ontology up to date.

A couple of examples are:

OBI:0000869

In the HCA ontology instance the label is polyA RNA https://ontology.staging.data.humancellatlas.org/ontologies/efo/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FOBI_0000869

In the EBI OLS for OBI the label is polyA RNA extract https://www.ebi.ac.uk/ols/ontologies/obi/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FOBI_0000869

EFO:0004472

In the HCA OLS it is a valid term https://ontology.staging.data.humancellatlas.org/ontologies/efo/terms?iri=http%3A%2F%2Fwww.ebi.ac.uk%2Fefo%2FEFO_0004472

In the EBI OLS this term is listed as obsolete https://www.ebi.ac.uk/ols/ontologies/efo/terms?iri=http%3A%2F%2Fwww.ebi.ac.uk%2Fefo%2FEFO_0004472

ESapenaVentura commented 4 years ago

(tl;dr: We need to solve the problem of obsolete ontology terms in our metadata either by versioning or by automatic updates.)

Given what was discussed about the mouse strain:

zoependlington commented 4 years ago

@mshadbolt the OBI term is the same in HCA as it is in EFO (see here: https://www.ebi.ac.uk/ols/ontologies/efo/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FOBI_0000869), because EFO uses a static OBI import. We are gradually making all EFO imports dynamic, which will rectify this. However, if it is crucial, I'm happy to fix this in EFO right now. Just let me know!

As for EFO_0004472, this obsoletion should be in the latest version of HCAO... It looks like the OLS instance hasn't been updated to the latest release. I'll see what's happening there.

Any term obsoleted in EFO is listed in the EFO release notes, but I can always put together a changelog specifically for the HCA ontologies. Would that be useful, @ESapenaVentura? Additionally, every obsoleted ontology term carries the owl:deprecated true annotation, which could potentially be exploited as a test.
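A minimal sketch of the test suggested above, assuming the ontology release is available as an RDF/XML OWL file (such as efo_slim.owl) and that the ontology terms used in our metadata have already been collected as full IRIs. The SAMPLE_OWL excerpt and both function names are hypothetical, written for illustration only; the sketch only inspects owl:deprecated annotations on top-level owl:Class elements and does not handle other serialisations such as Turtle.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OWL files serialised as RDF/XML.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "owl": "http://www.w3.org/2002/07/owl#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]

# Hypothetical, hand-written excerpt; a real release file is far larger.
SAMPLE_OWL = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://www.ebi.ac.uk/efo/EFO_0004472">
    <owl:deprecated rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">true</owl:deprecated>
  </owl:Class>
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/OBI_0000869"/>
</rdf:RDF>"""


def deprecated_classes(owl_xml: str) -> set:
    """Return the IRIs of classes annotated with owl:deprecated true."""
    root = ET.fromstring(owl_xml)
    found = set()
    for cls in root.findall("owl:Class", NS):  # direct children of rdf:RDF only
        flag = cls.find("owl:deprecated", NS)
        if flag is not None and (flag.text or "").strip().lower() == "true":
            found.add(cls.get(RDF_ABOUT))
    return found


def obsolete_terms_in_use(owl_xml: str, used_iris: list) -> list:
    """Return the IRIs from used_iris that the release marks as deprecated."""
    deprecated = deprecated_classes(owl_xml)
    return [iri for iri in used_iris if iri in deprecated]


if __name__ == "__main__":
    used = [
        "http://purl.obolibrary.org/obo/OBI_0000869",
        "http://www.ebi.ac.uk/efo/EFO_0004472",
    ]
    print(obsolete_terms_in_use(SAMPLE_OWL, used))
    # ['http://www.ebi.ac.uk/efo/EFO_0004472']
```

Run against each new release, a non-empty result would be the signal to go and fix the affected metadata.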

ESapenaVentura commented 4 years ago

Thanks for the prompt answer, @zoependlington!

What I am looking for is some kind of list of changes from one version to the next, so that instead of going through every ontology term we use in each dataset to check whether it is obsolete, we could run automatic tests such as:

if <x in changelog> is <deprecated in this release> and <x in changelog> can be found in the schema:
- Search all (e.g.) specimen_from_organism bundles to see whether that ontology term is present
- Replace the data where possible
- If it can't be replaced automatically, warn whoever is responsible for changing it
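The outline above could be sketched roughly as follows. Everything here is hypothetical: the ChangelogEntry record (including a replaced_by field, which assumes a release names a replacement term), the entity layout with a "terms" dict, and the HYP: identifiers are placeholders, not the HCA metadata schema or any published changelog format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangelogEntry:
    """One term from a release changelog (hypothetical format)."""
    term: str
    deprecated: bool
    replaced_by: Optional[str] = None  # assumed: replacement named by the release


def apply_changelog(changelog, entities, entity_type="specimen_from_organism"):
    """Replace deprecated terms in place; return warnings for the rest.

    Only entities whose "type" equals entity_type are searched, mirroring
    the "search all specimen_from_organism bundles" step.
    """
    deprecated = {e.term: e for e in changelog if e.deprecated}
    warnings = []
    for entity in entities:
        if entity["type"] != entity_type:
            continue
        # Copy the items so the dict can be updated safely while looping.
        for field_name, term in list(entity["terms"].items()):
            entry = deprecated.get(term)
            if entry is None:
                continue  # term unaffected by this release
            if entry.replaced_by:
                entity["terms"][field_name] = entry.replaced_by
            else:
                warnings.append(
                    f"{entity['id']}: field '{field_name}' uses obsolete term "
                    f"{term}; needs manual replacement"
                )
    return warnings


if __name__ == "__main__":
    changelog = [
        ChangelogEntry("HYP:0000001", deprecated=True, replaced_by="HYP:0000002"),
        ChangelogEntry("EFO:0004472", deprecated=True),
    ]
    entities = [{
        "id": "specimen_1",
        "type": "specimen_from_organism",
        "terms": {"organ": "HYP:0000001", "disease": "EFO:0004472"},
    }]
    for message in apply_changelog(changelog, entities):
        print(message)  # only the term without a replacement is reported
```

Returning warnings rather than raising keeps the first step (just flagging obsolete terms) usable on its own, before any automatic replacement is trusted.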

Of course, automatic replacement of terms is a long way off, but having something set up that raises a warning about this would be a first step towards it.

That is roughly why I was asking about a changelog. That said, I don't think it's fair to put more pressure on the ontology team; unless it's very easy, I would advise against it.

Where are the release notes? Maybe we can work from there.

zoependlington commented 4 years ago

@ESapenaVentura I can generate an ontology diff like this. Would this be useful? I can do it for every file in the HCAO release (hcao.owl, efo_slim.owl, etc.). Classes deleted from efo_slim.owl are the ones obsoleted in the full efo.owl, which are consequently not picked up by the automatic slimming process.

zoependlington commented 4 years ago

@mshadbolt @ESapenaVentura A small update: the OLS instance has now been updated, so you should no longer see the EFO_0004472 term.

ESapenaVentura commented 4 years ago

@zoependlington that looks amazing! Where would that document live?

Also, thanks for the hard work!

zoependlington commented 4 years ago

@ESapenaVentura I could create a new folder in the ontology repo for the diffs to live in, or I can attach them as assets to each release. Whichever is more useful for you.

ESapenaVentura commented 4 years ago

I think the folder in the ontology repo would be amazing. Then we could make the checks from there. Thanks @zoependlington!

zoependlington commented 4 years ago

@ESapenaVentura I will update this with every release: https://github.com/HumanCellAtlas/ontology/tree/master/src/ontology/diffs

ESapenaVentura commented 4 years ago

Thanks @zoependlington !!