Closed mshadbolt closed 4 years ago
(tl;dr: We need to solve the problem of obsolete ontology terms in our metadata either by versioning or by automatic updates.)
Given what was discussed about the mouse strain:
This one is for @simonjupp and @zoependlington: Is there any way to know programmatically when a term becomes obsolete? (For example, a changelog for each ontology release, or something else that we could check every month.)
This one is for @HumanCellAtlas/data-ops: This time, re-ingestion or manual AUDR is acceptable, but we have to remind ourselves that data curation (at least to this extent) won't be maintained in the future of the HCA. So, in this case, we will replace the terms manually, but we need an "automatic" solution for the future:
@mshadbolt the OBI term is the same in HCA as it is in EFO (see here: https://www.ebi.ac.uk/ols/ontologies/efo/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FOBI_0000869) as EFO has a static OBI import. We are gradually working on making all EFO imports dynamic and so this will be rectified at that point. However, if it is crucial, I'm happy to fix this in EFO right now. Just let me know!
As for EFO_0004472, this obsoletion should be in the latest version of HCAO... It looks like the OLS instance hasn't been updated to the latest release. I'll see what's happening there.
Any term obsoleted in EFO will be in the EFO release notes, but I can always throw together a change log for HCA ontologies specifically if that would be useful, @ESapenaVentura. Additionally, all obsoleted ontology terms carry the owl:deprecated true annotation, which could potentially be exploited as a test?
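As a rough illustration of using the deprecation annotation as a test, here is a minimal sketch. It assumes the release is serialized as RDF/XML and that the annotation appears as a direct child of each `owl:Class` element; the function name and the sample document are made up for illustration, and a real check would probably be more robust with a proper RDF library such as rdflib.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"


def deprecated_class_iris(rdf_xml: str) -> set[str]:
    """Return the IRIs of owl:Class elements annotated with owl:deprecated true."""
    root = ET.fromstring(rdf_xml)
    deprecated = set()
    for cls in root.iter(f"{{{OWL}}}Class"):
        iri = cls.get(f"{{{RDF}}}about")  # anonymous classes have no rdf:about
        flag = cls.find(f"{{{OWL}}}deprecated")
        if iri and flag is not None and (flag.text or "").strip().lower() == "true":
            deprecated.add(iri)
    return deprecated


# Hand-written sample, NOT taken from a real release file.
SAMPLE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://www.ebi.ac.uk/efo/EFO_0004472">
    <owl:deprecated rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">true</owl:deprecated>
  </owl:Class>
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/OBI_0000869"/>
</rdf:RDF>
"""

print(deprecated_class_iris(SAMPLE))
```

The resulting set could then be compared against the terms actually used in the metadata.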
Thanks for the prompt answer @zoependlington !
What I am looking for is some kind of list of changes from one version to the next, so that instead of having to go through all the ontology terms we use in each dataset and check whether they are obsolete, we can run automatic tests such as:
if <x in changelog> is <deprecated in this release> and <x> is used in the schema:
- Search all (e.g.) specimen from organism bundles to see if that ontology term is present
- Replace the data when possible
- If it can't be replaced automatically, warn whoever is responsible for changing it
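The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: metadata is modelled as a plain dict of document ID to a list of term CURIEs, `audit_terms` and its arguments are hypothetical names, and the `EX:` terms and the replacement mapping are invented for the example.

```python
def audit_terms(documents, deprecated, replacements):
    """Replace deprecated terms where a replacement is known, warn otherwise.

    documents:    {doc_id: [term_curie, ...]}  (hypothetical metadata shape)
    deprecated:   set of term CURIEs deprecated in the new release
    replacements: {deprecated_curie: replacement_curie}
    Returns (updated_documents, warnings), where warnings is a list of
    (doc_id, term) pairs that need manual attention.
    """
    updated, warnings = {}, []
    for doc_id, terms in documents.items():
        new_terms = []
        for term in terms:
            if term in deprecated and term in replacements:
                new_terms.append(replacements[term])  # automatic replacement
                continue
            if term in deprecated:
                warnings.append((doc_id, term))  # someone has to fix this by hand
            new_terms.append(term)
        updated[doc_id] = new_terms
    return updated, warnings


# Invented example data; the EX: terms and the mapping are not real.
docs = {
    "specimen_1": ["EFO:0004472", "EX:0000001"],
    "specimen_2": ["OBI:0000869"],
}
updated, warnings = audit_terms(
    docs,
    deprecated={"EFO:0004472", "EX:0000001"},
    replacements={"EX:0000001": "EX:0000002"},
)
print(updated)
print(warnings)
```

Even with an empty `replacements` mapping, the warnings list alone would cover the "first step" of flagging affected documents.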
Of course, automatic replacement of terms is in the far future, but if we have something set up that could throw a warn about this, it's a first step towards that.
That is more or less why I was asking about a changelog. I don't think it's fair to put more pressure on the ontology team, so unless it's super easy, I would advise against it.
Where are the release notes? Maybe we can work from there.
@ESapenaVentura I can generate an ontology diff like this. Would this be useful? I can do it for all elements of the HCAO release (hcao.owl, efo_slim.owl etc.). Classes that are deleted from the efo_slim.owl are the ones that are obsoleted in the full efo.owl and consequently not picked up by the automatic slimming process.
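For anyone wanting to reproduce the "deleted classes" part of such a diff locally, a minimal sketch is to compare the named classes of two releases. It assumes both files are RDF/XML; the function names and the two tiny sample releases are made up, and this is not how the published diffs are generated.

```python
import xml.etree.ElementTree as ET

RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"
OWL_CLASS = "{http://www.w3.org/2002/07/owl#}Class"


def class_iris(rdf_xml: str) -> set[str]:
    """Collect the IRIs of all named owl:Class elements in an RDF/XML document."""
    root = ET.fromstring(rdf_xml)
    return {c.get(RDF_ABOUT) for c in root.iter(OWL_CLASS) if c.get(RDF_ABOUT)}


def removed_classes(old_xml: str, new_xml: str) -> set[str]:
    """Classes present in the old release but missing from the new one."""
    return class_iris(old_xml) - class_iris(new_xml)


# Hand-written stand-ins for two consecutive slim releases.
HEADER = ('<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
          'xmlns:owl="http://www.w3.org/2002/07/owl#">')
OLD = HEADER + """
  <owl:Class rdf:about="http://www.ebi.ac.uk/efo/EFO_0004472"/>
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/OBI_0000869"/>
</rdf:RDF>"""
NEW = HEADER + """
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/OBI_0000869"/>
</rdf:RDF>"""

print(removed_classes(OLD, NEW))
```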
@mshadbolt @ESapenaVentura A small update: The OLS instance has been updated now, so you should no longer see the EFO_0004472 term.
@zoependlington that looks amazing! Where would that document live?
Also, thanks for the hard work!
@ESapenaVentura I can create a new folder in the ontology repo for the diffs to live? Or I can attach them as assets to each release. Whichever is more useful for you.
I think the folder in the ontology repo would be amazing. Then we could make the checks from there. Thanks @zoependlington!
@ESapenaVentura I will update this with every release: https://github.com/HumanCellAtlas/ontology/tree/master/src/ontology/diffs
Thanks @zoependlington !!
Recent work by @jahilton and the @HumanCellAtlas/data-ops team has uncovered multiple errors in our HCA ontology that have made it inconsistent with the official ontologies.
We would need input from @simonjupp and @zoependlington to figure out how to make sure our ontology is kept up to date.
A couple of examples are:
OBI:0000869
In the HCA ontology instance the label is polyA RNA https://ontology.staging.data.humancellatlas.org/ontologies/efo/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FOBI_0000869
In the EBI OLS for OBI the label is polyA RNA extract https://www.ebi.ac.uk/ols/ontologies/obi/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FOBI_0000869
EFO:0004472
In the HCA OLS it is a valid term https://ontology.staging.data.humancellatlas.org/ontologies/efo/terms?iri=http%3A%2F%2Fwww.ebi.ac.uk%2Fefo%2FEFO_0004472
In the EBI OLS this term is listed as obsolete https://www.ebi.ac.uk/ols/ontologies/efo/terms?iri=http%3A%2F%2Fwww.ebi.ac.uk%2Fefo%2FEFO_0004472