HumanCellAtlas / ontology

Update pipeline to use all of Uberon Euarchontoglires slim #84

Open dosumis opened 3 years ago

dosumis commented 3 years ago

The current filtering system, which relies on FMA xrefs, is unsustainable and does not work for new terms that have no FMA xref. If the aim of filtering is to remove irrelevant terms, a taxon slim is a much better fit: it works by excluding terms that are explicitly recorded as being outside the slim, rather than requiring a positive assertion of inclusion. The current mapping of FMA labels can stay in place.
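To illustrate the difference between the two filtering strategies, here is a minimal sketch. It is not the HCAO pipeline code: the `Term` class, the `excluded_from_slim` flag, and all identifiers (`EX:…`, `FMA:0000001`) are made up for illustration. It only shows why a new term with no FMA xref is dropped by an inclusion filter but kept by an exclusion filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """Toy stand-in for an ontology class; every field here is hypothetical."""
    term_id: str
    label: str
    xrefs: tuple = ()
    # Hypothetical flag: True if the term is explicitly recorded as being
    # outside the taxon slim (e.g. via a taxon constraint).
    excluded_from_slim: bool = False


def filter_by_fma_xref(terms):
    """Inclusion filter: keep only terms that carry at least one FMA xref."""
    return [t for t in terms if any(x.startswith("FMA:") for x in t.xrefs)]


def filter_by_taxon_slim(terms):
    """Exclusion filter: keep every term not explicitly marked as outside the slim."""
    return [t for t in terms if not t.excluded_from_slim]


# Made-up example terms, not real Uberon or FMA identifiers.
TERMS = [
    Term("EX:0001", "term with an FMA xref", xrefs=("FMA:0000001",)),
    Term("EX:0002", "new term without an FMA xref"),
    Term("EX:0003", "term excluded by a taxon constraint", excluded_from_slim=True),
]

kept_by_xref = [t.term_id for t in filter_by_fma_xref(TERMS)]
kept_by_slim = [t.term_id for t in filter_by_taxon_slim(TERMS)]

print(kept_by_xref)  # ['EX:0001']
print(kept_by_slim)  # ['EX:0001', 'EX:0002']
```

In this toy data, `EX:0002` is lost under the xref approach simply because nobody has added an xref yet, whereas the slim approach only drops terms that have been explicitly excluded.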

@matentzn is working on a new release of euarchontoglires-basic.owl that should be sufficient & should be available shortly.

paolaroncaglia commented 3 years ago

@dosumis See also the placeholder ticket I opened previously, https://github.com/HumanCellAtlas/ontology/issues/80. I'd close that ticket as a duplicate, unless we want to keep it as a reminder to generate a CL slim based on the Uberon one. (I could just remove the Uberon bit from #80.) Thanks. (Update: #80 is now a reminder to import the CL slim when available.)

paolaroncaglia commented 3 years ago

Update: the latest Uberon release(s) contain species subsets, so

paolaroncaglia commented 3 years ago

Update: we plan to import the human-focused (Euarchontoglires) Uberon slim in the mid-October HCAO release.

matentzn commented 3 years ago

Best assign someone like @dosumis to really sort out this slim with plenty of time. It looks ok, but there are still a few classes that need taxon constraints.

dosumis commented 3 years ago

We will move to this as-is and work on adding taxon constraints to improve it in future. In discussion with HCA earlier this year we agreed to move from a conservative approach, trying to identify human-relevant terms, to a liberal one of trying to exclude non-mammalian terms. This doesn't need to be perfect, and we can work to improve it rapidly given increased curator resources.

paolaroncaglia commented 3 years ago

@matentzn @dosumis yep, no need to be perfect to start with, but I was planning to take a closer look at the slim next week anyway, in case anything glaringly non-human comes up. No deep investigation, just a half-hour or so. Could one of you please remind me what the slim file is named exactly? Is it composite-vertebrate in https://github.com/obophenotype/uberon/releases? Thanks.

dosumis commented 3 years ago

https://github.com/obophenotype/uberon/blob/v2021-07-27/subsets/euarchontoglires-basic.owl

dosumis commented 3 years ago

I'm sure there will be some glaringly invertebrate terms in there. Can you list any you find on a ticket? We should be able to quickly remove them by adding taxon constraints.

paolaroncaglia commented 3 years ago

> I'm sure there will be some glaringly invertebrate terms in there. Can you list any you find on a ticket? We should be able to quickly remove them by adding taxon constraints.

https://github.com/obophenotype/uberon/issues/2050

paolaroncaglia commented 3 years ago

Update: this has progressed but depends on https://github.com/obophenotype/uberon/issues/2050#issuecomment-932378372, which @rays22 will look into (with help from David if needed). Thanks.

paolaroncaglia commented 2 years ago

Update: the Uberon Euarchontoglires slim is ready, but still needs improving before it can be added to the HCAO pipeline.

paolaroncaglia commented 2 years ago

Update: depends on https://github.com/obophenotype/uberon/issues/2127 and https://github.com/obophenotype/uberon/issues/2183; https://github.com/obophenotype/uberon/issues/2194 is also related, but not a blocker.

Update Feb 8th 2022: at a recent meeting (Jan 18th) David mentioned that "Anita is working on a new strategy that's looking promising. Likely to take at least a month to implement.", so I'll follow up after mid-Feb.

paolaroncaglia commented 2 years ago

Update 15/3/22: no update on this yet, but I put it down for discussion at my next 1:1 with David.