HumanCellAtlas / ontology

[ENQ] UBERON:0007795 not in HCAO #78

Closed mshadbolt closed 3 years ago

mshadbolt commented 3 years ago

One of our wranglers, @willrockout, is trying to use the term UBERON:0007795 in our metadata, but even though it exists in the EBI OLS version of UBERON, it can't be found in the HCA OLS.

Is this because it is a new term and it will be in the next release or is there some other reason why this term hasn't been imported into HCAO?

thanks!

paolaroncaglia commented 3 years ago

Hi @mshadbolt and @willrockout ,

Not all UBERON terms are automatically imported into HCAO. The current makefile is designed to import UBERON terms that a) have a cross-reference to an FMA term, or b) are manually added to a list of terms to be imported, or c) are tagged with the new subset property "added_for_HCA" (this last is used when I create a term in Uberon, CL or EFO that was requested by HCA curators). UBERON:0007795 'ascitic fluid' doesn't fall in any of the 3 categories unfortunately. (The 3 relevant files containing current lists for a), b) and c) can be found at the bottom of this page: https://github.com/ebi-ait/ontology/tree/master/src/ontology).
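The three import criteria described above can be sketched as a simple predicate. This is only an illustration of the rule as stated in this comment, not the actual makefile logic: the `Term` class, the `is_imported` function, and the way xrefs and subsets are stored are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Term:
    """Minimal stand-in for an ontology class (hypothetical structure)."""
    curie: str
    label: str
    xrefs: List[str] = field(default_factory=list)
    subsets: Set[str] = field(default_factory=set)


def is_imported(term: Term, manual_import: Set[str]) -> bool:
    """Return True if the term meets any of the three criteria a), b), c)."""
    has_fma_xref = any(x.startswith("FMA:") for x in term.xrefs)  # a)
    on_manual_list = term.curie in manual_import                  # b)
    tagged_for_hca = "added_for_HCA" in term.subsets              # c)
    return has_fma_xref or on_manual_list or tagged_for_hca


# 'ascitic fluid' meets none of the criteria until it is added manually.
ascitic_fluid = Term("UBERON:0007795", "ascitic fluid")
print(is_imported(ascitic_fluid, manual_import=set()))               # False
print(is_imported(ascitic_fluid, manual_import={"UBERON:0007795"}))  # True
```

Adding the ID to the manual list, as offered for uberon_manual_import.txt, is enough to flip the result without touching the term itself.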

We'll be happy to add UBERON:0007795 'ascitic fluid' to uberon_manual_import.txt, but we just released HCAO earlier today and the next release is scheduled for April 19th. If you need the term sooner, @zoependlington kindly offered to run a patch release. Please let us know what you prefer.

Thanks, Paola (and Zoe)

willrockout commented 3 years ago

Hi @paolaroncaglia,

Thank you for getting back to us so quickly. If we can get @zoependlington to do a patch release, that would be very much appreciated. Do you know the rough timeframe for that change to go into effect?

Thanks, Will

paolaroncaglia commented 3 years ago

Hi @willrockout ,

Thanks for letting me know. I'll check with @zoependlington tomorrow and will let you know.

Best wishes, Paola

mshadbolt commented 3 years ago

hi again, I thought I would hijack this thread for my own purposes now, what are the rules around CL imports? Trying to work out why T follicular helper cell https://www.ebi.ac.uk/ols/ontologies/cl/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FCL_0002038 isn't in our ontology either.

Seems pretty weird that the T cell hierarchy is different between the two:

[Screenshot: EBI OLS, 2021-03-19]

[Screenshot: HCAO OLS, 2021-03-19]
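For checking terms like this one by hand, the OLS link quoted in this comment can be rebuilt from a term ID. The sketch below assumes the standard OBO PURL pattern and that the OLS ontology name is the lower-cased ID prefix (true for `cl` and `uberon`, not guaranteed for every ontology); the function names are made up for the example.

```python
from typing import Optional
from urllib.parse import quote

OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"
OLS_BASE = "https://www.ebi.ac.uk/ols/ontologies"


def curie_to_iri(curie: str) -> str:
    """Turn an ID such as 'CL:0002038' into its OBO PURL."""
    prefix, local_id = curie.split(":", 1)
    return f"{OBO_PURL_BASE}{prefix}_{local_id}"


def ols_term_url(curie: str, ontology: Optional[str] = None) -> str:
    """Build an OLS term page URL in the same shape as the link above."""
    if ontology is None:
        # Assumption: the OLS ontology name is the lower-cased ID prefix.
        ontology = curie.split(":", 1)[0].lower()
    # safe="" so that ':' and '/' in the IRI are percent-encoded too.
    return f"{OLS_BASE}/{ontology}/terms?iri={quote(curie_to_iri(curie), safe='')}"


print(ols_term_url("CL:0002038"))
```

For CL:0002038 this reproduces the EBI OLS link given earlier in this comment.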

paolaroncaglia commented 3 years ago

Hi @willrockout and @mshadbolt ,

The CL import into HCAO works similarly to Uberon given the current makefile. Not all CL terms are imported into HCAO; only CL terms that have an FMA cross-reference get imported automatically, and those that don’t have FMA x-refs don’t get picked up and need to be added manually or with the added_for_HCA tag. T follicular helper cell doesn't have an FMA xref. I suspect that there will be many more CL terms that you'll need and that don't have an FMA xref. Same for Uberon terms. If that's the case, Zoe suggests holding off on a patch release until next week, when we can collect all of the terms needed and make sure they get imported into HCAO. Otherwise, she might be able to get you a patch release by Monday (could be Tuesday your time, Marion).

We'll wait to hear from you, Thanks, Paola and Zoe

mshadbolt commented 3 years ago

Ok thanks @paolaroncaglia

It seems like we need a better way to automatically determine what terms get imported into HCAO as it isn't really meeting our needs at the moment and the lead time to request imports and then wait for release will slow down our submission processes.

Would there be any harm in importing the entire CL ontology? Or will it bloat the HCAO too much?

@willrockout are you aware of any other examples where the UBERON term you wanted to use wasn't in HCAO?

paolaroncaglia commented 3 years ago

@mshadbolt Zoe and I reviewed and discussed the options that we think are available to you for CL (and possibly Uberon) terms. Unfortunately I have another meeting now, so I'll have to get back to you tomorrow with more details; but rest assured, we're giving this thoughtful consideration, and are confident that at least one of the options we'll suggest will meet your needs. Best, Paola

paolaroncaglia commented 3 years ago

@mshadbolt (cc @dosumis for his information)

> It seems like we need a better way to automatically determine what terms get imported into HCAO as it isn't really meeting our needs at the moment and the lead time to request imports and then wait for release will slow down our submission processes.
>
> Would there be any harm in importing the entire CL ontology? Or will it bloat the HCAO too much?
>
> @willrockout are you aware of any other examples where the UBERON term you wanted to use wasn't in HCAO?

Here's a summary of my discussion with @zoependlington .

Zoe is drafting a document that illustrates the current HCA pipeline and explains which terms currently do or don't get imported into HCAO. It covers the same ground as our comments above, but for all imports, and could be made available to all HCA wranglers once completed.

Importing the whole of the CL ontology into an application ontology like HCAO rather defeats the purpose of an application ontology. It'd be better to refer to CL separately, the same way as I understand you refer to Mondo and EDAM separately, in their entirety. Also consider that, if we imported all ~2,500 CL terms into HCAO, they would probably come with all the externally imported terms that they link to (Uberon, GO...), which brings the total to ~10,000 terms.

Irrespective of where you'd store CL or Uberon terms, when you need a term that exists elsewhere but is not in HCAO yet, there's always the option of pre-filling curation fields with IDs. The validation script may be modified so it throws a warning rather than an error - it'll flag that you need to enter a "live" term, but it won't fail. I remember that some Gene Ontology curation tools allowed this.
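To make the warning-versus-error idea concrete, here is a minimal sketch. It is not the HCA validation script: the `Status` enum, the `check_term` function, and the use of plain in-memory sets for "terms in HCAO" and "terms in the source ontology" are all assumptions for illustration.

```python
from enum import Enum
from typing import Set


class Status(Enum):
    OK = "ok"            # term is already in HCAO
    WARNING = "warning"  # valid ID in the source ontology, not yet in HCAO
    ERROR = "error"      # ID is unknown everywhere


def check_term(curie: str, hcao_terms: Set[str], source_terms: Set[str]) -> Status:
    """Classify a pre-filled ID instead of failing on anything outside HCAO."""
    if curie in hcao_terms:
        return Status.OK
    if curie in source_terms:
        # The term exists upstream with a stable ID: flag it, but don't fail.
        return Status.WARNING
    return Status.ERROR


# Illustrative sets only; a real check would query the ontologies themselves.
hcao = {"CL:0000084"}
cl = {"CL:0000084", "CL:0002038"}

print(check_term("CL:0002038", hcao, cl))  # Status.WARNING
```

Under this scheme a warning would only ever be raised for an ID that already exists in the source ontology; anything else would still be an error.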

HCAO is released regularly every month now, and I prioritize HCA requests. So e.g. EFO terms that you request for assays, mouse strains etc. would be available within max 4 weeks of your request, as long as tickets are opened before the 15th of each month. For CL and Uberon terms, if you referred to the source ontologies directly, all terms would be available to you immediately, unless they don't exist yet but those seem to be a minority.

Please let us know if you'd prefer to discuss this further, so we can find the best solution(s) for HCA. Thanks, Paola (and Zoe)

pnejad commented 3 years ago

@paolaroncaglia @zoependlington

Could you also push CL:0002410 (pancreatic stellate cell) to HCAO?

paolaroncaglia commented 3 years ago

@pnejad Sure, I made a note in our (Zoe's and mine) agenda for the next HCAO release, currently scheduled for April 19th (i.e. 2 working days after the EFO release, to bring in any requested new EFO term).

pnejad commented 3 years ago

Thank you @paolaroncaglia!

I have another one we need for HCAO (CL:0005006 - ionocyte). Should I keep adding them to this ticket?

paolaroncaglia commented 3 years ago

Hi @pnejad ,

> I have another one we need for HCAO (CL:0005006 - ionocyte). Should I keep adding them to this ticket?

Yes that's fine, you can add terms that need importing to this ticket. I already made a note for CL:0005006 - ionocyte, same as above. Best,

Paola

dosumis commented 3 years ago

> Irrespective of where you'd store CL or Uberon terms, when you need a term that exists elsewhere but is not in HCAO yet, there's always the option of pre-filling curation fields with IDs. The validation script may be modified so it throws a warning rather than an error - it'll flag that you need to enter a "live" term, but it won't fail. I remember that some Gene Ontology curation tools allowed this.

I like that idea.

paolaroncaglia commented 3 years ago

> > Irrespective of where you'd store CL or Uberon terms, when you need a term that exists elsewhere but is not in HCAO yet, there's always the option of pre-filling curation fields with IDs. The validation script may be modified so it throws a warning rather than an error - it'll flag that you need to enter a "live" term, but it won't fail. I remember that some Gene Ontology curation tools allowed this.
>
> I like that idea.

Thanks again to @zoependlington who brought it up first :-)

mshadbolt commented 3 years ago

> > > Irrespective of where you'd store CL or Uberon terms, when you need a term that exists elsewhere but is not in HCAO yet, there's always the option of pre-filling curation fields with IDs. The validation script may be modified so it throws a warning rather than an error - it'll flag that you need to enter a "live" term, but it won't fail. I remember that some Gene Ontology curation tools allowed this.
>
> > I like that idea.
>
> Thanks again to @zoependlington who brought it up first :-)

I don't really see how that would work.

We validate against the HCAO (https://ontology.archive.data.humancellatlas.org/index), so as far as our validation script is concerned, no ontology terms exist outside of it. The JSON validator is also a generic tool that is used by others, so we don't really have control over when it throws warnings rather than validation errors.

The other option that was suggested by Will was to validate against the original ontologies directly, i.e. MONDO, EFO, UBERON etc. But then what is the point of having HCAO at all?

Having warnings seems like a slippery slope to having a bunch of terms that aren't in HCAO at all, and maybe not even in the ontology we reference (i.e. CL,UBERON etc.), how would we ensure the terms get added eventually? Also how do we ensure others using our standard know that they have to request terms when there are warnings and not just ignore them since their submission will be 'valid'?

dosumis commented 3 years ago

I think it's worth having a meeting to look into your OLS build (it likely needs updating) and the validations that work from it. These things are easily configured. Maybe some time in early May?

paolaroncaglia commented 3 years ago

Hi @mshadbolt ,

@dosumis , @zoependlington and I discussed a few options. Here’s a summary with action items:

Also, to clarify, in reply to “Having warnings seems like a slippery slope to having a bunch of terms that aren't in HCAO at all, and maybe not even in the ontology we reference (i.e. CL,UBERON etc.), how would we ensure the terms get added eventually?”: we never meant to suggest that you reference non-existing classes, but rather terms that already exist in CL and Uberon and therefore have a valid and stable ID. Other concerns remain valid, of course, and may be addressed at the May meeting as necessary.

Thanks,

Paola

pnejad commented 3 years ago

@paolaroncaglia Could you please add lupus erythematosus (MONDO:0004670) to HCAO?

paolaroncaglia commented 3 years ago

Hi @pnejad ,

> Could you please add lupus erythematosus (MONDO:0004670) to HCAO?

My understanding is that HCA looks up the entire Mondo ontology. In fact, using the link I have for the HCAO OLS production instance, I can find lupus erythematosus (MONDO:0004670) as expected. Please let me know if I'm missing something, but I thought that all Mondo terms were available to you?

Thanks, Paola

(Update April 14th: at today's curators meeting we resolved this issue. Parisa had been looking at the public OLS rather than the HCA production instance. The former only shows the application ontology, so no Mondo or EDAM or HANCESTRO.)

paolaroncaglia commented 3 years ago

Ticket summary/notes for self: after next HCAO release,

paolaroncaglia commented 3 years ago

Hi @mshadbolt , @willrockout and @pnejad ,

FYI, @zoependlington made a new release of the HCA ontology today. It will be available in the HCA OLS production instance when this ticket is addressed. The terms that you requested in this ticket will be visible there then:

UBERON:0007795 ascitic fluid
CL:0002038 T follicular helper cell
CL:0002410 pancreatic stellate cell
CL:0005006 ionocyte

In fact, all CL terms have been imported in HCAO temporarily while we wait for a human-focused slim version of CL to be available (will take some time). So from now on all CL terms should be immediately available to you. Uberon terms, instead, should still be requested for ad-hoc import, as Uberon is too large to be imported in HCAO. We are, however, actively working on making a human-focused slim version of Uberon available.

If you have any issues or questions, please let us know.

Thanks, Paola (and Zoe)

mshadbolt commented 3 years ago

Thanks for all your hard work on this @paolaroncaglia and @zoependlington !