HumanCellAtlas / ontology

Adding spatial transcriptomics under library preparation [ENQ] #118

Closed idazucchi closed 2 years ago

idazucchi commented 2 years ago

Hi @zoependlington @bvarner-ebi, I'm not sure if you are ready to start integrating new terms into the HCAO, but I would like your opinion on this in the meantime.

We use the terms under library preparation to indicate how the library was prepared. The only spatial transcriptomics term available under library preparation is Visium, which covers most but not all of the datasets we have encountered. Would it be possible to get a generic spatial transcriptomics term under library preparation, or even a branch with the different techniques?

@ESapenaVentura to keep you updated

ghost commented 2 years ago

Hi, @idazucchi, would using the class EFO:0008994 'spatial transcriptomics' address your use case?

idazucchi commented 2 years ago

The term you are suggesting would be perfect as a general term, but we are restricted to using terms under library preparation. Right now spatial transcriptomics is in the HCAO, but it's under assay. Would it be possible to move it, or add it, under library preparation?

ghost commented 2 years ago

Hi, @idazucchi, would it have to be related by an "is a" relationship, i.e., 'spatial transcriptomics' subclass of 'library preparation'? I don't know if that statement is accurate.

Would 'spatial transcriptomics' part of 'library preparation' work, assuming that axiom is true?

idazucchi commented 2 years ago

Hi @bvarner-ebi, the validation we do is based on SubclassOf, so that is what we would need. I think it also makes sense as a relationship, since the spatial transcriptomics techniques are a kind of library preparation.

Is this possible? Could this be part of the next HCAO release or would it need more time?
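For readers unfamiliar with this kind of check, here is a minimal sketch of what SubclassOf-based validation could look like. It is not the HCA validator: the hierarchy is a toy dictionary of labels, and the names `PARENTS`, `is_subclass_of` and `validate_library_preparation` are hypothetical, introduced only for illustration.

```python
# Toy hierarchy: child label -> set of direct parent labels (asserted SubClassOf).
# Illustrative only; this is an assumption, not the real EFO/HCAO graph.
PARENTS = {
    "library preparation": {"planned process"},
    "assay": {"planned process"},
    "Visium": {"library preparation"},
    "spatial transcriptomics": {"assay"},
}


def is_subclass_of(term, ancestor, parents=PARENTS):
    """Return True if `term` reaches `ancestor` by following SubClassOf links.

    A term counts as a subclass of itself in this sketch.
    """
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return False


def validate_library_preparation(term):
    """Accept a term only if it sits under 'library preparation'."""
    return is_subclass_of(term, "library preparation")
```

Under these assumptions 'Visium' passes the check, while 'spatial transcriptomics' fails it because its only path upwards goes through 'assay'.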

ghost commented 2 years ago

Hi, @idazucchi, thanks for the reply.

Pursuant to the comments above and after getting some additional feedback from @dosumis, I don't think it makes logical sense to place 'spatial transcriptomics' under 'library preparation'. Based on the definitions, 'spatial transcriptomics' does not seem to be a subclass of 'library preparation', but rather 'library preparation' is part of 'spatial transcriptomics' (I initially had this reversed).

It may make sense to approach this by addressing "we are restricted to using terms under library preparation". Otherwise, it seems like we would have to model terms inaccurately to meet your use case, which can lead to additional downstream effects for anyone using EFO.

ESapenaVentura commented 2 years ago

@bvarner-ebi, I understand your concerns, but just to go back and check on what we want:

@idazucchi made the suggestion above after an internal discussion about the best way to implement this new term, but we are happy to discuss alternatives.

Currently, I don't understand why moving that term under library preparation (https://ontology.archive.data.humancellatlas.org/ontologies/efo/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FOBI_0000711) in the EFO slim of the HCAO would not be correct. We have several terms, including NanoString Digital Spatial profiling (https://ontology.archive.data.humancellatlas.org/ontologies/efo/terms?iri=http%3A%2F%2Fwww.ebi.ac.uk%2Fefo%2FEFO_0030029), that are direct children of that term in that branch. Could you explain why it would not be ok to consider, e.g., spatial transcriptomics by high-throughput sequencing (https://ontology.archive.data.humancellatlas.org/ontologies/efo/terms?iri=http%3A%2F%2Fwww.ebi.ac.uk%2Fefo%2FEFO_0030005) as a term under library preparation?

And, if it really is incorrect, does that mean every term under library preparation that alludes to a specific instance of a library preparation protocol (e.g. the NanoString one I mention above) is wrong?

(Also, it may be the case that spatial transcriptomics is too broad but spatial transcriptomics by high-throughput sequencing is not?)

Apologies for the long comment, I just want to make sure that we are conveying our intentions clearly and that we get the best possible outcome from this ticket!

ghost commented 2 years ago

Hi, @ESapenaVentura,

Thank you for the information. Hopefully we can come up with a working solution.

"We have a dataset that used a non-general library preparation method (non-Visium). We want to describe the method for the library preparation protocol."

This may be the point of confusion: my interpretation of the definition of 'library preparation' is different from 'library preparation method' or 'library preparation protocol'.

It looks like 'library preparation' was copied from Ontology for Biomedical Investigations to EFO and then used with a different meaning in EFO (compare modelling in OBI). EFO has assays placed as subclasses under 'library preparation', and that is the part that does not seem correct. My understanding is that these assays have a step for 'library preparation' in their workflows. If this is the case, I do not think it is correct to model assays as children (subclasses of) 'library preparation'.

"Could you explain why it would not be ok to consider, e.g. spatial transcriptomics by high-throughput sequencing (https://ontology.archive.data.humancellatlas.org/ontologies/efo/terms?iri=http%3A%2F%2Fwww.ebi.ac.uk%2Fefo%2FEFO_0030005) as a term under library preparation?"

Following from my comments above, I would say it is logically incorrect to say 'spatial transcriptomics by high-throughput sequencing' is a subclass of 'library preparation' if 'library preparation', as defined by the text definition, is actually a process that is one of many steps in 'spatial transcriptomics by high-throughput sequencing' (or any other assay).
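The distinction between "is a" and "part of" can be sketched as below. This is only an illustration of the argument, using labels and typed edges that are assumptions made up for the example (the `EDGES` list and the `ancestors` helper are hypothetical, and the part-of edge simply restates the modelling proposed in this thread).

```python
# Hypothetical typed edges: (subject, relation, object). Labels only, for illustration.
EDGES = [
    ("spatial transcriptomics by high-throughput sequencing", "subclass_of", "assay"),
    ("library preparation", "part_of",
     "spatial transcriptomics by high-throughput sequencing"),
]


def ancestors(term, relation, edges=EDGES):
    """Collect every class reachable from `term` by following only `relation` edges."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for subject, rel, obj in edges:
            if subject == current and rel == relation and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found
```

With these edges, a check that follows only `subclass_of` never connects the assay to 'library preparation' in either direction, so a part-of axiom alone would not satisfy a subclass-based validation.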

The statement above: "since the spatial transcriptomics techniques are a kind of library preparation" -- this is where our perspectives differ.

If my conceptualisation of these classes is wrong, please do advise. If the perspective I outlined seems reasonable, may I suggest instead of pulling from children of 'library preparation', you pull from 'assay', as that modelling is less controversial. The remodelling of 'library preparation' in EFO could then be addressed at some point in time. Is that a reasonable way to proceed?

CC @dosumis

rays22 commented 2 years ago

"Would it be possible to get a generic Spatial transcriptomics term under library preparation, or even a branch with the different techniques?"

I think it would be a good solution to add a new generic spatial transcriptomics term under library preparation. See my explanation below. I hope this helps.


Library preparation vs. assay

Are the classes above meant to be disjoint? Are there any experimental processes that have multiple specified outputs, for example, both new biomaterials and information about the original biomaterials? Depending on your answer to these questions, your modelling of these processes can differ.

I think these terms are meant to be disjoint in HCA usage. If library preparation is a sample preparation process, then a library preparation term is not an assay, and that is why you cannot use an assay term from EFO in HCA library preparation-related metadata. @ESapenaVentura please confirm this.

The EFO:0008994 spatial transcriptomics term is defined as an assay in EFO. As the OBI and HCA definitions of assay appear to agree, and assuming that the HCA process types are meant to be disjoint, then it would be misleading and incorrect to include EFO:0008994 spatial transcriptomics as a subclass of OBI:0000711 library preparation.

If any assay-type terms are actually included as subclasses of EFO/OBI:0000711 library preparation, then those should be double-checked. If the specified output of a process (or a named series of processes) to which the term refers may include both information (file/files) and biomaterial, then the classification may not be correct for the HCA (assuming disjointness of terms 1-2).

I think if you add a new 'library preparation for spatial transcriptomics' class under OBI:0000711 library preparation in EFO that is clearly not an assay (the specified output is a biomaterial, not some data item), and follow a similar path for any other library preparation process term, even if a similar-sounding assay-type term exists, then you have a solution that is logically coherent (without any major EFO/OBI restructuring) and would also work for the HCA.
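The output-based reasoning above could be sketched roughly as follows. This is a simplification under stated assumptions: processes are reduced to the kinds of output they specify, and the `Process` class, the `classify` function and the category strings are hypothetical names chosen for this example, not EFO, OBI or HCA terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    label: str
    output_kinds: frozenset  # subset of {"material", "data"}; an assumed encoding


def classify(process):
    """Sort a process by its specified outputs, assuming the two kinds are disjoint."""
    has_material = "material" in process.output_kinds
    has_data = "data" in process.output_kinds
    if has_material and has_data:
        # Mixed outputs are the case flagged above as needing a second look.
        return "needs review"
    if has_data:
        return "assay"
    if has_material:
        return "material processing"
    return "unknown"
```

In this sketch, a process whose only specified output is a library (a material) would not be classified as an assay, which is the property the proposed new class relies on.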

ghost commented 2 years ago

Thank you very much for the insight, @rays22.

If I follow correctly, you are suggesting that we create a new class called 'library preparation for spatial transcriptomics' as a subclass of 'library preparation'. HCA can then select library preparation protocols, but not assays, to be children of 'library preparation for spatial transcriptomics'. Also, it sounds like it would be prudent to remove assay terms that are currently under 'library preparation'.

@idazucchi, would that work for your use case?

rays22 commented 2 years ago

"If I follow correctly, you are suggesting that we create a new class called 'library preparation for spatial transcriptomics' as a subclass of 'library preparation'."

Yes, that is what I am suggesting.

"HCA can then select library preparation protocols, but not assays, to be children of 'library preparation for spatial transcriptomics'."

Just to clarify, HCA are allowed to select terms exclusively under 'library preparation' for their sample/library preparation type metadata annotations. If they pick an assay-type term, for example EFO:0008994 'spatial transcriptomics', then they get a schema validation error. Correct me, @idazucchi or @ESapenaVentura, if I am wrong here.

"Also, it sounds like it would be prudent to remove assay terms that are currently under 'library preparation'."

It depends. I think it may not be as urgent as adding the new library preparation term(s), because that seems to be a blocker for the HCA.

Again, correct me if I am wrong.

ghost commented 2 years ago

I also very kindly received the following input from the author of 'library preparation':

"... I agree with the suggestion of having a class ‘library preparation for spatial transcriptomics’ created, assuming that the output of this process is a ‘library’ (i.e. a material entity) that will be input to a ‘sequence data acquisition’. And as far as I recall, we (OBI) did not set disjointness explicitly between these classes (assay & ‘library preparation’). One question I’d have is if there is a need to explicitly identify the ‘library material’ (the output of the library preparation process), for instance if these are logged in an ELN and stored in a -80C prior to loading onto a flowcell. hth."

@idazucchi, would you like to proceed with submitting a new term request for 'library preparation for spatial transcriptomics' if that will meet your use case?

idazucchi commented 2 years ago

Hi @bvarner-ebi and @rays22, thank you for your suggestions! We need a bit more time to discuss this within our team. Is it ok if I get back to you next week?

ghost commented 2 years ago

Sure, @idazucchi, absolutely. There is no blocker on my end.

idazucchi commented 2 years ago

Hi @bvarner-ebi, thank you for your patience. We've had a discussion and we would be happy to have 'library preparation for spatial transcriptomics' as a subclass of library preparation.

In this case, would you then add Visium, Slide-seq, Slide-seqV2 and NanoString digital spatial profiling, or a library preparation term for each of them, under 'library preparation for spatial transcriptomics'?

ghost commented 2 years ago

Great, @idazucchi. Would you kindly submit a "new term request" issue? I will then add the term accordingly.

ghost commented 2 years ago

New term created (#121).