FAIRplus / FAIRPlus_squad2

An internal issue tracker (to-do list) for Squad team 2

Transcriptomics metadata template #75

Status: Open. JolandaS opened this issue 4 years ago.

JolandaS commented 4 years ago

Determine which ontologies to use for transcriptomics data (metadata templates).

PeterWoollard commented 4 years ago

Key transcriptomics-related entities for FAIR, and some ontologies for them, include:

Key searching ontologies

Key searching entities (not ontologies)

JolandaS commented 4 years ago

Define our own minimal set of metadata and recommendations, including selection criteria for the ontologies used.

daniwelter commented 4 years ago

For disease, I would use MONDO (possibly supplemented with NCIt for cancers), as it is currently the most actively developed and therefore the most likely to respond quickly to change requests. I definitely wouldn't use MeSH. I agree on all the other ontologies. I'd also add:

Searching entities: again, I agree with most of the suggestions.

Metabolites: MetaboLights compound accession, ChEBI

AlasdairGray commented 4 years ago

Define own minimal set of metadata, recommendations. Selection criteria for ontologies used.

Bioschemas may be an appropriate way to define a minimal metadata record that is searchable on the web.
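To make the Bioschemas idea concrete, here is a rough sketch of a minimal dataset record expressed as schema.org JSON-LD and embedded in a web page via a `<script type="application/ld+json">` element, which is how search engines pick it up. The property set (name, description, identifier, url, keywords, license, measurementTechnique) is an assumption loosely modelled on the Bioschemas Dataset profile; the actual required properties depend on the profile version. The helper names `minimal_dataset_record` and `to_script_tag` are hypothetical, and all values are placeholders.

```python
import json


# Hypothetical sketch: a minimal, web-searchable dataset record as schema.org JSON-LD,
# in the style of a Bioschemas Dataset profile. The property set is an assumption;
# check the current profile version for its actual required properties.
def minimal_dataset_record(name, description, identifier, url, keywords, license_url,
                           technique="RNA-seq"):
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": identifier,
        "url": url,
        "keywords": keywords,
        "license": license_url,
        # measurementTechnique is a schema.org Dataset property; free text here
        "measurementTechnique": technique,
    }


def to_script_tag(record):
    """Wrap the record in the <script> element used to embed JSON-LD in a web page."""
    return '<script type="application/ld+json">' + json.dumps(record, indent=2) + "</script>"


if __name__ == "__main__":
    # All values are placeholders, not real accessions or URLs.
    record = minimal_dataset_record(
        name="Example transcriptomics study",
        description="Placeholder description of an RNA-seq experiment.",
        identifier="EXAMPLE-0001",
        url="https://example.org/datasets/EXAMPLE-0001",
        keywords=["transcriptomics", "RNA-seq"],
        license_url="https://creativecommons.org/licenses/by/4.0/",
    )
    print(to_script_tag(record))
```

Keeping the record as a plain dictionary means the same minimal set could also be serialized for a catalogue or a repository submission, not only for web pages.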

karsten-quast commented 4 years ago

I have tried to compile a potential starting point for a recipe. I hope it makes sense to you, and I'm really looking forward to your thoughts. Maybe we can flesh it out together.

Task

Define competency questions

Define a Minimal Set of Metadata (MSOM) based on these questions

Introduce semantics into the template

Do a reality check
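The last two steps (introducing semantics into the template and doing a reality check) could be prototyped roughly as follows. This is a hypothetical sketch, not part of the recipe: the field names, the `MSOM_TEMPLATE` mapping and the `reality_check` helper are illustrations. The allowed prefixes follow the suggestions earlier in this thread (MONDO, supplemented by NCIt, for disease; ChEBI for metabolites), and values are expected as compact identifiers (CURIEs) such as `PREFIX:LOCALID`.

```python
import re

# Hypothetical MSOM: each required field maps to the ontology prefixes allowed
# for its values. Field names are illustrative assumptions.
MSOM_TEMPLATE = {
    "disease": {"MONDO", "NCIT"},
    "metabolite": {"CHEBI"},
}

# Simple CURIE pattern: a prefix, a colon, then a non-empty local ID without spaces.
CURIE = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):(\S+)$")


def reality_check(record, template=MSOM_TEMPLATE):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field, prefixes in template.items():
        value = record.get(field)
        if value is None:
            problems.append(f"missing field: {field}")
            continue
        match = CURIE.match(value)
        if match is None:
            problems.append(f"{field}: {value!r} is not a CURIE")
        elif match.group(1).upper() not in prefixes:
            problems.append(f"{field}: prefix {match.group(1)} not in {sorted(prefixes)}")
    return problems


if __name__ == "__main__":
    # Placeholder identifiers, not real ontology terms.
    print(reality_check({"disease": "MONDO:0000000", "metabolite": "CHEBI:00000"}))
    print(reality_check({"disease": "MESH:D000000"}))
```

Running the competency questions against records that pass this check would be the actual reality check; the sketch only covers the mechanical part (required fields present, terms drawn from the agreed ontologies).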

FuqiX commented 4 years ago

Link to the recipe: https://hackmd.io/@7GH6ArIbRnm_7fgcv8mmWw/HJVQ7nHKL

Chris-Evelo commented 4 years ago

I think this would benefit from being structured around an actual study that involves transcriptomics data. Apart from general metadata (who did it, where, where the data was stored and so on), it should include a description of the study, including which other measurements were made in the same study, following the ISA principles: how the samples were created and how the actual measurements were performed. It should also link to (and ontologically describe) 1) parallel measurements (e.g., did you also do proteomics, and where do I find that information?) and 2) phenotypic outcome data, e.g., blood pressure and similar variables measured under the study treatment, and again where that data is stored. Note that, ideally, in a public study the ISA types of data would go into BioSamples, and the other measurements would be in BioStudies or (for other omics data) be linked from there. So our choices should ideally align with how these repositories (and of course ArrayExpress and GEO) work. (Sorry if all that was already in the cookbook.)

Chris-Evelo commented 4 years ago

We discussed whether this would be better placed in the catalogue model. Of course, the catalogue needs to align with how data is collected, but we also need to make sure our recipes align with a "FAIR at source" approach, in which people start collecting the relevant data when they design, perform and evaluate the actual study.