emo-bon / sequencing-data

The files controlling and describing the sequencing metadata
Apache License 2.0

work on the omics terms #3

Open isanti opened 2 years ago

isanti commented 2 years ago

Check the Darwin Core list (from Katrina) against the MIxS checklist, the list Ioulia has, and the information Genoscope is sending us, then decide what we keep.

kmexter commented 2 years ago

The list of terms to add to the CSV files that make up the DwCA for DNA-based occurrences (with or without actual species detections) are

See also https://docs.gbif.org/publishing-dna-derived-data/1.0/en/ for more on the mandatory terms and for some easier-to-understand explanations and examples.

See also https://manual.obis.org/examples.html#edna-dna-derived-data for the OBIS view.

ENA: the MIxS terms

kmexter commented 2 years ago

I think the next step is for Ioulia and Katrina to go through these lists, row by row, and decide: which terms are relevant to EMO BON water, sediment, and ARMS; who provides them and where the values for each term can be found; which destination (ENA, OBIS, or GBIF) each is for; and which vocabulary each term links to (MIxS, DwCA, etc.). We can turn that into a CSV file that the rest of the OpSci team can then rework into something better for use in GitHub.
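As a rough sketch of what that CSV could look like, the snippet below writes one row per term with a column for each question listed above. The column names, the `build_mapping_csv` and `terms_for` helpers, and the two example rows are all assumptions made for illustration; they are not decisions taken in this issue, and the `TBD` values are placeholders.

```python
import csv
import io

# Hypothetical column names, one per question raised in the comment above.
FIELDNAMES = [
    "term",          # term name as written in its source vocabulary
    "vocabulary",    # e.g. MIxS or DwC
    "sample_types",  # which of water / sediment / ARMS the term applies to
    "provider",      # who supplies the value
    "value_source",  # where the value can be found
    "destinations",  # ENA, OBIS and/or GBIF, separated by ";"
]

# Placeholder rows for illustration only; the real content is still to be decided.
ROWS = [
    {
        "term": "seq_meth",
        "vocabulary": "MIxS",
        "sample_types": "water;sediment;ARMS",
        "provider": "TBD",
        "value_source": "TBD",
        "destinations": "ENA;OBIS;GBIF",
    },
    {
        "term": "eventDate",
        "vocabulary": "DwC",
        "sample_types": "water;sediment;ARMS",
        "provider": "TBD",
        "value_source": "TBD",
        "destinations": "OBIS;GBIF",
    },
]


def build_mapping_csv(rows):
    """Serialise the term rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def terms_for(rows, destination):
    """Return the terms flagged for one destination (ENA, OBIS or GBIF)."""
    return [
        row["term"]
        for row in rows
        if destination in row["destinations"].split(";")
    ]


if __name__ == "__main__":
    print(build_mapping_csv(ROWS))
    print(terms_for(ROWS, "ENA"))
```

Keeping multi-valued cells as `;`-separated strings keeps the file flat and easy to edit by hand; splitting them into one boolean column per destination would work just as well.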

isanti commented 2 years ago

Here is the list of terms with explanations that I created a while back: https://docs.google.com/spreadsheets/d/1zRlazR0jx15SfmczgxZ6v4MxUUUuxCxS/edit#gid=394038639 Most terms come from the MIxS list. We don't have to go through the MIxS list again, since all the other lists (including the list above) derive from MIxS. There are some additional terms that I created, but we should look at those again.

For the ENA checklists, all the terms I have seen come from the MIxS list, and the checklists haven't been updated for a long while. ENA has very few mandatory terms. We can go beyond that and include a lot more when we submit to ENA. We should pay special attention to the ARMS-relevant terms because 1) they are not in ENA at all and 2) I didn't have them in mind while creating the list above. However, the DNA terms for ARMS should be the same as the ones for sediment and seawater. It is the sampling and handling part that is different.

isanti commented 2 years ago

Ioulia will go through the DwC DNA, Occurrence, Measurement, and Event term lists and check 1) whether there are extra terms we might need, 2) which spreadsheet holds the metadata for each term, and 3) whether any term has a different name in our lists.
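The three checks could be sketched roughly as below. This is only an illustration: the `compare_terms` function, the spreadsheet names, and the local term names are hypothetical, and the handful of DwC-style terms stands in for the full lists.

```python
def compare_terms(dwc_terms, our_terms, synonyms):
    """Compare DwC term names against our own term list.

    dwc_terms: set of term names taken from the DwC lists
    our_terms: dict mapping our term name to the spreadsheet that holds it
    synonyms:  dict mapping a DwC term name to our name, where they differ
    """
    report = {"missing": [], "mapped": {}, "renamed": {}}
    for term in sorted(dwc_terms):
        local_name = synonyms.get(term, term)
        if local_name in our_terms:
            # Check 2: record which spreadsheet holds the metadata.
            report["mapped"][term] = our_terms[local_name]
            if local_name != term:
                # Check 3: the term exists under a different name.
                report["renamed"][term] = local_name
        else:
            # Check 1: an extra term we do not have yet.
            report["missing"].append(term)
    return report


if __name__ == "__main__":
    # Illustrative inputs only; spreadsheet and local names are made up.
    dwc = {"decimalLatitude", "eventDate", "target_gene", "pcr_primer_forward"}
    ours = {
        "latitude": "sampling sheet",
        "collection_date": "sampling sheet",
        "target_gene": "sequencing sheet",
    }
    names = {"decimalLatitude": "latitude", "eventDate": "collection_date"}
    print(compare_terms(dwc, ours, names))
```

The `missing` list is then the set of candidates to discuss, while `mapped` and `renamed` can feed straight into the term-mapping CSV.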
