Closed cymon closed 1 month ago
What would you like to see wrt documentation? And while I have your attention, we want to add the run accession numbers to that same file, once they exist. I suggest one column for the run accession number per gene type? If so, can you remind me what the "gene types" are that we are producing? Metagenomics is one "gene type"; for the metabarcoding we have 18S, COI, ITS (?) for the ARMS and ?? for Wa and So?
What kind of documentation? Do you need something other than what is included here?
@kmexter We have the metagenomics and we also have the metabarcoding for the 18S and the metabarcoding for the COI. No ITS.
what would you like to see wrt documentation?
Christina just supplied this information:
ena_accession_number_sample: ENA Accession Number of sequence data
biosamples_accession_number: BioSamples Accession Number
ena_accession_number_project: Observatory ENA Project Accession Number
ena_accession_number_umbrella: EMO BON Project Accession Number
And while I have your attention, we want to add the run accession numbers to that same file
What is the "run accession number" field? In which table does it occur?
The table as it is currently defined has the "ref_code", which is the unique identifier in "run-information-batch-001.csv", and the "source_material_id", the (not currently, but should be) unique identifier linking it to the Google logsheets. I don't think it needs anything else; adding another identifier would be redundant.
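For what it's worth, the "should be unique" point is easy to check. Below is a minimal Python sketch, not anything that exists in the repository: the helper name `duplicated_values` and the rows are made up for illustration, and only the column names "ref_code" and "source_material_id" come from the table discussed above.

```python
import csv
import io
from collections import Counter


def duplicated_values(rows, column):
    """Return the values that occur more than once in `column`, sorted."""
    counts = Counter(row[column] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


# Made-up rows for illustration only. For the real table, open the CSV file
# instead of this in-memory string.
sample = io.StringIO(
    "ref_code,source_material_id\n"
    "REF_A,SITE_X_sample_1\n"
    "REF_B,SITE_X_sample_2\n"
    "REF_C,SITE_X_sample_2\n"
)
rows = list(csv.DictReader(sample))

print(duplicated_values(rows, "ref_code"))            # no repeats in this sample
print(duplicated_values(rows, "source_material_id"))  # one repeated value
```

An empty list for a column means every value in it is unique, so it can safely serve as a link to another table.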
, once they exist I suggest one column for the run accession number per gene type? If so, can you remind me what the "gene types" are that we are producing? metagenomics is one "gene type" for the metabarcoding we have 18S, COI, ITS (?) for the ARMS and ?? for Wa and So?
As you note, this is metagenomics, not metabarcoding. So there is only one "type" of sequence in metagenomics, and they are not genes: they are just sequence reads.
What kind of documentation? Do you need something other than what is included here?
Nope that's what I was looking for...
I would make some corrections, if we want to be 1-1 with the ENA definitions
biosamples_accession_number: Sample Accession Number / Biosample Accession Number (example)
ena_accession_number_sample: Secondary Sample Accession Number (example)
--> this is like an umbrella sample accession in ENA
ena_accession_number_project: Study Accession Number / Project (example)
--> In ENA, project == study
--> For EMO BON, there will be one project per observatory
--> Each project is a component project under the EMO BON Umbrella Study, otherwise known as the Parent Project
ena_accession_number_umbrella: EMO BON Umbrella Study Number
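If it helps whoever updates the descriptions file, here is a rough Python sketch of the corrected definitions as a lookup table. The wording is condensed from the list above; the names `COLUMN_DESCRIPTIONS` and `describe` are hypothetical and not part of any existing script.

```python
# Column name -> short description, condensed from the ENA-aligned wording above.
COLUMN_DESCRIPTIONS = {
    "biosamples_accession_number": "Sample Accession Number / BioSample Accession Number",
    "ena_accession_number_sample": "Secondary Sample Accession Number",
    "ena_accession_number_project": "Study (Project) Accession Number, one per observatory",
    "ena_accession_number_umbrella": "EMO BON Umbrella Study Number",
}


def describe(header):
    """Pair each column in `header` with its description ('' if undocumented)."""
    return [(column, COLUMN_DESCRIPTIONS.get(column, "")) for column in header]


# Any column that comes back with an empty description still needs documenting.
for column, text in describe(["ref_code", "biosamples_accession_number"]):
    print(f"{column}: {text or '(no description yet)'}")
```

Keeping the descriptions in one place like this would make it straightforward to spot columns that have no definition yet.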
OK, so @cymon what would you like to see that is different to what we have now? Update the descriptions in that table slightly (following Christina's suggestions), or add something to a README.md?
Either, both, or none... I'm good.
Just wanted to know the definition of the fields. The naming of the file "run-information-batch-001_column-descriptions.csv" is misleading, as it also contains descriptions of the fields in the "ena-accession-numbers-batch-001.csv" file, which I hadn't realised. But no big deal.
I've closed this issue above...
Could we have some brief documentation for the different accession numbers in this table?
shipment/batch-001/ena-accession-numbers-batch-001.csv