gbif / edna-tool-ui

Frontend for the eDNA tool

Starting a list of barcoding genes and their synonyms. #93

Open tobiasgf opened 2 months ago

tobiasgf commented 2 months ago

At some point we may want to make a more comprehensive list of barcoding genes and their synonyms (also for use in the sequence ID tool, etc.)

Starting here (with a bit of help from AI):

[edited: RNA to DNA, added "mt" to mitochondrial versions]

tobiasgf commented 2 months ago

Consider using rDNA instead of rRNA, reflecting the focus on DNA sequences and not the gene product.

tobiasgf commented 2 months ago

Also, consider splitting ITS into: ITS1 and ITS2, and ITS region (ITS1+5.8S+ITS2)

CecSve commented 2 months ago
  • 16S rRNA (mitochondrial) (16S ribosomal RNA): 16SrDNA, 16S

  • 16S rRNA (bacterial) (16S ribosomal RNA)

Does it matter whether it is mitochondrial or bacterial? Concepts have to be unique, so I have added only one.

tobiasgf commented 2 months ago

These are two different genes. The mitochondrial 16S gene is the large subunit (LSU), homologous to 23S in bacteria and 28S in eukaryotes. The bacterial 16S gene is the small subunit (SSU), homologous to 12S in mitochondria and 18S in eukaryotes. They just (unfortunately) have the same sedimentation coefficient (S-value).

So the unique concepts would rather be LSU and SSU. But they are just targeted with very different primers.

But I believe it is best to keep these two genes (LSU and SSU) separate between bacteria/archaea, eukaryotes and mitochondria.
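To make the naming clash concrete, here is a minimal sketch of the subunit-by-origin table described above. The dictionary layout, the origin labels and the helper `genes_named` are assumptions made for illustration only, and the S-values are the conventional ones (the mitochondrial values are those typically quoted for animals).

```python
# (subunit, origin) -> conventional S-value name.
# Keys and origin labels are illustrative, not taken from the vocabulary.
RIBOSOMAL_GENES = {
    ("SSU", "bacteria/archaea"): "16S",
    ("LSU", "bacteria/archaea"): "23S",
    ("SSU", "eukaryote nuclear"): "18S",
    ("LSU", "eukaryote nuclear"): "28S",
    ("SSU", "mitochondria"): "12S",
    ("LSU", "mitochondria"): "16S",
}


def genes_named(s_value):
    """Return every (subunit, origin) pair that carries the given S-value."""
    return sorted(key for key, name in RIBOSOMAL_GENES.items() if name == s_value)


# "16S" alone points at two different genes, so it cannot be a unique concept.
print(genes_named("16S"))
print(genes_named("18S"))
```

Keying on subunit plus origin keeps all six genes distinct even though two of them share an S-value.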

CecSve commented 2 months ago

What do you propose we call the concepts then so we make sure they do not match?

tobiasgf commented 2 months ago

Also, I think we should use the DNA terminology (rDNA, not rRNA), as we are talking about the DNA that is being targeted, not the gene product. Maybe:

  • 12S mtDNA
  • 16S mtDNA
  • 16S rDNA
  • 18S rDNA
  • 23S rDNA
  • 28S rDNA

CecSve commented 2 months ago

Could I ask you to update the first post with the list? Then I'll add the concepts to the spreadsheet.

CecSve commented 1 month ago

Also, consider splitting ITS into: ITS1 and ITS2, and ITS region (ITS1+5.8S+ITS2)

I have added 1 and 2 - not sure what is meant with the region. Could you please specify? https://docs.google.com/spreadsheets/d/1_cV4LZeqF_sm-JVaHuKusWBeVK2VkVWEcuPM8MPZN3s/edit#gid=1674909017

CecSve commented 1 month ago

These are two different genes. The mitochondrial 16S gene is the large subunit (LSU), homologous to 23S in bacteria and 28S in eukaryotes. The bacterial 16S gene is the small subunit (SSU), homologous to 12S in mitochondria and 18S in eukaryotes. They just (unfortunately) have the same sedimentation coefficient (S-value).

So the unique concepts would rather be LSU and SSU. But they are just targeted with very different primers.

But I believe it is best to keep these two genes (LSU and SSU) separate between bacteria/archaea, eukaryotes and mitochondria.

I do not think this is currently captured well as concepts. Ideally, it should be clear to both publishers and users, but mostly to users, what goes in the interpreted field. Let me know how you think we should separate the two in a clear way. For example, we have the verbatim value 16S: how should it be interpreted?

CecSve commented 1 month ago

The verbatim values are now ready to be mapped. I have already added some mappings; however, since they are all identical to either the concept, label_en or alternativeLabel_en, they will be removed from the mapping sheet, because they will be matched on those fields anyway.
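As a rough illustration of matching on those three fields, here is a minimal sketch. Only the field names `concept`, `label_en` and `alternativeLabel_en` come from this thread; the sample rows, the alternative labels, and the helpers `normalise` and `candidates` are hypothetical, and the real matching in the tool may well work differently.

```python
def normalise(text):
    """Lower-case and collapse whitespace so '16S  rDNA' equals '16s rdna'."""
    return " ".join(text.lower().split())


# Hypothetical rows for illustration; the real vocabulary lives in the spreadsheet.
CONCEPTS = [
    {"concept": "16S_mtDNA", "label_en": "16S mtDNA", "alternativeLabel_en": ["mt16S"]},
    {"concept": "16S_rDNA", "label_en": "16S rDNA", "alternativeLabel_en": ["16SrDNA"]},
    {"concept": "ITS1", "label_en": "ITS1", "alternativeLabel_en": []},
]


def candidates(verbatim, concepts=CONCEPTS):
    """Return the concepts whose name, label or alternative label equals the value."""
    key = normalise(verbatim)
    hits = []
    for row in concepts:
        names = [row["concept"], row["label_en"], *row["alternativeLabel_en"]]
        if key in {normalise(name) for name in names}:
            hits.append(row["concept"])
    return hits


print(candidates("16s rdna"))  # matches on label_en
print(candidates("16S"))       # matches nothing in these sample rows
```

In this sketch a bare `16S` stays unmatched rather than being guessed, so it would still need an explicit row in the mapping sheet once its interpretation is decided.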