Generate an identification extension to track changes in taxonomic assignment

gbif / edna-tool-ui

Frontend for the eDNA tool

Generate an identification extension to track changes in taxonomic assignment #120

Open CecSve opened 1 month ago

CecSve commented 1 month ago

Tool users supply a taxonomy file when the tool processes their data to generate a DwC-A. Ideally, the scientificName is a BIN, SH, or similar identifier, and Linnaean ranks can be included for further taxonomic identification.

Would it make sense to add a verbatimIdentification field, and perhaps an identificationRemarks field, to the generated archive, where the original identification (maybe already used in scientific publications) can be recorded? More fields might be relevant and could be packaged as an extension file, although the fields mentioned could also just be added to the occurrence core file.

It could allow data users to track the changes in taxonomic identification.

thomasstjerne commented 1 month ago

Actually, verbatimIdentification and identificationRemarks are both in the default list of fields in the taxonomy mapping. People use the fields in slightly different ways, but verbatimIdentification is often used for the full taxonomy string returned by BLAST or whatever assignment tool you use, e.g. k__Stramenopila;p__Ochrophyta;c__Phaeophyceae;o__Fucales;f__Sargassaceae;g__Sargassum;s__Sargassum_sp
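For anyone wanting to compare such a string against the Linnaean ranks later on, here is a minimal sketch of splitting it into ranks. It assumes the `k__`/`p__`/... prefix convention shown in the example string; the function name `parse_taxonomy_string` and the `RANK_PREFIXES` mapping are made up for illustration and are not part of the tool.

```python
# Hypothetical mapping from the one-letter prefixes in the example string
# to rank names; other assignment tools may use different conventions.
RANK_PREFIXES = {
    "k": "kingdom",
    "p": "phylum",
    "c": "class",
    "o": "order",
    "f": "family",
    "g": "genus",
    "s": "species",
}


def parse_taxonomy_string(verbatim):
    """Split a 'k__X;p__Y;...' string into a {rank: name} dict.

    Parts without a recognised prefix or with an empty name are skipped.
    Names are kept exactly as written (underscores included).
    """
    ranks = {}
    for part in verbatim.split(";"):
        prefix, sep, name = part.strip().partition("__")
        if sep and prefix in RANK_PREFIXES and name:
            ranks[RANK_PREFIXES[prefix]] = name
    return ranks


example = (
    "k__Stramenopila;p__Ochrophyta;c__Phaeophyceae;o__Fucales;"
    "f__Sargassaceae;g__Sargassum;s__Sargassum_sp"
)
parsed = parse_taxonomy_string(example)
```

With the example string, `parsed["genus"]` would be `Sargassum` and `parsed["species"]` would stay `Sargassum_sp`, since nothing is normalised.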

Also, any field in Occurrence Core or DNA Derived data can be added by a user even though they are not in the default list.

CecSve commented 1 month ago

Oh great - that makes sense. I was just wondering if the tool should automatically fill the verbatimIdentification field based on the input from the publisher? It could be used as the original identification to track changes.
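A rough sketch of what such auto-filling could look like, assuming each taxonomy row is a plain dict keyed by Darwin Core term and that the publisher's supplied scientificName is the value to preserve. The function name `fill_verbatim_identification` is hypothetical and this is not how the tool currently behaves.

```python
def fill_verbatim_identification(row):
    """Return a copy of the row with verbatimIdentification filled in.

    Assumption: when the publisher left verbatimIdentification empty,
    the supplied scientificName is treated as the original identification.
    A value the publisher already provided is never overwritten.
    """
    updated = dict(row)
    existing = (updated.get("verbatimIdentification") or "").strip()
    if not existing:
        updated["verbatimIdentification"] = updated.get("scientificName", "")
    return updated
```

Copying rather than mutating the row keeps the publisher's input untouched, which seems useful if the point is to track later changes.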

CecSve commented 1 month ago

And the identificationRemarks could include the match values and the reference database (refDB) if users opt to use the seqID tool to assign taxonomy, for example:

bitScore: 111 | expectValue: 4.03e-24 | queryCoverage: 100 | matchType: BLAST_EXACT_MATCH | queried against a 99% clustered version of the BOLD Public Database v2024-01-06 public data (COI-5P sequences)
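A string like the one above could be assembled along these lines. This is only a sketch: the function name `format_identification_remarks` and its parameters are assumptions for illustration, and the actual shape of the seqID tool's response is not shown in this thread.

```python
def format_identification_remarks(bit_score, expect_value, query_coverage,
                                  match_type, reference_db):
    """Join match statistics and the reference database description
    into a single pipe-separated identificationRemarks value."""
    parts = [
        f"bitScore: {bit_score}",
        f"expectValue: {expect_value}",
        f"queryCoverage: {query_coverage}",
        f"matchType: {match_type}",
        f"queried against {reference_db}",
    ]
    return " | ".join(parts)


remarks = format_identification_remarks(
    bit_score=111,
    expect_value="4.03e-24",  # passed as text so the notation is kept as-is
    query_coverage=100,
    match_type="BLAST_EXACT_MATCH",
    reference_db=(
        "a 99% clustered version of the BOLD Public Database "
        "v2024-01-06 public data (COI-5P sequences)"
    ),
)
```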

thomasstjerne commented 1 month ago

Yes - exactly