Open jiho opened 2 years ago
Also tagging @rubenpp7
Update: to indicate the status of the identification in the DarwinCore field identificationVerificationStatus, EurOBIS will not use "Dubious according to human", only "Predicted by machine" and "Verified by human".
Indeed, as of today, whatever is not Verified by human is just filtered out. I guess that the present issue needs to be exposed to users (via the API). E.g. do we want to do it always, or as a choice? Are there variations in such a choice?
Code browsing:
Doc browsing:
An example with a mix of Predicted and Validated occurrences. The corresponding eMoF records distinguish the two occurrences within the same sample.
- it looks like the identifiedBy field is needed for validated images. I guess it's all the people involved in the identification of any object in this taxon. Could be quite long.
We decided to mention only the latest validator, who has authority over the validation. This field is therefore used to "know who to blame" 😉 Previous validators will be "thanked" through the co-authorship of the dataset.
Since one occurrence corresponds to one or more objects in EcoTaxa, this should be the concatenated list of all validators (separated by " | ").
- For non-validated images, identificationReferences has to contain, I guess, some information on the ML model used for automatic classification.
When validated, this should be a paper/book. For us it would be the future EcoTaxoGuide. Storing this for each object seems like a waste of bits.
When predicted, the best practices document mentions that it should be a reference to the model. We don't store those and even if we did, they would not guarantee reproducibility.
=> We do not use this field for the moment.
- associatedMedia is optional but can be filled in for EcoTaxa (url to project+sample)
Giving the links to all images is not realistic. Giving the link to the project is (i) not guaranteed to work forever, (ii) redundant with the link back to EcoTaxa at the level of the whole dataset.
=> We do not use this field for the moment.
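For the identifiedBy rule discussed above (latest validator of each object, concatenated per occurrence), a minimal sketch could look like the following. The function name, the input shape, and the de-duplication step are assumptions for illustration, not the actual export code; the validator names are made up.

```python
def identified_by(latest_validators):
    """Build an identifiedBy value for one occurrence.

    `latest_validators` is assumed to hold the latest validator of each
    EcoTaxa object aggregated into the occurrence (hypothetical input).
    Duplicates are dropped, first-seen order is kept, and names are
    joined with " | ".
    """
    # dict.fromkeys keeps insertion order while removing duplicates
    distinct = list(dict.fromkeys(name for name in latest_validators if name))
    return " | ".join(distinct)


# Three objects in the same sample + taxon, validated by two people
value = identified_by(["A. Smith", "B. Jones", "A. Smith"])
print(value)
```

De-duplicating keeps the field short when one person validated most objects of a taxon, which addresses the "could be quite long" concern to some extent.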
Currently, we export only validated objects in DWCA (@grololo06, can you confirm?)
A proposal is underway (by @PatriciaCabrera) to use the DarwinCore field identificationVerificationStatus to indicate the status: "Verified by human", "Dubious according to human", "Predicted by machine". This maps directly to the statuses in EcoTaxa. 🥳
But an occurrence in the occurrence.txt file of a DWCA (i.e. a line) can only have one identificationVerificationStatus; this means that, to use this field, the abundances/concentrations/biovolumes would need to be summed by sample + taxon + status. Then, for a taxon that has objects of all three statuses, there would be three lines in occurrence.txt and three lines in emof.txt, each with the concentration corresponding to the objects with the given status. It would then be the responsibility of the user of the data to decide whether he/she wants to sum all three (and risk mistakes), keep only the validated (and risk underestimating the concentration), etc.
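To make the grouping concrete, here is a minimal sketch of summing by sample + taxon + status. The record layout, sample and taxon names, and concentration values are all made up for illustration; this is not the EcoTaxa export code.

```python
from collections import defaultdict

# Hypothetical per-object records: (sample_id, taxon, status, concentration)
objects = [
    ("s1", "Copepoda", "Verified by human", 2.0),
    ("s1", "Copepoda", "Verified by human", 1.5),
    ("s1", "Copepoda", "Predicted by machine", 4.0),
    ("s1", "Copepoda", "Dubious according to human", 0.5),
    ("s1", "Chaetognatha", "Verified by human", 1.0),
]


def sum_by_sample_taxon_status(records):
    """Sum concentrations per (sample, taxon, status) key.

    Each key would become one line in occurrence.txt, with its total
    reported in the matching line of emof.txt.
    """
    totals = defaultdict(float)
    for sample_id, taxon, status, concentration in records:
        totals[(sample_id, taxon, status)] += concentration
    return dict(totals)


totals = sum_by_sample_taxon_status(objects)
for key in sorted(totals):
    print(key, totals[key])

# A data user who trusts every status would sum across the status axis:
copepoda_all = sum(v for (s, t, _), v in totals.items() if (s, t) == ("s1", "Copepoda"))
```

In this made-up sample, Copepoda yields three lines (one per status), and keeping only the validated line would report less than half of the total concentration, which is the underestimation risk mentioned above.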