SpeciesFileGroup / taxonworks

Workbench for biodiversity informatics.
http://taxonworks.org
MIT License

Unexpected OTU matching behavior during DwCA import #4075

Open camwebb opened 1 month ago

camwebb commented 1 month ago

I'm trying a DwCA Occurrence import into a project with existing names and OTUs, and I cannot work out how TW matches dwc:scientificName to a TW OTU.

An example for this imported Collection Object:

How can I force a match to a specific OTU? Surely the default behavior should be to match to the OTU with exactly the same spelling? This question relates to #2765 (request for TW:TaxonDetermination:otu_id) and possibly #3921.

camwebb commented 1 month ago

Solved, I think. The OTU match can be forced: if an OTU with a NULL name (i.e. no name) exists for a preferred taxon name (including author string) that exactly matches the imported scientificName, then that OTU is the one applied. If no such OTU exists, the behavior is less predictable (i.e., I have not yet worked out the rules).
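
A minimal sketch of the rule as described in this comment, written with plain Ruby structs rather than TaxonWorks models. The names `TaxonName`, `Otu`, `name_with_author`, and `forced_otu_match` are hypothetical and exist only for illustration; this is not the importer's actual code. The "preferred" status of the taxon name is not modelled here.

```ruby
# Hypothetical stand-ins for illustration only; these are NOT TaxonWorks models.
TaxonName = Struct.new(:id, :name_with_author)
Otu = Struct.new(:id, :name, :taxon_name)

# Returns the first OTU that has no name of its own (nil) and whose taxon name,
# including the author string, exactly equals the imported scientificName.
# Returns nil when no such OTU exists; the comment above does not say what
# the importer does in that case, so it is left undefined here.
def forced_otu_match(otus, scientific_name)
  otus.find do |otu|
    otu.name.nil? &&
      !otu.taxon_name.nil? &&
      otu.taxon_name.name_with_author == scientific_name
  end
end

# Made-up example data.
taxon_name = TaxonName.new(1, 'Aus bus Smith, 1900')
otus = [
  Otu.new(10, 'Aus bus sp. A', taxon_name), # OTU with its own name: skipped
  Otu.new(11, nil, taxon_name)              # NULL-name OTU: the candidate
]

match = forced_otu_match(otus, 'Aus bus Smith, 1900')
puts(match ? "matched OTU #{match.id}" : 'no forced match')
```

Under this reading, the comparison is a strict string equality, so a scientificName without the author string would not trigger the forced match.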

mjy commented 1 month ago

@camwebb

The logic is in the file below (I've pointed to one example of where otu: is used; this isn't the whole picture). We're slowly refactoring that code to isolate key algorithms like "what otu?".

https://github.com/SpeciesFileGroup/taxonworks/blob/development/app/models/dataset_record/darwin_core/occurrence.rb#L288

I'm not sure if you've been in on that loop, but we've also been writing specs and expectations and will be pointing the help docs directly to those. If you come up with specific very small examples in your testing, feel free to pass those along and we'll add them. The benefit is that all expectations are tested on every commit pushed to GH, and we have examples to point people at.

See https://github.com/SpeciesFileGroup/taxonworks/tree/development/spec/files/import_datasets/occurrences for our mini dwc files, and https://github.com/SpeciesFileGroup/taxonworks/blob/development/spec/models/dataset_record/darwin_core/occurrence_spec.rb for their use in tests.

mjy commented 1 month ago

Pinging @debpaul to flag this as important for docs.

camwebb commented 1 month ago

> If you come up with specific very small examples in your testing

This is the test case I have so far: