Open LordFlashmeow opened 3 years ago
When the name breaks, the OTU#name must be populated. In this example, Camponotus mgo1 would be the OTU name. If this is not the case in the current import, then it's a top candidate for 0.20.1 rather than a blocker, I think.
@LocoDelAssembly @LordFlashmeow can this be closed?
This is the current mapping we have for it:
ident_qualifier = get_field_value(:identificationQualifier)
if ident_qualifier =~ /^cf[\.\s]/
  otu_names << ident_qualifier
else
  otu_names << "#{get_field_value(:scientificName)} #{ident_qualifier}"
end unless ident_qualifier.nil?
names.last&.merge!({otu_attributes: {name: otu_names.join(' ')}}) unless otu_names.empty?
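For anyone who wants to try the branching in isolation, here is a minimal sketch of the same logic as a standalone method. `compose_otu_name` and its two arguments are hypothetical names, not part of the importer, which reads the values through `get_field_value` and may already have other entries in `otu_names`.

```ruby
# Hypothetical helper mirroring the mapping quoted above.
# It is not the importer's actual API.
def compose_otu_name(scientific_name, ident_qualifier)
  # No qualifier means no OTU name is composed by this step.
  return nil if ident_qualifier.nil?

  if ident_qualifier =~ /^cf[\.\s]/
    # Qualifiers starting with "cf." or "cf " are used on their own.
    ident_qualifier
  else
    # Any other qualifier is appended to the scientific name.
    "#{scientific_name} #{ident_qualifier}"
  end
end

puts compose_otu_name('Camponotus', 'mgo1')
```

With the example from this issue, the qualifier `mgo1` is appended to `Camponotus`, which is the OTU name requested in the original report.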
Probably. I'll reopen if I encounter the issue on the next big import.
The way this was implemented conflicts with the Restrict to existing nomenclature feature. For instance, if you have an invalid scientific name like Jivarus ali3nus, it is matched with the Jivarus protonym, and Otu.name is set to Jivarus ali3nus. The desired result would be to FAIL, and even with the restriction disabled it is still questionable that the importer accepts the scientificName. We discovered this problem while importing datasets with some bogus scientific names into a private copy of OSF (fortunately it was solvable by deleting the OTUs and associated data with a script in the rails console).
Biodiversity::Parser.parse("Jivarus ali3nus")
=>
{:parsed=>true,
:quality=>4,
:qualityWarnings=>[{:quality=>4, :warning=>"Unparsed tail"}],
:verbatim=>"Jivarus ali3nus",
:normalized=>"Jivarus",
:canonical=>{:stemmed=>"Jivarus", :simple=>"Jivarus", :full=>"Jivarus"},
:cardinality=>1,
:tail=>" ali3nus",
:details=>{:uninomial=>{:uninomial=>"Jivarus"}},
:words=>[{:verbatim=>"Jivarus", :normalized=>"Jivarus", :wordType=>"UNINOMIAL", :start=>0, :end=>7}],
:id=>"5b4c5fe6-8c4c-5f5d-9238-ad9c242d5560",
:parserVersion=>"GNparser v1.9.1"}
Do your DwC datasets use identificationQualifier, so that we can restrict which "unparsed tail" values from the name parser can be considered valid?
cc @LordFlashmeow @bpescador @AntWeb-org @mjy @mabecabrera
Problematic line of code: https://github.com/SpeciesFileGroup/taxonworks/commit/d922b69d9a571790fd362aeb182361998d5f8c57#diff-49f1423594fe8c44666b568f77142aaead2f6a2796b0e4458895bd8e62e3755eR793 (line 793 if the anchor fails)
In AntWeb data, any name with non-alpha characters is a morphotaxon (OTU in TaxonWorks speak). Non-alpha characters are restricted to numbers and "-".
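A rough sketch of that convention as a check, assuming names are space-separated words and that "alpha" means ASCII letters. `morphotaxon_name?` is a hypothetical name, not something in AntWeb or TaxonWorks.

```ruby
# Hypothetical check for the AntWeb convention described above.
def morphotaxon_name?(name)
  words = name.split
  # Every word may contain only letters, digits and "-".
  return false unless words.all? { |w| w.match?(/\A[A-Za-z0-9\-]+\z/) }

  # At least one word must contain a digit or "-".
  words.any? { |w| w.match?(/[0-9\-]/) }
end

puts morphotaxon_name?('Camponotus mgo1')
puts morphotaxon_name?('Camponotus pennsylvanicus')
```

Note that under this rule alone a bogus name such as Jivarus ali3nus would also count as a morphotaxon, so the rule by itself cannot tell a deliberate morphospecies code from a typo.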
That's OK, but do you also put the non-alpha characters in identificationQualifier when using the importer? I see that in the ant_formicidae dataset you do, and in fact you don't place the non-alpha words in scientificName. Are you always doing it like this? If so, I could revert to stricter quality checking of parsed names in scientificName (which would easily solve the conflict problem), and use identificationQualifier to compose the Otu.name.
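As a sketch of what "stricter quality checking" could mean, the method below rejects any parse result that leaves an unparsed tail. It works on a plain Hash shaped like the Biodiversity::Parser.parse output quoted earlier in this thread and does not call the gem. `fully_parsed?` is a hypothetical name, and treating a missing or blank `:tail` as a clean parse is an assumption.

```ruby
# Hypothetical strictness check over a parse result Hash.
# Assumption: a clean parse has no :tail, or a blank one.
def fully_parsed?(result)
  tail = result[:tail].to_s
  result[:parsed] == true && tail.strip.empty?
end

# Trimmed copy of the result quoted in this thread.
bogus = {
  parsed: true,
  quality: 4,
  verbatim: 'Jivarus ali3nus',
  tail: ' ali3nus'
}

puts fully_parsed?(bogus)
```

With a check like this, Jivarus ali3nus would be rejected in scientificName instead of being matched to the Jivarus protonym.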
I think we tried to follow the GBIF DwC guidelines - let me know if you think we did it the wrong way. I find the DwC approach to OTU names unnecessarily confusing.
Found this discussion: https://github.com/tdwg/dwc-qa/issues/162
I believe you've done it right, and if you continue using identificationQualifier to hold the morphospecies part of the scientific name, I can just make the scientificName parser strict again. Also, we may consider https://dwc.tdwg.org/terms/#dwc:verbatimIdentification (currently not mapped in the importer), which was referenced in the above issue and discussed at https://github.com/tdwg/dwc/issues/181
Maybe when verbatimIdentification is present, use it for Otu.name instead of scientificName + identificationQualifier? (Still leaving Otu.name blank when only scientificName is provided.)
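The proposal could look roughly like the sketch below. `proposed_otu_name` is a hypothetical name, and `row` is assumed to be a plain Hash keyed by DwC term names rather than the importer's real record object. The special handling of "cf." qualifiers in the current mapping is left out for brevity.

```ruby
# Hypothetical selection logic for the proposal above.
# `row` is a plain Hash keyed by DwC term names (an assumption).
def proposed_otu_name(row)
  # Prefer verbatimIdentification when it is present.
  verbatim = row['verbatimIdentification'].to_s.strip
  return verbatim unless verbatim.empty?

  qualifier = row['identificationQualifier'].to_s.strip
  # Only scientificName provided: leave Otu.name blank.
  return nil if qualifier.empty?

  # Fall back to scientificName + identificationQualifier.
  "#{row['scientificName']} #{qualifier}"
end

row = { 'scientificName' => 'Camponotus', 'identificationQualifier' => 'mgo1' }
puts proposed_otu_name(row)
```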
This issue was discussed here: https://github.com/SpeciesFileGroup/antweb-staging/issues/8 but there was no resolution.
What are the correct fields for a scientific name like Camponotus mgo1 (where mgo1 is the identificationQualifier)? The current scientificName parser breaks for names like this.