Closed Adafede closed 5 months ago
Hm, I see what you mean. In the Darwin Core file, taxonomicStatus is "accepted" for 7269282 and "doubtful" for 10900358. I do not collect taxonomicStatus verbatim, mostly because in the wild people put various things into this field, so it is not trivial to figure out with a script what they mean. For example, "doubtful" is not part of the recommended list of values for the term:
https://dwc.tdwg.org/terms/#dwc:taxonomicStatus
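To illustrate why the raw field is awkward to use from a script, here is a minimal Python sketch of the kind of cleanup a free-text taxonomicStatus value might need before it can be compared. The function name `normalize_status` and the sample inputs are assumptions made for this example; nothing here is taken from gnverifier's code.

```python
def normalize_status(raw):
    """Normalize a free-text taxonomicStatus value for comparison.

    Hypothetical helper: trims the value, lowercases it, and collapses
    internal runs of whitespace. Missing values become an empty string.
    """
    if raw is None:
        return ""
    return " ".join(raw.split()).lower()


# Illustrative inputs only; real DwCA files may contain many other spellings.
samples = ["  Accepted ", "DOUBTFUL", "Homotypic   Synonym", None]
normalized = [normalize_status(s) for s in samples]
```

Even after this cleanup, a value such as "doubtful" still has to be interpreted, which is the hard part described above.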
I also just realized that I do not provide an outlink URL for GBIF. My fault :facepalm:, adding it now.
GBIF's outlink works now:
{
  "id": "4431a0f3-e901-519a-886f-9b97e0c99d8e",
  "name": "Bubo bubo",
  "cardinality": 2,
  "matchType": "Exact",
  "bestResult": {
    "dataSourceId": 11,
    "dataSourceTitleShort": "GBIF Backbone Taxonomy",
    "curation": "AutoCurated",
    "recordId": "5959092",
    "outlink": "https://gbif.org/species/5959092",
    "entryDate": "2022-06-10",
    "sortScore": 9.41356052807642,
    "matchedName": "Bubo bubo (Linnaeus, 1758)",
    "matchedCardinality": 2,
    "matchedCanonicalSimple": "Bubo bubo",
    "matchedCanonicalFull": "Bubo bubo",
    "currentRecordId": "5959092",
    "currentName": "Bubo bubo (Linnaeus, 1758)",
    "currentCardinality": 2,
    "currentCanonicalSimple": "Bubo bubo",
    "currentCanonicalFull": "Bubo bubo",
    "isSynonym": false,
    "classificationPath": "Animalia|Chordata|Aves|Strigiformes|Strigidae|Bubo|Bubo bubo",
    "classificationRanks": "kingdom|phylum|class|order|family|genus|species",
    "classificationIds": "1|44|212|1450|9348|5959091|5959092",
    "editDistance": 0,
    "stemEditDistance": 0,
    "matchType": "Exact",
    "scoreDetails": {
      "cardinalityScore": 1,
      "infraSpecificRankScore": 0,
      "fuzzyLessScore": 1,
      "curatedDataScore": 0.33333334,
      "authorMatchScore": 0.14285715,
      "acceptedNameScore": 1,
      "parsingQualityScore": 1
    }
  },
  "dataSourcesNum": 1,
  "dataSourcesIds": [
    11
  ],
  "curation": "AutoCurated"
}
🙌🏼
I guess one possible way to deal with this is to use "accepted" as 1 and everything else as 0, and use it as one of the lower priorities. I'll keep the ticket open so I remember to get to it at some point.
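A rough Python sketch of that idea, not gnverifier's actual implementation: the status is reduced to 1 or 0 and placed last in the sort key, so it only breaks ties left by higher-priority scores. The `taxonomicStatus` field on the result records, the helper names, and the score values below are assumptions made for illustration.

```python
def status_score(status):
    """Return 1 for "accepted" and 0 for anything else, including missing values."""
    return 1 if (status or "").strip().lower() == "accepted" else 0


def rank(results):
    """Sort results best-first.

    Tuples compare left to right, so status_score (last element) acts as a
    lower-priority criterion: it matters only when acceptedNameScore ties.
    """
    return sorted(
        results,
        key=lambda r: (r["acceptedNameScore"], status_score(r.get("taxonomicStatus"))),
        reverse=True,
    )


# Hypothetical records: the IDs come from this thread, the scores are made up.
results = [
    {"recordId": "10900358", "acceptedNameScore": 1, "taxonomicStatus": "doubtful"},
    {"recordId": "7269282", "acceptedNameScore": 1, "taxonomicStatus": "accepted"},
]
ranked_ids = [r["recordId"] for r in rank(results)]
```

With equal higher-priority scores, the record marked "accepted" sorts ahead of the "doubtful" one, while a difference in the higher-priority score would still win regardless of status.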
Seems like a good way to go indeed!
https://github.com/gnames/gnverifier/issues/113 added to try to make sense of TaxonomicStatus field in DwCA files
Hi @dimus!
When running:
Obtained result is:
The first two results are preferred as they are "isSynonym": false, which is very good, but it would be ideal to prefer https://www.gbif.org/species/7269282 over https://www.gbif.org/species/10900358 as the latter is doubtful (even if it has "parsingQualityScore": 0).
Is there an easy way to achieve that? Does it make sense?
Best,