gnames / gnverifier

GNverifier verifies scientific names against more than 100 biodiversity databases
https://verifier.globalnames.org
MIT License

doubtful entries in GBIF #94

Closed Adafede closed 5 months ago

Adafede commented 2 years ago

Hi @dimus!

With this version:

gnverifier -V

version: v1.0.0-RC1
build: n/a

When running:

echo Zanthoxylum piperitum | gnverifier -s 11 -M -f pretty

The result obtained is:

{
  "id": "cc9045da-9d6b-593b-8103-50ff33f5b225",
  "name": "Zanthoxylum piperitum",
  "cardinality": 2,
  "matchType": "Exact",
  "results": [
    {
      "dataSourceId": 11,
      "dataSourceTitleShort": "GBIF Backbone Taxonomy",
      "curation": "AutoCurated",
      "recordId": "10900358",
      "entryDate": "2022-06-10",
      "sortScore": 9.41356052807642,
      "matchedName": "Zanthoxylum piperitum Linnaeus, 1759",
      "matchedCardinality": 2,
      "matchedCanonicalSimple": "Zanthoxylum piperitum",
      "matchedCanonicalFull": "Zanthoxylum piperitum",
      "currentRecordId": "10900358",
      "currentName": "Zanthoxylum piperitum Linnaeus, 1759",
      "currentCardinality": 2,
      "currentCanonicalSimple": "Zanthoxylum piperitum",
      "currentCanonicalFull": "Zanthoxylum piperitum",
      "isSynonym": false,
      "classificationPath": "Plantae|Tracheophyta|Magnoliopsida|Sapindales|Rutaceae|Zanthoxylum|Zanthoxylum piperitum",
      "classificationRanks": "kingdom|phylum|class|order|family|genus|species",
      "classificationIds": "6|7707728|220|933|2396|3190076|10900358",
      "editDistance": 0,
      "stemEditDistance": 0,
      "matchType": "Exact",
      "scoreDetails": {
        "cardinalityScore": 1,
        "infraSpecificRankScore": 0,
        "fuzzyLessScore": 1,
        "curatedDataScore": 0.33333334,
        "authorMatchScore": 0.14285715,
        "acceptedNameScore": 1,
        "parsingQualityScore": 1
      }
    },
    {
      "dataSourceId": 11,
      "dataSourceTitleShort": "GBIF Backbone Taxonomy",
      "curation": "AutoCurated",
      "recordId": "7269282",
      "entryDate": "2022-06-10",
      "sortScore": 9.413296866539289,
      "matchedName": "Zanthoxylum piperitum (L.) DC. DC. (L.)",
      "matchedCardinality": 2,
      "matchedCanonicalSimple": "Zanthoxylum piperitum",
      "matchedCanonicalFull": "Zanthoxylum piperitum",
      "currentRecordId": "7269282",
      "currentName": "Zanthoxylum piperitum (L.) DC. DC. (L.)",
      "currentCardinality": 2,
      "currentCanonicalSimple": "Zanthoxylum piperitum",
      "currentCanonicalFull": "Zanthoxylum piperitum",
      "isSynonym": false,
      "classificationPath": "Plantae|Tracheophyta|Magnoliopsida|Sapindales|Rutaceae|Zanthoxylum|Zanthoxylum piperitum",
      "classificationRanks": "kingdom|phylum|class|order|family|genus|species",
      "classificationIds": "6|7707728|220|933|2396|3190076|7269282",
      "editDistance": 0,
      "stemEditDistance": 0,
      "matchType": "Exact",
      "scoreDetails": {
        "cardinalityScore": 1,
        "infraSpecificRankScore": 0,
        "fuzzyLessScore": 1,
        "curatedDataScore": 0.33333334,
        "authorMatchScore": 0.14285715,
        "acceptedNameScore": 1,
        "parsingQualityScore": 0
      }
    },
    {
      "dataSourceId": 11,
      "dataSourceTitleShort": "GBIF Backbone Taxonomy",
      "curation": "AutoCurated",
      "recordId": "7687487",
      "entryDate": "2022-06-10",
      "sortScore": 9.41320894377719,
      "matchedName": "Zanthoxylum piperitum Benn.",
      "matchedCardinality": 2,
      "matchedCanonicalSimple": "Zanthoxylum piperitum",
      "matchedCanonicalFull": "Zanthoxylum piperitum",
      "currentRecordId": "3832937",
      "currentName": "Zanthoxylum avicennae (Lam.) DC. DC. (Lam.)",
      "currentCardinality": 2,
      "currentCanonicalSimple": "Zanthoxylum avicennae",
      "currentCanonicalFull": "Zanthoxylum avicennae",
      "isSynonym": true,
      "classificationPath": "Plantae|Tracheophyta|Magnoliopsida|Sapindales|Rutaceae|Zanthoxylum|Zanthoxylum avicennae",
      "classificationRanks": "kingdom|phylum|class|order|family|genus|species",
      "classificationIds": "6|7707728|220|933|2396|3190076|3832937",
      "editDistance": 0,
      "stemEditDistance": 0,
      "matchType": "Exact",
      "scoreDetails": {
        "cardinalityScore": 1,
        "infraSpecificRankScore": 0,
        "fuzzyLessScore": 1,
        "curatedDataScore": 0.33333334,
        "authorMatchScore": 0.14285715,
        "acceptedNameScore": 0,
        "parsingQualityScore": 1
      }
    },
    {
      "dataSourceId": 11,
      "dataSourceTitleShort": "GBIF Backbone Taxonomy",
      "curation": "AutoCurated",
      "recordId": "8128554",
      "entryDate": "2022-06-10",
      "sortScore": 9.41320894377719,
      "matchedName": "Zanthoxylum piperitum Hook. \u0026 Arn.",
      "matchedCardinality": 2,
      "matchedCanonicalSimple": "Zanthoxylum piperitum",
      "matchedCanonicalFull": "Zanthoxylum piperitum",
      "currentRecordId": "3832851",
      "currentName": "Zanthoxylum beecheyanum K.Koch",
      "currentCardinality": 2,
      "currentCanonicalSimple": "Zanthoxylum beecheyanum",
      "currentCanonicalFull": "Zanthoxylum beecheyanum",
      "isSynonym": true,
      "classificationPath": "Plantae|Tracheophyta|Magnoliopsida|Sapindales|Rutaceae|Zanthoxylum|Zanthoxylum beecheyanum",
      "classificationRanks": "kingdom|phylum|class|order|family|genus|species",
      "classificationIds": "6|7707728|220|933|2396|3190076|3832851",
      "editDistance": 0,
      "stemEditDistance": 0,
      "matchType": "Exact",
      "scoreDetails": {
        "cardinalityScore": 1,
        "infraSpecificRankScore": 0,
        "fuzzyLessScore": 1,
        "curatedDataScore": 0.33333334,
        "authorMatchScore": 0.14285715,
        "acceptedNameScore": 0,
        "parsingQualityScore": 1
      }
    }
  ],
  "dataSourcesNum": 1,
  "dataSourcesIds": [
    11
  ],
  "curation": "AutoCurated"
}

The first two results are preferred because they have "isSynonym": false, which is very good. However, it would be ideal to prefer https://www.gbif.org/species/7269282 over https://www.gbif.org/species/10900358, as the latter is doubtful (even though 7269282 has "parsingQualityScore": 0).

Is there an easy way to achieve that? Does it make sense?

Best,

dimus commented 2 years ago

Hm, I see what you mean. In the Darwin Core file, taxonomicStatus is accepted for 7269282 and doubtful for 10900358. I do not collect taxonomicStatus verbatim, mostly because in the wild people put various things into this field, so it is not trivial for a script to figure out what they mean. For example, doubtful is not part of the recommended list of values for the term:

https://dwc.tdwg.org/terms/#dwc:taxonomicStatus
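To illustrate why the verbatim values are awkward to handle, here is a minimal Python sketch of one conservative approach: normalize the string, look it up in a small table, and treat everything unrecognized as unknown. This is not gnverifier code. The function name `normalize_status`, the category names, and the contents of `KNOWN_STATUSES` are all assumptions made for this example, and a real table would need many more entries.

```python
import re

# Hypothetical lookup table: cleaned verbatim value -> coarse category.
# The entries are illustrative only; real data contains many more variants.
KNOWN_STATUSES = {
    "accepted": "accepted",
    "valid": "accepted",
    "synonym": "synonym",
    "homotypic synonym": "synonym",
    "heterotypic synonym": "synonym",
    "doubtful": "doubtful",
}


def normalize_status(verbatim):
    """Map a verbatim taxonomicStatus string to a coarse category.

    Anything that is missing or not in the table becomes "unknown",
    so unexpected values are never silently treated as accepted.
    """
    if not verbatim:
        return "unknown"
    # Lowercase, trim, and collapse whitespace, underscores, and hyphens.
    cleaned = re.sub(r"[\s_\-]+", " ", verbatim.strip().lower())
    return KNOWN_STATUSES.get(cleaned, "unknown")
```

With this table, "ACCEPTED" and "doubtful" map to their own categories, while a value such as "accepted?" falls through to "unknown" rather than being guessed at.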

I also just realized that I do not provide an outlink URL for GBIF, my fault :facepalm:, adding it now.

dimus commented 2 years ago

GBIF's outlink works now

{
  "id": "4431a0f3-e901-519a-886f-9b97e0c99d8e",
  "name": "Bubo bubo",
  "cardinality": 2,
  "matchType": "Exact",
  "bestResult": {
    "dataSourceId": 11,
    "dataSourceTitleShort": "GBIF Backbone Taxonomy",
    "curation": "AutoCurated",
    "recordId": "5959092",
    "outlink": "https://gbif.org/species/5959092",
    "entryDate": "2022-06-10",
    "sortScore": 9.41356052807642,
    "matchedName": "Bubo bubo (Linnaeus, 1758)",
    "matchedCardinality": 2,
    "matchedCanonicalSimple": "Bubo bubo",
    "matchedCanonicalFull": "Bubo bubo",
    "currentRecordId": "5959092",
    "currentName": "Bubo bubo (Linnaeus, 1758)",
    "currentCardinality": 2,
    "currentCanonicalSimple": "Bubo bubo",
    "currentCanonicalFull": "Bubo bubo",
    "isSynonym": false,
    "classificationPath": "Animalia|Chordata|Aves|Strigiformes|Strigidae|Bubo|Bubo bubo",
    "classificationRanks": "kingdom|phylum|class|order|family|genus|species",
    "classificationIds": "1|44|212|1450|9348|5959091|5959092",
    "editDistance": 0,
    "stemEditDistance": 0,
    "matchType": "Exact",
    "scoreDetails": {
      "cardinalityScore": 1,
      "infraSpecificRankScore": 0,
      "fuzzyLessScore": 1,
      "curatedDataScore": 0.33333334,
      "authorMatchScore": 0.14285715,
      "acceptedNameScore": 1,
      "parsingQualityScore": 1
    }
  },
  "dataSourcesNum": 1,
  "dataSourcesIds": [
    11
  ],
  "curation": "AutoCurated"
}

Adafede commented 2 years ago

🙌🏼

dimus commented 2 years ago

I guess one possible way to deal with this is to score "accepted" as 1 and everything else as 0, and use that score as one of the lower priorities. I'll keep the ticket open so I remember to get to it at some point.
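A rough Python sketch of that idea, done client-side on the JSON output rather than inside gnverifier. Several things here are assumptions: the priority order is simply copied from the field order in "scoreDetails" and may not match how sortScore is really computed, `taxonomicStatusScore` is a hypothetical new field, and it is placed just above parsingQualityScore only so that the example from this issue resolves the way requested. The status lookup must be supplied by the caller, since gnverifier does not return taxonomicStatus.

```python
# Assumed priority order, highest first, taken from the "scoreDetails" layout.
PRIORITY = [
    "cardinalityScore",
    "infraSpecificRankScore",
    "fuzzyLessScore",
    "curatedDataScore",
    "authorMatchScore",
    "acceptedNameScore",
    "taxonomicStatusScore",  # hypothetical field, not in gnverifier output
    "parsingQualityScore",
]


def taxonomic_status_score(status):
    """Return 1 for "accepted" (case-insensitive), 0 for anything else."""
    return 1 if (status or "").strip().lower() == "accepted" else 0


def rerank(results, status_by_record_id):
    """Sort results best-first, comparing scores in PRIORITY order."""
    def key(result):
        scores = dict(result["scoreDetails"])
        scores["taxonomicStatusScore"] = taxonomic_status_score(
            status_by_record_id.get(result["recordId"])
        )
        return tuple(scores[name] for name in PRIORITY)

    return sorted(results, key=key, reverse=True)


# The two non-synonym records from this issue, trimmed to the fields used.
results = [
    {"recordId": "10900358", "scoreDetails": {
        "cardinalityScore": 1, "infraSpecificRankScore": 0,
        "fuzzyLessScore": 1, "curatedDataScore": 0.33333334,
        "authorMatchScore": 0.14285715, "acceptedNameScore": 1,
        "parsingQualityScore": 1}},
    {"recordId": "7269282", "scoreDetails": {
        "cardinalityScore": 1, "infraSpecificRankScore": 0,
        "fuzzyLessScore": 1, "curatedDataScore": 0.33333334,
        "authorMatchScore": 0.14285715, "acceptedNameScore": 1,
        "parsingQualityScore": 0}},
]

# Statuses as reported for the Darwin Core file earlier in this thread.
statuses = {"10900358": "doubtful", "7269282": "accepted"}

ranked_ids = [r["recordId"] for r in rerank(results, statuses)]
print(ranked_ids)  # ['7269282', '10900358']
```

If the new score were placed below parsingQualityScore instead, 10900358 would stay first in this example, so where it sits in the priority list matters.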

Adafede commented 2 years ago

Seems like a good way to go indeed!

dimus commented 5 months ago

https://github.com/gnames/gnverifier/issues/113 added to try to make sense of the TaxonomicStatus field in DwCA files