globalbioticinteractions / name-alignment-template

align names with known taxonomic resources
https://big-bee-network.github.io/name-alignment-workshop
Creative Commons Zero v1.0 Universal

align names from GBIF occurrence downloads used in Chesshire, P.R., Fischer, E.E., Dowdy, N.J., Griswold, T.L., Hughes, A.C., Orr, M.C., Ascher, J.S., Guzman, L.M., Hung, K.-L.J., Cobb, N.S. and McCabe, L.M. (2023), Completeness analysis for over 3000 United States bee species identifies persistent data gaps. Ecography e06584. https://doi.org/10.1111/ecog.06584 #16

Status: Open. jhpoelen opened this issue 1 year ago.

jhpoelen commented 1 year ago

from Chesshire, P.R., Fischer, E.E., Dowdy, N.J., Griswold, T.L., Hughes, A.C., Orr, M.C., Ascher, J.S., Guzman, L.M., Hung, K.-L.J., Cobb, N.S. and McCabe, L.M. (2023), Completeness analysis for over 3000 United States bee species identifies persistent data gaps. Ecography e06584. https://doi.org/10.1111/ecog.06584

via https://figshare.com/projects/Completeness_analyses_for_over_3000_United_States_bee_species_identifies_persistent_data_gaps/138673

GBIF.org (3 February 2021) GBIF Occurrence Download https://doi.org/10.15468/dl.6cxfsw

GBIF.org (3 February 2021) GBIF Occurrence Download https://doi.org/10.15468/dl.b9rfa7

GBIF.org (3 February 2021) GBIF Occurrence Download https://doi.org/10.15468/dl.w2nndm

jhpoelen commented 1 year ago

see related name alignment workflow configuration at:

https://github.com/jhpoelen/name-alignment-Chesshire-2023 (currently running)

jhpoelen commented 1 year ago

@seltmann @jtmiller28 I made a name alignment configuration for three GBIF download DOIs at https://github.com/jhpoelen/name-alignment-Chesshire-2023. I almost forgot that Preston supports resolving these DOIs to their associated data, so you can plug the download DOIs directly into the name alignment workflow using:

    - url: https://doi.org/10.15468/dl.b9rfa7
      enabled: true
      type: application/dwca
    - url: https://doi.org/10.15468/dl.6cxfsw
      enabled: true
      type: application/dwca
    - url: https://doi.org/10.15468/dl.w2nndm
      enabled: true
      type: application/dwca
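If more download DOIs need to be added later, the entries can be generated instead of typed by hand. Below is a minimal Python sketch, assuming each entry needs only the three keys shown above (url, enabled, type). The helper name dwca_entries is hypothetical and not part of the name alignment template or of Preston, and the YAML key that encloses this list is not shown in this issue, so the sketch emits only the list items.

```python
# Hypothetical helper (not part of the name-alignment template or Preston):
# renders the url/enabled/type entries used in the workflow configuration
# for a list of GBIF download DOIs. The enclosing YAML key is not shown in
# the issue, so only the list items are produced.

GBIF_DOWNLOAD_DOIS = [
    "10.15468/dl.b9rfa7",
    "10.15468/dl.6cxfsw",
    "10.15468/dl.w2nndm",
]


def dwca_entries(dois: list[str], indent: int = 0) -> str:
    """Return one YAML list item per DOI, each typed as a Darwin Core Archive."""
    pad = " " * indent
    lines = []
    for doi in dois:
        # The DOI resolver URL is what the workflow is given; resolving it
        # to the actual archive is left to the workflow tooling.
        lines.append(f"{pad}- url: https://doi.org/{doi}")
        lines.append(f"{pad}  enabled: true")
        lines.append(f"{pad}  type: application/dwca")
    return "\n".join(lines)


if __name__ == "__main__":
    print(dwca_entries(GBIF_DOWNLOAD_DOIS))
```

The indent parameter only exists so the generated items can be pasted under whatever parent key the real configuration uses.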

Fingers crossed that the workflow completes before GitHub Actions cuts it off . . .

jhpoelen commented 1 year ago

Note that all three DOIs are marked for deletion, as documented by their associated GBIF metadata records (each record has an "eraseAfter" date of 2021-08-03):

<https://api.gbif.org/v1/occurrence/download/0182006-200613084148143> <http://purl.org/pav/hasVersion> <hash://sha256/1c5d8a7399793a634a0dde32f3a94ccf64199f010d7f93baa422c2e1dbb98b2f> <urn:uuid:cc39f686-2a6a-459b-b73f-beae32361598> .
<https://api.gbif.org/v1/occurrence/download/0182032-200613084148143> <http://purl.org/pav/hasVersion> <hash://sha256/23d7c875420bea71d24c1ec3ba127f91eff5b368744de14824de0fc4fc090bb2> <urn:uuid:073e2d93-1704-4157-a61a-a05170a115d5> .
<https://api.gbif.org/v1/occurrence/download/0182076-200613084148143> <http://purl.org/pav/hasVersion> <hash://sha256/6555d581e0ce75c77740811e547da726297d02369b149893faf531f132a2aff0> <urn:uuid:e2b784df-c64d-416e-ba86-f24b07135871> .

These records were obtained on 2023-03-31.
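Each statement above links a GBIF download API URL to a content hash via pav:hasVersion. A minimal Python sketch for pulling that mapping out, assuming every term in the line is an IRI (true for the three statements here; literals and blank nodes are deliberately not handled, so this is not a general N-Quads parser). The function name download_versions is hypothetical.

```python
import re

# Matches one N-Quad whose four terms are all IRIs: <s> <p> <o> <g> .
# Literals and blank nodes are not handled; the statements above use none.
IRI_QUAD = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.$")
HAS_VERSION = "http://purl.org/pav/hasVersion"

STATEMENTS = """\
<https://api.gbif.org/v1/occurrence/download/0182006-200613084148143> <http://purl.org/pav/hasVersion> <hash://sha256/1c5d8a7399793a634a0dde32f3a94ccf64199f010d7f93baa422c2e1dbb98b2f> <urn:uuid:cc39f686-2a6a-459b-b73f-beae32361598> .
<https://api.gbif.org/v1/occurrence/download/0182032-200613084148143> <http://purl.org/pav/hasVersion> <hash://sha256/23d7c875420bea71d24c1ec3ba127f91eff5b368744de14824de0fc4fc090bb2> <urn:uuid:073e2d93-1704-4157-a61a-a05170a115d5> .
<https://api.gbif.org/v1/occurrence/download/0182076-200613084148143> <http://purl.org/pav/hasVersion> <hash://sha256/6555d581e0ce75c77740811e547da726297d02369b149893faf531f132a2aff0> <urn:uuid:e2b784df-c64d-416e-ba86-f24b07135871> .
"""


def download_versions(nquads: str) -> dict[str, str]:
    """Map each GBIF download key to the content hash recorded via pav:hasVersion."""
    versions = {}
    for line in nquads.splitlines():
        match = IRI_QUAD.match(line.strip())
        if match and match.group(2) == HAS_VERSION:
            subject, _, content_hash, _ = match.groups()
            # The download key is the last path segment of the GBIF API URL.
            versions[subject.rsplit("/", 1)[-1]] = content_hash
    return versions


if __name__ == "__main__":
    for key, content_hash in download_versions(STATEMENTS).items():
        print(key, content_hash)
```

The resulting keys (for example 0182006-200613084148143) match the "key" fields in the GBIF metadata records listed below, which makes it straightforward to pair each hash with its download request.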

{
  "key": "0182006-200613084148143",
  "doi": "10.15468/dl.6cxfsw",
  "license": "http://creativecommons.org/licenses/by-nc/4.0/legalcode",
  "request": {
    "predicate": {
      "type": "and",
      "predicates": [
        {
          "type": "or",
          "predicates": [
            {
              "type": "equals",
              "key": "BASIS_OF_RECORD",
              "value": "PRESERVED_SPECIMEN",
              "matchCase": false
            },
            {
              "type": "equals",
              "key": "BASIS_OF_RECORD",
              "value": "UNKNOWN",
              "matchCase": false
            },
            {
              "type": "equals",
              "key": "BASIS_OF_RECORD",
              "value": "HUMAN_OBSERVATION",
              "matchCase": false
            },
            {
              "type": "equals",
              "key": "BASIS_OF_RECORD",
              "value": "MATERIAL_SAMPLE",
              "matchCase": false
            },
            {
              "type": "equals",
              "key": "BASIS_OF_RECORD",
              "value": "MACHINE_OBSERVATION",
              "matchCase": false
            }
          ]
        },
        {
          "type": "or",
          "predicates": [
            {
              "type": "equals",
              "key": "COUNTRY",
              "value": "US",
              "matchCase": false
            },
            {
              "type": "equals",
              "key": "COUNTRY",
              "value": "CA",
              "matchCase": false
            },
            {
              "type": "equals",
              "key": "COUNTRY",
              "value": "MX",
              "matchCase": false
            }
          ]
        },
        {
          "type": "or",
          "predicates": [
            {
              "type": "equals",
              "key": "TAXON_KEY",
              "value": "4345",
              "matchCase": false
            },
            {
              "type": "equals",
              "key": "TAXON_KEY",
              "value": "4334",
              "matchCase": false
            },
            {
              "type": "equals",
              "key": "TAXON_KEY",
              "value": "7905",
              "matchCase": false
            },
            {
              "type": "equals",
              "key": "TAXON_KEY",
              "value": "7901",
              "matchCase": false
            },
            {
              "type": "equals",
              "key": "TAXON_KEY",
              "value": "7908",
              "matchCase": false
            },
            {
              "type": "equals",
              "key": "TAXON_KEY",
              "value": "7911",
              "matchCase": false
            }
          ]
        }
      ]
    },
    "sendNotification": true,
    "format": "DWCA",
    "type": "OCCURRENCE",
    "verbatimExtensions": []
  },
  "created": "2021-02-03T17:50:18.533+00:00",
  "modified": "2021-02-03T18:00:50.416+00:00",
  "eraseAfter": "2021-08-03T17:50:18.453+00:00",
  "status": "SUCCEEDED",
  "downloadLink": "https://api.gbif.org/v1/occurrence/download/request/0182006-200613084148143.zip",
  "size": 600597802,
  "totalRecords": 2472496,
  "numberDatasets": 196
}
{
  "key": "0182032-200613084148143",
  "doi": "10.15468/dl.b9rfa7",
  "license": "http://creativecommons.org/licenses/by/4.0/legalcode",
  "request": {
    "predicate": {
      "type": "and",
      "predicates": [
        {
          "type": "or",
          "predicates": [
            {
              "type": "equals",
              "key": "COUNTRY",
              "value": "US",
              "matchCase": false
            },
            {
              "type": "equals",
              "key": "COUNTRY",
              "value": "MX",
              "matchCase": false
            },
            {
              "type": "equals",
              "key": "COUNTRY",
              "value": "CA",
              "matchCase": false
            }
          ]
        },
        {
          "type": "equals",
          "key": "DATASET_KEY",
          "value": "e4d3fc77-1d94-495b-96ff-3fe8b8f7a3bd",
          "matchCase": false
        },
        {
          "type": "or",
          "predicates": [
            {
              "type": "equals",
              "key": "TAXON_KEY",
              "value": "4334",
              "matchCase": false
            },
            {
              "type": "equals",
              "key": "TAXON_KEY",
              "value": "7911",
              "matchCase": false
            },
            {
              "type": "equals",
              "key": "TAXON_KEY",
              "value": "4345",
              "matchCase": false
            },
            {
              "type": "equals",
              "key": "TAXON_KEY",
              "value": "7908",
              "matchCase": false
            },
            {
              "type": "equals",
              "key": "TAXON_KEY",
              "value": "7905",
              "matchCase": false
            },
            {
              "type": "equals",
              "key": "TAXON_KEY",
              "value": "7901",
              "matchCase": false
            }
          ]
        }
      ]
    },
    "sendNotification": true,
    "format": "DWCA",
    "type": "OCCURRENCE",
    "verbatimExtensions": []
  },
  "created": "2021-02-03T18:21:59.548+00:00",
  "modified": "2021-02-03T18:32:45.439+00:00",
  "eraseAfter": "2021-08-03T18:21:59.474+00:00",
  "status": "SUCCEEDED",
  "downloadLink": "https://api.gbif.org/v1/occurrence/download/request/0182032-200613084148143.zip",
  "size": 47693201,
  "totalRecords": 178715,
  "numberDatasets": 1
}
{
  "key": "0182076-200613084148143",
  "doi": "10.15468/dl.w2nndm",
  "license": "http://creativecommons.org/licenses/by-nc/4.0/legalcode",
  "request": {
    "predicate": {
      "type": "and",
      "predicates": [
        {
          "type": "equals",
          "key": "DATASET_KEY",
          "value": "e05f6e7d-418e-4407-8e0f-7b8ccf21109e",
          "matchCase": false
        },
        {
          "type": "or",
          "predicates": [
            {
              "type": "equals",
              "key": "TAXON_KEY",
              "value": "4334",
              "matchCase": false
            },
            {
              "type": "equals",
              "key": "TAXON_KEY",
              "value": "4345",
              "matchCase": false
            },
            {
              "type": "equals",
              "key": "TAXON_KEY",
              "value": "7911",
              "matchCase": false
            },
            {
              "type": "equals",
              "key": "TAXON_KEY",
              "value": "7908",
              "matchCase": false
            },
            {
              "type": "equals",
              "key": "TAXON_KEY",
              "value": "7901",
              "matchCase": false
            },
            {
              "type": "equals",
              "key": "TAXON_KEY",
              "value": "7905",
              "matchCase": false
            }
          ]
        }
      ]
    },
    "sendNotification": true,
    "format": "DWCA",
    "type": "OCCURRENCE",
    "verbatimExtensions": []
  },
  "created": "2021-02-03T19:18:46.687+00:00",
  "modified": "2021-02-03T19:20:03.899+00:00",
  "eraseAfter": "2021-08-03T19:18:46.611+00:00",
  "status": "SUCCEEDED",
  "downloadLink": "https://api.gbif.org/v1/occurrence/download/request/0182076-200613084148143.zip",
  "size": 2624689,
  "totalRecords": 11654,
  "numberDatasets": 1
}
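The three records above are printed back to back as separate JSON objects rather than as one JSON array, so json.loads on the whole text would fail. A minimal Python sketch, using only the standard library, that reads such a concatenated stream, checks each record's "eraseAfter" against the 2023-03-31 retrieval date, and collects the TAXON_KEY values from a nested and/or predicate. The sample is abridged by hand (only the fields used are kept, and only the third record keeps its predicate); the function names are hypothetical, and reading "obtained on 2023-03-31" as midnight UTC is an assumption.

```python
import json
from datetime import datetime, timezone

# Abridged copies of the three GBIF download metadata records above.
SAMPLE = """
{"key": "0182006-200613084148143", "doi": "10.15468/dl.6cxfsw",
 "eraseAfter": "2021-08-03T17:50:18.453+00:00"}
{"key": "0182032-200613084148143", "doi": "10.15468/dl.b9rfa7",
 "eraseAfter": "2021-08-03T18:21:59.474+00:00"}
{"key": "0182076-200613084148143", "doi": "10.15468/dl.w2nndm",
 "eraseAfter": "2021-08-03T19:18:46.611+00:00",
 "request": {"predicate": {"type": "and", "predicates": [
   {"type": "equals", "key": "DATASET_KEY",
    "value": "e05f6e7d-418e-4407-8e0f-7b8ccf21109e"},
   {"type": "or", "predicates": [
     {"type": "equals", "key": "TAXON_KEY", "value": "4334"},
     {"type": "equals", "key": "TAXON_KEY", "value": "4345"},
     {"type": "equals", "key": "TAXON_KEY", "value": "7911"},
     {"type": "equals", "key": "TAXON_KEY", "value": "7908"},
     {"type": "equals", "key": "TAXON_KEY", "value": "7901"},
     {"type": "equals", "key": "TAXON_KEY", "value": "7905"}]}]}}}
"""


def iter_json_objects(text: str):
    """Yield each object from text holding several concatenated JSON objects."""
    decoder = json.JSONDecoder()
    idx = 0
    while True:
        # Skip whitespace between objects; raw_decode expects a value first.
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx == len(text):
            return
        obj, end = decoder.raw_decode(text[idx:])
        idx += end
        yield obj


def past_erase_after(record: dict, when: datetime) -> bool:
    """True if the record's eraseAfter timestamp lies before `when`."""
    return datetime.fromisoformat(record["eraseAfter"]) < when


def equals_values(predicate: dict, key: str) -> set[str]:
    """Collect values of all 'equals' predicates on `key`, walking and/or groups."""
    if predicate.get("type") == "equals":
        return {predicate["value"]} if predicate.get("key") == key else set()
    found = set()
    for child in predicate.get("predicates", []):
        found |= equals_values(child, key)
    return found


if __name__ == "__main__":
    # Assumption: "obtained on 2023-03-31" is read as midnight UTC that day.
    obtained = datetime(2023, 3, 31, tzinfo=timezone.utc)
    records = list(iter_json_objects(SAMPLE))
    for record in records:
        print(record["doi"], "past eraseAfter:", past_erase_after(record, obtained))
    print(sorted(equals_values(records[2]["request"]["predicate"], "TAXON_KEY")))
```

Walking the predicate tree recursively keeps the sketch independent of how deeply GBIF nests its "and"/"or" groups; applied to the full records, it should show that all three downloads filter on the same six TAXON_KEY values.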