Open nickdos opened 2 years ago
Before stats - https://nectar-arga-dev-1.ala.org.au/api/select?q=*:*&facet=true&facet.field=matchType&rows=0
{
"matchType": [
"exactMatch",1018593,
"higherMatch",326999,
"canonicalMatch",41855,
"fuzzyMatch",302,
"phraseMatch",21,
"taxonIdMatch",4
]
}
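For comparing runs later, it helps to have these counts in a dict rather than Solr's flat alternating name/count list. The following is a minimal sketch, not part of any ALA tooling: the helper name `facet_list_to_counts` is made up for illustration, and the list is simply pasted from the response above rather than fetched from the endpoint.

```python
from typing import Dict, List, Union


def facet_list_to_counts(flat: List[Union[str, int]]) -> Dict[str, int]:
    """Convert a flat [name, count, name, count, ...] facet list into a dict.

    Hypothetical helper for illustration only.
    """
    if len(flat) % 2 != 0:
        raise ValueError("facet list must contain name/count pairs")
    # Even positions hold the facet values, odd positions hold their counts.
    return {str(name): int(count) for name, count in zip(flat[0::2], flat[1::2])}


# The "before" matchType facet, copied from the response above.
before = facet_list_to_counts([
    "exactMatch", 1018593,
    "higherMatch", 326999,
    "canonicalMatch", 41855,
    "fuzzyMatch", 302,
    "phraseMatch", 21,
    "taxonIdMatch", 4,
])

total = sum(before.values())
for name, count in before.items():
    # Print each match type with its count and its share of all records.
    print(f"{name:15s} {count:>9d} {100 * count / total:6.2f}%")
```

Keeping the shares alongside the raw counts should make the comparison less sensitive to any change in the total number of records between runs.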
Attempted to load into names index via merge but it ran out of memory on my machine. See https://github.com/AtlasOfLivingAustralia/ala-name-matching/issues/162.
GBIF provides the individual name sources via the download tool at https://www.checklistbank.org/dataset/2169/download, so as a first try I'm attempting to merge in the NCBI DwCA from there.
https://www.gbif.org/dataset/d7dddbf4-2cf0-4f39-9b2a-bb099caae36c
Download the DwCA version and create a names matching index, re-run pipelines processing, and compare the counts for the match-type field before and after, to assess whether the GBIF backbone is a better source than the ALA one.
GBIF's Checklist Bank site (https://www.checklistbank.org/) allows individual taxonomy datasets to be downloaded as DwCA files, so there is a possibility of picking the sources we need and using these instead of the complete (huge) GBIF taxonomy.
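The before/after comparison could be sketched roughly as below. This is only an illustration under assumptions: the function name `compare_match_types` is hypothetical, and the numbers in the usage example are made-up toy values, not results from either index.

```python
from typing import Dict


def compare_match_types(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return the change in count (after minus before) for every match type.

    Hypothetical helper; match types missing from one run are treated as 0.
    """
    all_types = sorted(set(before) | set(after))
    return {t: after.get(t, 0) - before.get(t, 0) for t in all_types}


# Toy values purely for illustration (NOT real counts from either index).
toy_before = {"exactMatch": 10, "higherMatch": 5}
toy_after = {"exactMatch": 12, "higherMatch": 2, "fuzzyMatch": 1}

deltas = compare_match_types(toy_before, toy_after)
for match_type, delta in deltas.items():
    # A leading sign makes gains and losses easy to scan.
    print(f"{match_type:15s} {delta:+d}")
```

A shift of records from higherMatch towards exactMatch would suggest the new names index resolves more names precisely, though the raw deltas alone would not say whether those matches are correct.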