AtlasOfLivingAustralia / biocache-store

Occurrence processing, indexing and batch processing

Problem with processing of iNaturalist taxonomic entries #347

Open Mesibov opened 4 years ago

Mesibov commented 4 years ago

I downloaded the 638655-record iNaturalist dataset and found 3 fields with full scientific names:

"scientificName" (here called 1)
"scientificName" (here called 2)
"species" (here called 3)

If I can believe the headings.csv that came with the records, the first is the raw scientificName, the second is the ALA-processed scientificName and the third is "The species the Atlas has matched this record to in the NSL".

As expected, there are a few records with (1) filled but (2) and (3) blank. Unexpectedly, there are also many records with (1) blank but with (2) and (3) filled. Looking at just one of these, the taxonomy gets weird: https://biocache.ala.org.au/occurrences/9e4c4726-068c-4860-b142-4ff43ba9fa57
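For anyone wanting to repeat the blank-field check, here is a minimal Python sketch. It assumes the download is a comma-separated file whose header has two columns both named "scientificName" (raw first, processed second) plus a "species" column, as described above. The function name `blank_patterns` and the sample rows are made up for illustration and are not taken from the actual download.

```python
import csv
import io
from collections import Counter

def blank_patterns(csv_text):
    """Count filled/blank combinations across fields (1), (2) and (3).

    Assumption: the first "scientificName" column is the raw name and the
    second is the ALA-processed name. Columns are located by position
    because the two share a header name.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    sci = [i for i, name in enumerate(header) if name == "scientificName"]
    raw_i, processed_i = sci[0], sci[1]
    species_i = header.index("species")
    counts = Counter()
    for row in reader:
        if not row:  # skip empty lines
            continue
        key = tuple(
            "filled" if row[i].strip() else "blank"
            for i in (raw_i, processed_i, species_i)
        )
        counts[key] += 1
    return counts

# Hypothetical sample rows, for illustration only.
SAMPLE = "\n".join([
    "scientificName,scientificName,species",
    "Acacia ampliata,Acacia ampliata,Acacia ampliata",
    ",Acacia ampliata,Acacia ampliata",
    "Unidentified plant,,",
])

counts = blank_patterns(SAMPLE)
for pattern, n in sorted(counts.items()):
    print(pattern, n)
```

For a real download, the file contents would be read from disk and passed in place of `SAMPLE`. The pattern `("blank", "filled", "filled")` corresponds to the unexpected records.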

The "original vs processed" table says that iNaturalist actually did supply a raw ID, namely

Plantae|Tracheophyta|Magnoliopsida|Fabales|Fabaceae|Acacia|Acacia ampliata

Nothing wrong there, but the ALA-processed classification replaces "Tracheophyta" with "Charophyta" (green algae), and "Magnoliopsida" with "Equisetopsida" (horsetails).

I haven't looked at any more of the -/2/3 records. Is this a processing failure? How to explain the phylum and class errors?

djtfmartin commented 4 years ago

Just on the difference in higher classification, ALA's source for Acacia ampliata is here:

https://biodiversity.org.au/nsl/services/rest/node/apni/2919087

which has:

Plantae / Charophyta / Equisetopsida / Magnoliidae / Rosanae / Fabales / Fabaceae / Acacia / Acacia ampliata
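To make the mismatch explicit, here is a small Python sketch that compares the two lineages quoted in this thread by name. It only does a plain string comparison of the two hierarchies as pasted here. It does not query iNaturalist, the ALA or the NSL, and the helper name `split_lineage` is made up for illustration.

```python
# Lineages copied from this thread: the raw iNaturalist classification
# and the hierarchy given by the NSL source for Acacia ampliata.
RAW = "Plantae|Tracheophyta|Magnoliopsida|Fabales|Fabaceae|Acacia|Acacia ampliata"
NSL = ("Plantae / Charophyta / Equisetopsida / Magnoliidae / Rosanae / "
       "Fabales / Fabaceae / Acacia / Acacia ampliata")

def split_lineage(text, sep):
    """Split a delimited lineage string into a list of trimmed names."""
    return [part.strip() for part in text.split(sep) if part.strip()]

raw = split_lineage(RAW, "|")
nsl = split_lineage(NSL, "/")

# Names that appear in one lineage but not the other, in original order.
only_raw = [name for name in raw if name not in nsl]
only_nsl = [name for name in nsl if name not in raw]

print("Only in raw lineage:", only_raw)
print("Only in NSL lineage:", only_nsl)
```

The names found only in the raw lineage are Tracheophyta and Magnoliopsida. The names found only in the NSL lineage are Charophyta, Equisetopsida, Magnoliidae and Rosanae.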

Mesibov commented 4 years ago

That explains the errors. Will ALA be querying APC about this? Their hierarchy is defective. My download was https://doi.org/10.26197/5d9c2f72356e8

djtfmartin commented 4 years ago

Thanks @Mesibov. I've passed that question to the NSL.