AtlasOfLivingAustralia / DataQuality

Data Quality

Improve data #39

Open M-Nicholls opened 4 years ago

M-Nicholls commented 4 years ago

As a: data user
I want: to improve the quality of data
So that: the best available data is ready for access and analysis, and the work does not have to be redone every time someone accesses it

• Tools for user curation and annotation
• Additional processed or derived fields
• Process to remove data
• Data load, process, sample pipeline improvements

Tools for user curation and annotation, feedback loops and updates in response to annotations - https://github.com/AtlasOfLivingAustralia/DataQuality/issues/106

M-Nicholls commented 3 years ago

from @elywallis

Species in question: Anthochaera phrygia (the Regent Honeyeater). Query in Biocache, with the ALA General profile enabled: https://biocache.ala.org.au/occurrence/search?q=lsid%3Aurn%3Alsid%3Abiodiversity.org.au%3Aafd.taxon%3A83225d29-264f-4236-9dda-32f7d60fb3af&qualityProfile=ALA#tab_mapView
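The search link above carries two query parameters: `q`, which filters on the species' AFD taxon LSID, and `qualityProfile=ALA`. As a minimal sketch using only the Python standard library (no request is sent), the same URL can be rebuilt from its parts, which makes it easy to swap in another taxon LSID or quality profile. The function name `biocache_search_url` is hypothetical and not part of any ALA tool.

```python
from urllib.parse import urlencode

# Base of the Biocache occurrence search page, taken from the link above.
SEARCH_BASE = "https://biocache.ala.org.au/occurrence/search"

def biocache_search_url(taxon_lsid: str, quality_profile: str = "ALA",
                        tab: str = "tab_mapView") -> str:
    """Rebuild a Biocache search-page URL filtered on a taxon LSID (hypothetical helper)."""
    # urlencode percent-encodes ':' as %3A, matching the original link.
    query = urlencode({"q": f"lsid:{taxon_lsid}", "qualityProfile": quality_profile})
    # The fragment only selects the map tab in the web page.
    return f"{SEARCH_BASE}?{query}#{tab}"

REGENT_HONEYEATER_LSID = (
    "urn:lsid:biodiversity.org.au:afd.taxon:83225d29-264f-4236-9dda-32f7d60fb3af"
)

print(biocache_search_url(REGENT_HONEYEATER_LSID))
```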

Issue 2. The DQ filter should have picked up these records as suspect, and a bulk annotation tool would be very helpful:
• Click on the dot in the ocean south of Victoria and west of Tasmania.
• 29 occurrences are listed against this dot.
• If you view just these records, the filters show that these 29 records are out of a possible 6,982 records, with 1,015 records flagged as duplicates excluded and 268 records excluded based on Location Uncertainty. That still doesn't account for all 6,982 records, so my guess is that many of the records flagged as duplicates are themselves flagged as duplicates as well.
• All the records are from a single data resource.
• My first question is: why aren't all of these records being flagged as spatially suspect by the algorithm?
• My second question: no one is going to go in and individually flag 29 records, let alone 6,900+. Can this case be added to the argument for why a bulk annotation feature is needed, particularly if the records are not going to be caught by automated detection?
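The two questions above (why the ocean records are not flagged as spatially suspect, and how to annotate them in bulk) can be illustrated with a minimal sketch. This is not the ALA's data quality algorithm. It assumes a hypothetical expected-range polygon for the species, flags every record whose coordinates fall outside it using a standard ray-casting point-in-polygon test, and then groups the flagged record IDs by data resource so that one annotation could be applied per group. The record fields (`id`, `lat`, `lon`, `data_resource`), the polygon and the sample coordinates are all illustrative, not real Biocache data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)

def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray casting: a point is inside if a ray from it crosses an odd number of edges."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed
        # (this also guarantees y1 != y2, so the division below is safe).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def flag_outside_range(records: List[dict], expected_range: List[Point]) -> List[dict]:
    """Return the records whose coordinates fall outside the expected range."""
    return [r for r in records
            if not point_in_polygon((r["lon"], r["lat"]), expected_range)]

def group_for_bulk_annotation(flagged: List[dict]) -> Dict[str, List[str]]:
    """Group flagged record IDs by data resource, one bulk annotation per group."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for r in flagged:
        groups[r["data_resource"]].append(r["id"])
    return dict(groups)

# Hypothetical expected range: a rough rectangle over the mainland, NOT the real range.
EXPECTED_RANGE = [(140.0, -39.0), (154.0, -39.0), (154.0, -24.0), (140.0, -24.0)]

# Illustrative records: two in the ocean from one resource, one on land.
SAMPLE_RECORDS = [
    {"id": "r1", "lat": -40.5, "lon": 142.0, "data_resource": "dr-A"},
    {"id": "r2", "lat": -40.6, "lon": 142.1, "data_resource": "dr-A"},
    {"id": "r3", "lat": -37.8, "lon": 145.0, "data_resource": "dr-B"},
]

flagged = flag_outside_range(SAMPLE_RECORDS, EXPECTED_RANGE)
print(group_for_bulk_annotation(flagged))  # prints {'dr-A': ['r1', 'r2']}
```

Grouping by data resource suits this report, where all the suspect records come from a single resource, so one annotation per group would cover every record in it.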