AtlasOfLivingAustralia / biocache-store

Occurrence processing, indexing and batch processing

Assertions not working in Biocache Store #393

Open javier-molina opened 4 years ago

javier-molina commented 4 years ago

This is a placeholder to document data assertions that either don't work or do not work correctly according to their intended purpose.

The full list of assertions, and decisions on what to do about them, is kept in the Confluence page of the same name.

Mesibov commented 3 years ago

I did a series of assertion tests in December 2015 on a large sample dataset. Results and problems were reported in biocache-store #100, #101, #102, #103, #104, #105, #106, #107, #108. I would do a check like this very differently today and on a much larger dataset, but note that the problems in those 9 issues are all still open 5 years later (and, like most of my ALA GitHub issue postings, have never been labelled, assigned or addressed).

Two different approaches to "assertions not working" are illustrated here in #393 (and on the Confluence page) and in my 2015 effort. One is to pick up "not workings" opportunistically. The other is to systematically analyse whether or not assertions work as intended, which sounds to me like quality control in ALA's data processing, and which does not seem to have been an ALA priority.

A related question - for which a GitHub issues page is not the appropriate place - is which (if any) of ALA's assertions have any value. Has ALA ever systematically examined whether and how data providers respond to assertions in their records? How confident can end-users be in the 2020 "data quality" filtering initiative, that the exclusions are validly "bad" and the inclusions validly "good"?