AtlasOfLivingAustralia / la-pipelines

Living Atlas Pipelines extensions

Issues and questions with assertions #517

Open RobinaSanderson opened 3 years ago

RobinaSanderson commented 3 years ago

The review of the wiki pages documenting assertions identified some problems with the assertions as currently implemented, some possible improvements, and some open questions. I've listed them all in this issue as an epic. The review is documented at: https://docs.google.com/spreadsheets/d/1-DC_8rEmW0dLYDqmfIdyC8TD4smB3tIUFEmKVnHDdfg/edit#gid=820855861

Sub-issues or tasks can be created as decisions are made about what should be done and how to approach the work on the assertions.

Bugs or not implemented

The following were identified as bugs, or as not implemented, in the pipelines code.

Improvements

These are all improvements: the assertions currently do what is described but could work better.

Questions

These were raised as questions in the subject matter review and will need developer investigation or a policy decision.

Mesibov commented 2 years ago

Did this review of assertions include checks for false positives and false negatives? If so, how were they done?

RobinaSanderson commented 2 years ago

Hi @Mesibov - To answer your question on how the checks were done: I reviewed the wiki pages explaining the assertions and checked whether the explanations made sense against the records I could find with each assertion. I did not check the code of the assertions themselves. I also asked one of the developers whether he could check if a given assertion was in use. The information in the wiki pages raised some questions about how the assertions were working, and those are documented above.

You will see that in some of the cases above I have asked whether the assertion being raised is correct; e.g., COORDINATE_UNCERTAINTY_METERS_INVALID appears to be flagging some records incorrectly. See also GEOREFERENCE_POST_OCCURRENCE.

Mesibov commented 2 years ago

@RobinaSanderson, many thanks for your quick reply. So in checking records with assertions you found some false positives, i.e. OK records that were incorrectly flagged. Did you also check for false negatives, i.e. records that should have been flagged but weren't? Some years back I took a one-taxon sample of ALA specimen records (from collections) and systematically checked all populated fields to see whether ALA's checking code was picking up the problems it was designed to highlight, i.e. identifying true and false positives and negatives. There were quite a few false positives and false negatives, indicating "looseness" in the code. I wondered whether the checking code had since been improved.

Mesibov commented 2 years ago

@RobinaSanderson, I wrote up most of those results as GitHub issues:
https://github.com/AtlasOfLivingAustralia/biocache-store/issues/100
https://github.com/AtlasOfLivingAustralia/biocache-store/issues/101
https://github.com/AtlasOfLivingAustralia/biocache-store/issues/102
https://github.com/AtlasOfLivingAustralia/biocache-store/issues/103
https://github.com/AtlasOfLivingAustralia/biocache-store/issues/104
https://github.com/AtlasOfLivingAustralia/biocache-store/issues/105
https://github.com/AtlasOfLivingAustralia/biocache-store/issues/106
https://github.com/AtlasOfLivingAustralia/biocache-store/issues/107
https://github.com/AtlasOfLivingAustralia/biocache-store/issues/108

Mesibov commented 2 years ago

@RobinaSanderson, please see also https://github.com/AtlasOfLivingAustralia/biocache-store/issues/393

RobinaSanderson commented 2 years ago

Thanks @Mesibov

Mesibov commented 2 years ago

@RobinaSanderson, so did you also check for false negatives, i.e. records that should have been flagged but weren't?