BiologicalRecordsCentre / record-cleaner-service

Service for checking species records against the record cleaner rules
MIT License

Implement record cleaner rule checking #6

Open JimBacon opened 4 months ago

JimBacon commented 4 months ago

The service will have to implement the following validation rules

The service will have to implement the following verification rules

The service should implement the following verification rules for compatibility with the existing record cleaner

All the verification rules can be reversed to perform the inverse function.

JimBacon commented 1 month ago

To date, records have only been flagged when they fail a rule check. It has not been possible to distinguish between records that have passed tests and records for which no tests exist. @kitenetter, I recall you expressing a desire for this distinction to be made. How would you like this to be done?

We can anticipate that only a few species will have period and ancillary rules, so an absence of these rules is not remarkable. Period within year (phenology) and Without polygon (ten km distribution) rules are more likely to be created for everything.

As a suggestion, we could change from pass/fail to pass/warn/fail as an overall result where 'warn' indicates either the phenology or tenkm rules are missing.

An accompanying list of messages can detail the reason for a warn or fail.
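To make the suggestion concrete, here is a minimal sketch of how a pass/warn/fail result with messages might be derived. The names `Result`, `Outcome` and `overall_result`, the rule keys `phenology` and `tenkm`, and the message wording are all assumptions for illustration, not existing service code.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Result(Enum):
    PASS = "pass"
    WARN = "warn"
    FAIL = "fail"


@dataclass
class Outcome:
    result: Result
    messages: List[str] = field(default_factory=list)


# Hypothetical keys for the rules we expect to exist for every taxon.
EXPECTED_RULES = ("phenology", "tenkm")


def overall_result(rule_results: Dict[str, Optional[bool]]) -> Outcome:
    """Combine per-rule results into one overall result.

    rule_results maps a rule name to True (passed), False (failed)
    or None (no rule exists for the taxon). A missing key is treated
    the same as None.
    """
    messages = []

    # Any failed rule makes the overall result a fail.
    failed = [name for name, ok in rule_results.items() if ok is False]
    messages.extend(f"{name}: failed" for name in failed)

    # Only the expected rules produce a warning when absent; a missing
    # period or ancillary rule is not remarkable.
    missing = [n for n in EXPECTED_RULES if rule_results.get(n) is None]
    messages.extend(f"{name}: no rule available" for name in missing)

    if failed:
        return Outcome(Result.FAIL, messages)
    if missing:
        return Outcome(Result.WARN, messages)
    return Outcome(Result.PASS, messages)
```

In this sketch a fail takes precedence over a warn, but the messages still list any missing rules, so the reason for either outcome is visible.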

What do you think?

kitenetter commented 1 month ago

From our discussion today, here is a proposed way forward:

  1. We work on the basis that for any taxon group, the list of taxa included is defined by the list of taxa that have ID difficulty rules.
  2. When importing data into record cleaner, we run a validation check to find out if the imported data contains taxa for which no ID difficulty rules exist.
  3. If no ID difficulty rule exists, the record is flagged as "No rule checks available" or similar.
  4. If an ID difficulty rule does exist, the record is taken on to the next stage of checking against all rules that are available for that taxon.
  5. Our guidance to rule creators should make it clear that ID difficulty rules need to exist for all taxa of interest.
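
The steps above could be sketched roughly as follows. This is only an illustration: `check_record`, the `taxon` key, the shape of `rules_by_taxon` and the failure message format are assumptions, and the "No rule checks available" wording is the placeholder from step 3.

```python
from typing import Callable, Dict, List, Set, Tuple

Record = Dict[str, str]
# A rule is a (name, check) pair; check returns True when the record passes.
Rule = Tuple[str, Callable[[Record], bool]]

NO_RULES_MESSAGE = "No rule checks available"


def check_record(
    record: Record,
    id_difficulty_taxa: Set[str],
    rules_by_taxon: Dict[str, List[Rule]],
) -> List[str]:
    """Return the list of messages raised for one record.

    id_difficulty_taxa is the set of taxa that have an ID difficulty
    rule, which defines the taxa covered by the rule set (step 1).
    """
    taxon = record["taxon"]

    # Steps 2 and 3: flag records whose taxon has no ID difficulty rule.
    if taxon not in id_difficulty_taxa:
        return [NO_RULES_MESSAGE]

    # Step 4: run every rule that is available for this taxon.
    messages = []
    for name, check in rules_by_taxon.get(taxon, []):
        if not check(record):
            messages.append(f"{name}: failed")
    return messages
```

An empty list then means the taxon is covered and every available rule passed, which is distinct from the "No rule checks available" case.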