BiologicalRecordsCentre / record-cleaner-rules

NBN RecordCleaner rules used for automated species verification

Questions arising from phenology stage termlists #10

Open kitenetter opened 2 months ago

kitenetter commented 2 months ago

Csv files for the "mature" stage termlists are in a folder structure under: W:\PYWELL_SHARED\Pywell Projects\BRC\_BRC_projects\NCEA work 2023-24\Record cleaner redevelopment\stage terms

Questions for our next review meeting:

  1. [question over folder names, answered by Robin]
  2. Rename Larger Brachycera RS to Soldierflies and Allies Recording Scheme
  3. Rename Orthopteroids as Grasshoppers and Allies?
  4. Should we include “not recorded” in the list of synonyms for “mature”? At the moment I have done so.
  5. Are there any options for dealing with records that may have multiple custom attributes that refer to stages, e.g. for dragonflies (Ad:1|Co:0|Em:0|Ex:0|La:0|Ov:0) and herpetofauna (adult:1|spawn/egg:1300+|tadpoles/larvae:500+)? This will be an issue for applying rules to Indicia, but may not be an issue for the record cleaner web service.
  6. Do we have any options for dealing with stage and abundance combined in a single value? For example, many records from BirdTrack have “flowering:230”, “flowering:5”, etc.
  7. Zero abundance records – can they be excluded?
  8. Are there any options for highlighting inappropriate stage terms, e.g. “flowering” butterflies?
  9. Can we flag records with unrecognised stages as “no checks available for this stage term”?
  10. What if a scheme wants to apply one set of rules to all stages, can we do that or would we need to compile a termlist csv that includes all conceivable stage terms as "mature"?

@robin-hutchinson we probably need to resolve some of the above with Jim before transferring the csv files into github.

kitenetter commented 1 month ago

Answers:

2 & 3: go ahead with renaming.

4: fine to include "not recorded" as a synonym, so no change needed to the current files.

5 & 6: we can't cater for all special cases within record cleaner, so it will be the responsibility of users to format their data appropriately (we can advise on this in the guidance).

7: it is up to the users to avoid trying to check zero abundance records, as this is not meaningful.
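
For the guidance on 5, 6 and 7, something like the following could illustrate how a user might reformat combined stage/abundance values before submitting them. This is only a rough sketch: the function name `split_stage_counts` is hypothetical, and it assumes values use `|` between entries and `:` between stage and count, as in the examples quoted in the questions. It is not part of record cleaner itself.

```python
def split_stage_counts(raw):
    """Split 'stage:count|stage:count' into (stage, count) pairs.

    Assumed format (taken from the examples in this issue):
    entries separated by '|', stage and count separated by ':'.
    Entries with a count of zero are dropped, since checking
    zero abundance records is not meaningful.
    """
    pairs = []
    for part in raw.split("|"):
        part = part.strip()
        if not part:
            continue
        # Split on the last ':' so stage names like 'spawn/egg' stay intact.
        stage, sep, count_text = part.rpartition(":")
        if not sep:
            # No count present: keep the stage term, count unknown.
            pairs.append((part, None))
            continue
        # Counts such as '1300+' are treated as 1300 for this purpose.
        digits = count_text.strip().rstrip("+")
        count = int(digits) if digits.isdigit() else None
        if count == 0:
            continue
        pairs.append((stage.strip(), count))
    return pairs


dragonfly = split_stage_counts("Ad:1|Co:0|Em:0|Ex:0|La:0|Ov:0")
herp = split_stage_counts("adult:1|spawn/egg:1300+|tadpoles/larvae:500+")
bird = split_stage_counts("flowering:230")
```

Each resulting pair could then be submitted as its own record with a single stage term, which is the form the phenology rules expect.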

8 & 9: these should be correctly handled within the system - such records would fail to match a rule for phenology and messages would be generated.

10: this can be catered for by using an asterisk wildcard in the "stage" column (in place of "mature") in the periodwithinyear.csv files - will add a new issue for this.
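
To make the intended behaviour for 4, 8, 9 and 10 concrete, here is a minimal sketch of how a stage term might be resolved. It assumes the synonyms have been loaded into a dict mapping each synonym to its stage, and that the values in the "stage" column of a periodwithinyear.csv file have been loaded into a set. The function name `match_stage` and the message wording are hypothetical and do not reflect the actual record cleaner code.

```python
def match_stage(recorded_stage, synonyms, rule_stages):
    """Return (matched_stage, message) for a recorded stage term.

    synonyms:    dict of lower-case synonym -> stage, e.g. 'adult' -> 'mature'
                 (assumed to come from a stage_synonym.csv file)
    rule_stages: set of values found in the 'stage' column of the rules
                 (assumed to come from a periodwithinyear.csv file)
    """
    # An asterisk in the stage column applies the rule to every stage term.
    if "*" in rule_stages:
        return "*", None

    term = recorded_stage.strip().lower()
    stage = synonyms.get(term)
    if stage is None:
        # Unrecognised or inappropriate term, e.g. a 'flowering' butterfly.
        return None, f"No checks available for stage term '{recorded_stage}'"
    if stage in rule_stages:
        return stage, None
    return None, f"No phenology rule for stage '{stage}'"


synonyms = {"mature": "mature", "adult": "mature", "not recorded": "mature"}

adult_result = match_stage("Adult", synonyms, {"mature"})
unknown_result = match_stage("flowering", synonyms, {"mature"})
wildcard_result = match_stage("flowering", synonyms, {"*"})
```

With a wildcard in the rules, a scheme would not need a termlist csv that lists every conceivable stage term as "mature".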

We agreed that it is best not to add stage_synonym.csv files to github for rules that don't need them. @robin-hutchinson this means that you can ignore the csv files in the server subfolder "not needed_no phenology rules".

The stage_synonym.csv files can now be copied into github, and once that is done this issue can be closed.

robin-hutchinson commented 1 month ago

Grasshoppers and Soldierflies renamed, stage_synonym files added!