Open M-Nicholls opened 3 years ago
What to do with generalised records?
How to take record uncertainty into account?
Use the size of the distribution to determine how much the uncertainty or generalisation matters? I.e. for a very small distribution, uncertainty and generalisation will make a big difference as to whether the point is in or out.
Should a record be considered in or out if its uncertainty puts it in the range but the point is outside the range?
Indicate whether the point is in/out, and flag that, based on the uncertainty, the record may be out/in.
Categories:
- within expected distribution: point and full uncertainty are within the range
- likely within expected distribution: point is within the range, uncertainty extends outside it
- may be within expected distribution: point is outside the range, uncertainty overlaps the range
- outside expected distribution: point is outside the range and uncertainty is outside the range
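The four categories can be derived from two numbers per record: the distance from the point to the distribution boundary, and the coordinate uncertainty. The following is a minimal sketch, not the implemented logic. The names `Category` and `classify` are hypothetical, and it assumes a signed distance in metres (negative inside the distribution, positive outside) and treats an uncertainty circle that just touches the boundary as overlapping.

```python
from enum import Enum


class Category(Enum):
    WITHIN = "within expected distribution"
    LIKELY_WITHIN = "likely within expected distribution"
    MAYBE_WITHIN = "may be within expected distribution"
    OUTSIDE = "outside expected distribution"


def classify(signed_distance_m: float, uncertainty_m: float) -> Category:
    """Classify a record against an expert distribution.

    signed_distance_m: distance from the point to the nearest boundary of
        the distribution; negative inside, positive outside (assumed convention).
    uncertainty_m: coordinate uncertainty radius of the record.
    """
    inside = signed_distance_m <= 0
    gap_to_boundary = abs(signed_distance_m)
    if inside:
        # The uncertainty circle stays inside only if it does not reach the boundary.
        if uncertainty_m <= gap_to_boundary:
            return Category.WITHIN
        return Category.LIKELY_WITHIN
    # Point is outside: the circle overlaps the range if it reaches the boundary.
    if uncertainty_m >= gap_to_boundary:
        return Category.MAYBE_WITHIN
    return Category.OUTSIDE
```

For example, a point 50 m outside the boundary with 100 m uncertainty would be classified as "may be within expected distribution", while the same point with 10 m uncertainty would be "outside expected distribution".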
Using the categories together with the distance outside the distribution provides a thorough combination of metrics.
- Add to data pre-filters
- Update assertion metadata
- Update support material
What to do if there are multiple overlapping layers, e.g. likely | maybe layers, and separate east coast/west coast layers (e.g. grey nurse shark)?
Whether a species has a single layer or multiple layers does not affect the in/out calculation, but multiple layers make the distance calculation more difficult.
Solution: a scheduled Jenkins job runs the program once every day.
For every run:
- Pipelines loads all indexed records.
- It compares them with the existing outlier records and selects the newly added records.
- It calculates outliers for those new records ONLY.
If a new expert layer is added or updated, manually delete the existing outlier records; Pipelines will then recalculate all indexed records.
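The incremental selection described above amounts to a set difference between indexed record IDs and the IDs already present in the outlier store. A minimal sketch, assuming records are identified by string IDs (`select_new_records` is a hypothetical name, not a Pipelines function):

```python
from typing import Iterable, List


def select_new_records(indexed_ids: Iterable[str],
                       processed_ids: Iterable[str]) -> List[str]:
    """Return indexed record IDs with no existing outlier record yet.

    The order of indexed_ids is preserved.
    """
    processed = set(processed_ids)  # set gives O(1) membership checks
    return [rid for rid in indexed_ids if rid not in processed]


# Daily run: only records added since the last run are selected.
print(select_new_records(["r1", "r2", "r3"], ["r1", "r2"]))

# After the existing outlier records are deleted (layer added or updated),
# the same call selects every indexed record, so all are recalculated.
print(select_new_records(["r1", "r2", "r3"], []))
```

With this shape, the full recalculation after a layer change needs no special code path: emptying the outlier store is enough.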
Where should this occur - part of the pipelines or a separate process?
check layers are available for outlier detection
run expert distribution outlier detection - is there an expert distribution for the species, if so detect if a species occurrence record point is in/out of the expert distribution
add a distance of the point inside/outside expected distribution field to the record
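One way to populate such a field is a signed distance: negative when the point is inside the distribution, positive when it is outside. The sketch below is illustrative only. It assumes the distribution is a single simple polygon in a projected (planar) coordinate system with units of metres, ignores holes and multi-polygons, and uses hypothetical function names. A real implementation would likely rely on a geometry library and geodesic distances.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _point_segment_distance(px, py, ax, ay, bx, by) -> float:
    """Shortest planar distance from point P to segment AB."""
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project P onto AB and clamp the projection to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def _contains(polygon: List[Point], px: float, py: float) -> bool:
    """Ray-casting point-in-polygon test (even-odd rule)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        ax, ay = polygon[i]
        bx, by = polygon[(i + 1) % n]
        if (ay > py) != (by > py):
            x_cross = ax + (py - ay) * (bx - ax) / (by - ay)
            if px < x_cross:
                inside = not inside
    return inside


def signed_distance(polygon: List[Point], px: float, py: float) -> float:
    """Distance to the polygon boundary: negative inside, positive outside."""
    n = len(polygon)
    edge_distance = min(
        _point_segment_distance(px, py, *polygon[i], *polygon[(i + 1) % n])
        for i in range(n)
    )
    return -edge_distance if _contains(polygon, px, py) else edge_distance


square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
print(signed_distance(square, 5.0, 5.0))   # centre of the square, inside
print(signed_distance(square, 13.0, 5.0))  # to the right of the square, outside
```

Storing the sign together with the magnitude lets a later step compare the distance directly with the record's coordinate uncertainty.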
add expert distribution outlier category (compare the distance inside/outside the distribution boundary to the uncertainty)
Two scenarios:
Link to pipeline issue: https://github.com/gbif/pipelines/issues/622
Link to Spatial issue: https://github.com/AtlasOfLivingAustralia/spatial-service/issues/186