In working to help update the modeled distribution data, I came across a minor error in the COA tool scripts, which has duplicated a subset of the tool data. Here’s a summary of what’s going on:
We assign some of the occurrence data an ‘unknown’ or ‘low’ value if the records are older or of low accuracy. This happens in the Biotics processing script (see here for an example) as well as a few others like eBird. This was a later change; we used to assign everything a ‘known’ value.
This was fine when it was only modeled data, but once we started assigning the ‘low’ and ‘unknown’ values to occurrence data, the scripts began duplicating the occurrence-based records. There are about eight copies of the ‘low’ and ‘unknown’ occurrence data in there, built up from the previous updates.
It’s not a major issue at the moment, other than that the record count is higher than it should be.
To fix this, I would propose adding a field to the lu_sgcn table that indicates whether a model is being used in the tool. That field can then be used to filter out the occurrence-based data and keep only the modeled data.
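As a rough sketch of what that filter could look like, here's some Python. The field and table layout here are hypothetical (`model_in_tool`, `source`, `sgcn` are illustrative names, not the real COA schema): the idea is just that once lu_sgcn carries a per-species flag, any script loading tool data can drop occurrence-based rows for species that are model-based.

```python
# Hypothetical sketch; field/table names are illustrative, not the real
# COA schema.

# lu_sgcn lookup: species code -> whether a modeled distribution is
# used in the tool (the proposed new field).
lu_sgcn = {
    "amphib01": {"model_in_tool": True},
    "bird02": {"model_in_tool": False},
}

# Tool records: a mix of modeled and occurrence-based rows, where the
# occurrence-based 'low' rows have been duplicated by repeated updates.
records = [
    {"sgcn": "amphib01", "source": "model", "value": "known"},
    {"sgcn": "amphib01", "source": "occurrence", "value": "low"},
    {"sgcn": "amphib01", "source": "occurrence", "value": "low"},  # duplicate
    {"sgcn": "bird02", "source": "occurrence", "value": "unknown"},
]

def filter_records(records, lu_sgcn):
    """Keep only modeled rows for species flagged as model-based;
    occurrence-based rows survive only for species without a model."""
    kept = []
    for rec in records:
        uses_model = lu_sgcn[rec["sgcn"]]["model_in_tool"]
        if uses_model and rec["source"] != "model":
            continue  # drop the duplicated occurrence-based rows
        kept.append(rec)
    return kept
```

With the sample data above, the duplicated ‘low’ occurrence rows for the modeled species are dropped, while the occurrence-only species keeps its record.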