In working to help update the modeled distribution data, I came across a minor error in the COA tool scripts, which has duplicated a subset of the tool data. Here’s a summary of what’s going on:
We assign some of the occurrence data an ‘unknown’ or ‘low’ value if the records are older or of low accuracy. This happens in the Biotics processing script (see here for an example) as well as a few others like eBird. This was a later change; we used to assign everything a ‘known’ value.
This was fine when it was only modeled data, but once we started assigning the ‘low’ and ‘unknown’ values to occurrence data, the scripts began duplicating the occurrence-based records. There are about eight copies of the ‘low’ and ‘unknown’ occurrence data in there, built up from the previous updates.
It’s not a major issue at the moment, other than that the record count is higher than it should be.
To fix this, I would propose adding a field to the lu_sgcn table that indicates whether a model is being used in the tool. That field can then be used to filter out the occurrence-based data and keep only the modeled data.
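As a rough sketch of what that filter could look like, here's some Python. The field and table layout here are hypothetical (`model_in_tool`, `source`, `sgcn` are illustrative names, not the real COA schema): the idea is just that once lu_sgcn carries a per-species flag, any script loading tool data can drop occurrence-based rows for species that are model-based.

```python
# Hypothetical sketch; field/table names are illustrative, not the real
# COA schema.

# lu_sgcn lookup: species code -> whether a modeled distribution is
# used in the tool (the proposed new field).
lu_sgcn = {
    "amphib01": {"model_in_tool": True},
    "bird02": {"model_in_tool": False},
}

# Tool records: a mix of modeled and occurrence-based rows, where the
# occurrence-based 'low' rows have been duplicated by repeated updates.
records = [
    {"sgcn": "amphib01", "source": "model", "value": "known"},
    {"sgcn": "amphib01", "source": "occurrence", "value": "low"},
    {"sgcn": "amphib01", "source": "occurrence", "value": "low"},  # duplicate
    {"sgcn": "bird02", "source": "occurrence", "value": "unknown"},
]

def filter_records(records, lu_sgcn):
    """Keep only modeled rows for species flagged as model-based;
    occurrence-based rows survive only for species without a model."""
    kept = []
    for rec in records:
        uses_model = lu_sgcn[rec["sgcn"]]["model_in_tool"]
        if uses_model and rec["source"] != "model":
            continue  # drop the duplicated occurrence-based rows
        kept.append(rec)
    return kept
```

With the sample data above, the duplicated ‘low’ occurrence rows for the modeled species are dropped, while the occurrence-only species keeps its record.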