This requires reprocessing the datasets, but the biggest ones, such as eBird and Artportalen, can be skipped because they contain no material sample or material citation records. The cache doesn't have to be truncated.
This query returns the datasets that can be skipped:
select occ.datasetkey, count(*) as num_records
from occurrence occ
group by occ.datasetkey
having sum(case when occ.basisofrecord in ('MATERIAL_SAMPLE', 'MATERIAL_CITATION') then 1 else 0 end) = 0
order by num_records desc
That leaves approximately 23K datasets to process and 40K to exclude.
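As a quick sanity check of the skip rule, here is a minimal sketch that runs it against a toy table in an in-memory SQLite database. The rule it assumes is that a dataset can be skipped only if none of its records is a `MATERIAL_SAMPLE` or `MATERIAL_CITATION`. The dataset keys and rows are hypothetical, `skippable_datasets` is an illustrative helper, and SQLite stands in for whatever engine actually holds the `occurrence` table.

```python
import sqlite3

# Hypothetical toy data: (datasetkey, basisofrecord). Not real records.
ROWS = [
    ("ds-observations", "HUMAN_OBSERVATION"),
    ("ds-observations", "HUMAN_OBSERVATION"),
    ("ds-observations", "MACHINE_OBSERVATION"),
    ("ds-mixed", "PRESERVED_SPECIMEN"),
    ("ds-mixed", "MATERIAL_SAMPLE"),
    ("ds-citations", "MATERIAL_CITATION"),
]

# A dataset is skippable when it has zero material sample/citation records.
SKIPPABLE_SQL = """
SELECT datasetkey, count(*) AS num_records
FROM occurrence
GROUP BY datasetkey
HAVING sum(CASE WHEN basisofrecord IN ('MATERIAL_SAMPLE', 'MATERIAL_CITATION')
                THEN 1 ELSE 0 END) = 0
ORDER BY num_records DESC
"""


def skippable_datasets(rows):
    """Load rows into an in-memory table and return the skippable datasets."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE occurrence (datasetkey TEXT, basisofrecord TEXT)")
        conn.executemany("INSERT INTO occurrence VALUES (?, ?)", rows)
        return conn.execute(SKIPPABLE_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(skippable_datasets(ROWS))
```

Note that `ds-mixed` is not skippable even though most of its records are of other types: a single material sample record is enough to require reprocessing.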
But if it's easier we can just skip the biggest ones: