NHMDenmark / Mass-Digitizer

Common repo for the DaSSCo team
Apache License 2.0

Protocol for data quality assurance checks #507

Open · PipBrewer opened 5 months ago

PipBrewer commented 5 months ago

We need to develop and write a protocol for quality assurance checks on the data in Specify. Every month, we should select a minimum of 5 Specify records from each collection for which DaSSCo is generating data via mass digitisation, and check those Specify records against the digiapp imports and the images. We should use a random number generator to select the specific records to check. We should also keep a record of how many records from each collection we checked and when, and of the number and type of issues we find. Please add the protocol to the following document: Protocol for data quality assurance checks in Specify.docx, which is located at: N:\SCI-SNM-DigitalCollections\DaSSCo\Admin and project management\Data tasks
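A minimal sketch of the monthly selection and record-keeping, assuming the candidate catalog numbers for each collection have already been exported from Specify as lists of strings. The function and class names are hypothetical and not part of the Mass-Digitizer code. It uses Python's `random` module as the random number generator; passing a seed makes a month's selection reproducible, so the seed can be written into the log alongside the results.

```python
import random
from collections import Counter
from dataclasses import dataclass, field
from datetime import date


def select_records_for_qa(records_by_collection, n_per_collection=5, seed=None):
    """Randomly pick n records per collection (all of them if fewer exist).

    records_by_collection: hypothetical mapping of collection name to a list
    of catalog numbers exported from Specify.
    """
    rng = random.Random(seed)
    selection = {}
    for collection, records in records_by_collection.items():
        k = min(n_per_collection, len(records))
        # Sorting first makes the result depend only on the seed, not on export order.
        selection[collection] = rng.sample(sorted(records), k)
    return selection


@dataclass
class QACheck:
    """One checked record: which collection, when, and which issue types were found."""
    collection: str
    checked_on: date
    catalog_number: str
    issues: list = field(default_factory=list)  # e.g. "taxonomy", "image missing"


def summarise(checks):
    """Count records checked per collection and issues per (collection, issue type)."""
    records_checked = Counter(c.collection for c in checks)
    issues_found = Counter((c.collection, issue) for c in checks for issue in c.issues)
    return records_checked, issues_found
```

Keeping each check as its own row, rather than only monthly totals, means the "how many, when, and what type" summary can always be recomputed from the log.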

RebekkaML commented 2 months ago

Since Pip asked for any thoughts on this topic:

We also need to make sure the images exist and can be found; I suspect that many are still stuck in various error folders.
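Checking whether images exist could start with a sketch like this one. It assumes, hypothetically, that each image file is named after the specimen barcode (e.g. `12345678.tif`) and that we know both the final image location and the error folders to search; the file extensions listed are also an assumption. Barcodes mapped to an empty list have no image anywhere, and barcodes whose only files sit under an error folder are the stuck ones.

```python
from pathlib import Path

# Assumed image formats; adjust to what the imaging stations actually produce.
IMAGE_EXTENSIONS = {".tif", ".tiff", ".jpg", ".jpeg"}


def locate_images(barcodes, search_roots):
    """Map each barcode to the image files found for it under search_roots.

    Assumes (hypothetically) that image files are named after the barcode.
    Searches every root recursively, so error folders can be passed in too.
    """
    located = {str(b): [] for b in barcodes}
    for root in search_roots:
        for path in Path(root).rglob("*"):
            if (
                path.is_file()
                and path.suffix.lower() in IMAGE_EXTENSIONS
                and path.stem in located
            ):
                located[path.stem].append(path)
    return located
```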

Some barcodes might not even be in Specify. There is no system for checking whether we actually scanned each barcode at the herbarium. At the pinned insects station we barcode and image specimens in one go, so there we can check that we end up with the same number of barcodes and images. At the herbarium, however, I suspect that some barcodes weren't scanned: I found one such case while working on the author sheet, and I also noticed during importing that there are sometimes gaps of one barcode in the datasets.
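The single-barcode gaps could be flagged automatically, assuming the barcodes are purely numeric and issued sequentially. Both of those points are assumptions about the barcode scheme. The `max_gap` threshold is a hypothetical parameter: wider jumps are treated as a break between batches or barcode rolls rather than as missed scans, so the output stays short enough to follow up by hand.

```python
def find_barcode_gaps(barcodes, max_gap=5):
    """Return barcode numbers missing between neighbouring scanned barcodes.

    Assumes numeric, sequentially issued barcodes. Gaps wider than max_gap
    are ignored as probable batch boundaries. Returns ints, so any leading
    zeros in the original barcodes are not preserved.
    """
    numbers = sorted({int(b) for b in barcodes})
    missing = []
    for prev, curr in zip(numbers, numbers[1:]):
        gap = curr - prev - 1  # how many numbers are absent between the two
        if 0 < gap <= max_gap:
            missing.extend(range(prev + 1, curr))
    return missing
```

For example, `find_barcode_gaps(["1001", "1002", "1004", "1005", "1008"])` returns `[1003, 1006, 1007]`, which are candidate specimens to look for in the cabinets.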

Other issues we should look out for are whether MSO / MOS records are connected and barcoded correctly, and whether the taxonomic information is correct. The taxonomy caused a lot of issues during importing (especially correctly recognising hybrids, subspecies, varieties, etc.), and I'm sure that we missed some of these cases.
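To catch the hybrid and infraspecific cases, a simple text check could flag sampled records for manual review. This is a sketch, not a nomenclature parser. The marker list is illustrative, and it assumes we can export the full taxon name string and the rank name stored in Specify (compared here case-insensitively against `"species"`). False positives are acceptable because every flag only means "look at this by hand".

```python
import re

# Illustrative markers only: abbreviated infraspecific ranks followed by whitespace,
# or the word "forma". Not an exhaustive list of botanical rank abbreviations.
RANK_MARKERS = re.compile(r"\b(?:subsp|ssp|var|f|nothosubsp|nothovar)\.(?=\s)|\bforma\b")
# Hybrid sign (multiplication sign) or a standalone "x" between name parts.
HYBRID_MARKERS = re.compile(r"×|\s[xX]\s")


def review_flags(full_name, specify_rank):
    """Return reasons why a record's taxonomy should be checked by hand."""
    flags = []
    if HYBRID_MARKERS.search(full_name):
        flags.append("hybrid marker in name")
    rank_match = RANK_MARKERS.search(full_name)
    if rank_match and specify_rank.lower() == "species":
        flags.append(f"infraspecific marker '{rank_match.group(0)}' but stored as species")
    return flags
```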