Open Gomismis opened 4 weeks ago
In this document you will find a list of all our numbers, and it shows:
Which numbers we have already used.
I have marked the rolls that contain reserved numbers at the Herbarium, so we won't make the same mistake.
The following barcode numbers are already in use by others in Specify (10.369 records) - need to be redone: cat_numbers_used_twice_since_august.xlsx
These ones were digitized, are in the danger zone, but not used by others, so it's safe to import them (1021 records): reserved_for_dassco_already_digitized_since_august.xlsx
_Not used by others in the danger zone - safe to use the labels (694 records): (don't know if it's worth the bother) safe_to_digitize_in_danger_zone.csv_
The previous ones (694 records) are not available anymore. 500 were imported as DaSSCo records on 1st of November, the other 194 are reserved for Entomology (DiplopodaApr2023). safe_to_digitize_in_danger_zone_updated.csv
Already digitized and imported (7372 records): imported_by_dassco_since_august.xlsx
Labels NOT to use (65.789 records): labels_NOT_for_use.csv
I will take the list of 10369 barcodes used twice (from cat_numbers_used_twice_since_august.xlsx) and pull the data from the DigiApp exports so we have all specimen and storage information for these barcodes.
I will also import the 1021 that are safe to add to Specify.
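A minimal sketch of pulling the export rows for a given set of barcodes, in case it helps anyone repeat this step. It assumes the DigiApp exports can be read as CSV and that the barcode sits in a column called `barcode`; the column names and the sample rows below are invented for illustration and are not taken from the real exports.

```python
import csv
import io


def rows_for_barcodes(export_file, barcodes, column="barcode"):
    """Return every export row whose barcode is in the given set.

    `column` is a hypothetical column name; adjust it to the real export header.
    """
    reader = csv.DictReader(export_file)
    return [row for row in reader if row[column] in barcodes]


# Invented sample data standing in for one export file.
export = io.StringIO(
    "barcode,specimen,storage\n"
    "01177504,specimen A,box 1\n"
    "01177505,specimen B,box 2\n"
)

hits = rows_for_barcodes(export, {"01177505"})
print(hits)
```

Barcodes are compared as strings here so that leading zeros are preserved.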
The 'safe to import' list has been updated, now final, sorry. It's 1021 in total.
Mass digitization init: 500 records
Reserved for DaSSCo: 521 records
Here is where you can access the full list of specimens that need re-barcoding and re-imaging. All info from the original DigiApp export is included in the spreadsheet.
N:\SCI-SNM-DigitalCollections\DaSSCo\Data\specimensNeedingReBarcoding.csv
The following barcode numbers are already in use by others in Specify (10.369 records) - need to be redone: cat_numbers_used_twice_since_august.xlsx
These ones were digitized, are in the danger zone, but not used by others, so it's safe to import them (1021 records): reserved_for_dassco_already_digitized_since_august.xlsx
Not used by others in the danger zone - safe to use the labels (694 records): (don't know if it's worth the bother) safe_to_digitize_in_danger_zone.csv
Already digitized and imported (7372 records): imported_by_dassco_since_august.xlsx
Labels NOT to use (65.789 records): labels_NOT_for_use.csv
Maybe I just don't understand this list, but shouldn't this add up to 250.000 barcodes?
One thing I don't understand is that I have 70.733 used barcodes in total in my list, but Zsuzs's lists on GitHub have 85.245 barcodes in total. How does it add up? Am I missing some numbers?
This is why I thought maybe the Specify peeps should have their own ticket and then let us know the final outcome. It is very confusing and all these numbers hurt my brain!
Here is what we know
Specify numbers reserved: 00.924.504 through 01.174.503
Barcode labels ordered: 00.924.504 through 01.424.504
Therefore, 250.000 more labels were ordered than numbers were reserved.
with these
we have digitized 20.379 specimens with unreserved barcode numbers (danger zone) (numbers 01.177.504 through 1.185.490 and 01.299.504 through 1.308.897)
So, I think what Zsuzs is saying is that, of these:
10.369 of the numbers correspond to barcode labels connected with specimen records that we have exported that need to be redone because they duplicate already used numbers and therefore cannot be imported.
1.021 of the numbers correspond to barcode labels connected with specimen records that we have exported and where we got lucky: they do not duplicate already used numbers and therefore can be imported.
694 of the numbers correspond to barcode labels that have not been, but could still be used because they are not used by others
65.789 of the numbers correspond to barcode labels that have not been and cannot be used because they are used by others
Is this correct, @Sosannah? If not, would you please ELI5?
I don't think the numbers Zsuzs posted should equal the number of labels we have used in total, because they include label numbers that we have not used yet and also exclude label numbers within the reserved range.
But I do wonder why we don't find a combo with the total of 20.379 (the number of barcode labels that we have used at Herbarium C from the unreserved range) and why we don't have a combo with the total of 250.000 (the total unreserved range).
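For reference, the range sizes quoted earlier in this thread can be checked with a few lines of arithmetic. This is only a sketch and assumes both ranges are inclusive at both ends; under that assumption the unreserved part comes out at 250.001 rather than exactly 250.000, so the figure used in the thread is best read as rounded.

```python
def inclusive_count(first: int, last: int) -> int:
    """Number of catalog numbers from first through last, both included."""
    return last - first + 1


# Ranges as quoted in the thread.
reserved = inclusive_count(924_504, 1_174_503)  # numbers reserved in Specify
ordered = inclusive_count(924_504, 1_424_504)   # barcode labels ordered
unreserved = ordered - reserved                 # labels without a reservation

print(reserved, ordered, unreserved)
```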
Sorry for the confusion, I was in a hurry, but wanted to provide an overview before I left.
10.369 of the numbers correspond to barcode labels connected with specimen records that we have exported that need to be redone because they duplicate already used numbers and therefore cannot be imported. - Correct.
1.021 of the numbers correspond to barcode labels connected with specimen records that we have exported and where we got lucky: they do not duplicate already used numbers and therefore can be imported. - Correct.
694 of the numbers correspond to barcode labels that have not been, but could still be used because they are not used by others - Correct, but be aware that they are not reserved for DaSSCo (yet).
65.789 of the numbers correspond to barcode labels that have not been and cannot be used because they are used by others - Correct.
I don't think the numbers Zsuzs posted should equal the number of labels we have used in total, because they include label numbers that we have not used yet and also exclude label numbers within the reserved range. - Correct.
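The four categories confirmed above boil down to two yes/no questions per label number: have we used it, and has someone else used it? A small sketch of that logic follows. The category names and the toy numbers are made up for illustration; this is not the query that produced the actual lists.

```python
def classify(labels, used_by_dassco, used_by_others):
    """Sort label numbers into the four categories discussed in this thread."""
    categories = {"redo": set(), "import": set(), "free": set(), "blocked": set()}
    for number in labels:
        ours = number in used_by_dassco
        theirs = number in used_by_others
        if ours and theirs:
            categories["redo"].add(number)     # duplicates: re-barcode and re-image
        elif ours:
            categories["import"].add(number)   # digitized, no clash: safe to import
        elif theirs:
            categories["blocked"].add(number)  # label must NOT be used
        else:
            categories["free"].add(number)     # unused by anyone: could still be used
    return categories


# Toy example with eight label numbers.
result = classify(range(1, 9), used_by_dassco={1, 2, 3}, used_by_others={2, 3, 6, 7})
print(result)
```

Note that "free" numbers are only unclaimed, not reserved, which matches the caveat on the 694 records.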
But I do wonder why we don't find a combo with the total of 20.379 (the number of barcode labels that we have used at Herbarium C from the unreserved range) and why we don't have a combo with the total of 250.000 (the total unreserved range).
Just need to ask: A combo with the total of 20.379:
A combo with the total of 250.000 (the total unreserved range):
It's a bit more complicated, but I'm happy to elaborate later. This range (01.177.504 to 01.424.504) only has 72.957 corresponding object records in the database. That's because the auto-incrementing system of Specify leaves serious holes (https://discourse.specifysoftware.org/t/smarter-catalog-numbering/326). But it's a good catch, because it means that the difference (177.043 labels) might also be usable by DaSSCo: those numbers were not used by others, and the last auto-incremented catalogue number is 1.790.685 (way above the DaSSCo labels). However, the free numbers are most likely spread over several fragmented ranges, so physically sorting the available and already-used labels on the label rolls would carry some risk.
The 1021 that were safe to import are now in Specify.
I went through all of the DigiApp exports for September and October and found another 1830 specimens that had already been barcoded with numbers from the 'bad range', but whose numbers had not yet been given to specimens in other collections. So I went ahead and imported all of those as well. This should mean that every specimen barcoded within this range of numbers is now accounted for: either it is listed as needing to be re-barcoded (the location of that list is referenced in my comment above), or its Specify record has now been created.
I created a separate ticket for the rebarcoding and reimaging process: #153
Just so it doesn't get lost, this is the quote from @chelseagraham on GitHub about which number ranges were ordered and which were reserved: "It looks like numbers were reserved for NHMD 00924504 through 01174503 in July 2022 and labels were ordered for NHMD 00924504 through 01424504 in October 2022."
Good news for everyone:
I played around a bit and checked the unused numbers in the 'danger zone'.
There are 174.671 numbers that can be easily used for DaSSCo after all. They are situated in 3 long runs of numbers in the range I call 'danger zone' - not used by others, so they can be reserved for DaSSCo.
Group 1: 63411 numbers (1236141-1299551)
Group 2: 42260 numbers (1308881-1351140)
Group 3: 69000 numbers (1351142-1420141)
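For anyone who wants to repeat this kind of check, here is a rough sketch of how long runs of unused numbers can be found, given the set of catalog numbers already taken. The function name and the `min_length` parameter are my own, and this is not the query that was actually run against Specify. The last lines just re-add the three group sizes listed above.

```python
def free_runs(first, last, taken, min_length=1):
    """Return (start, end) pairs of consecutive untaken numbers in first..last."""
    runs = []
    start = None
    for number in range(first, last + 1):
        if number in taken:
            # A taken number closes the current run, if one is open.
            if start is not None and number - start >= min_length:
                runs.append((start, number - 1))
            start = None
        elif start is None:
            start = number
    # Close a run that reaches the end of the range.
    if start is not None and last - start + 1 >= min_length:
        runs.append((start, last))
    return runs


# Toy example: numbers 1..20 with 5, 6 and 12 already taken.
example = free_runs(1, 20, taken={5, 6, 12}, min_length=5)
print(example)

# Sizes of the three groups listed in the comment above.
groups = [(1236141, 1299551), (1308881, 1351140), (1351142, 1420141)]
sizes = [end - start + 1 for start, end in groups]
print(sizes, sum(sizes))
```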
I've adjusted my stored procedure for inserting dummy records making it smarter. We can now pass it a baseline catalog number and it will then reserve a given amount of catalog numbers from that, skipping any numbers already taken. This will make it possible to snatch the skipped catalog numbers not yet taken. I will use the catalog number groups discovered by @Sosannah .
In addition to that, I could use the sproc to scan for any remaining numbers in the range of the labels already printed and reserve these too. However, this group would presumably be highly fragmented.
I will reserve these for the vascular plants collection, setting Allison @beckerah as cataloger for the time being and make sure that they're marked as projectName = "Reserved for DaSSCo" and remarks = "Specify-number mix up Group x". This is only for making these dummies easier to locate.
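To illustrate the idea behind the adjusted stored procedure (not its actual code, which lives in the database), here is a sketch of reserving a given amount of catalog numbers from a baseline while skipping any that are already taken. The function and parameter names are hypothetical.

```python
from itertools import count


def reserve_from(baseline, amount, taken):
    """Pick `amount` free catalog numbers, starting at `baseline` and counting up."""
    reserved = []
    for number in count(baseline):
        if len(reserved) == amount:
            break
        if number not in taken:
            reserved.append(number)  # free number: reserve it
    return reserved


# Toy example: 101 and 103 are already in use, so they are skipped.
picked = reserve_from(100, 5, taken={101, 103})
print(picked)
```

In the real procedure each reserved number would presumably become a dummy record; here the numbers are only collected in a list.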
OK I have reserved the blocks as per the three groups, so these are the results...
Of the range of catalog numbers between 01174503 & 01424504:
I went ahead and reserved the remaining 44 anyway and you can find the numbers in this file: Specify-numbers mixup Group 4.csv
Let me know, if there's anything else I can do.
Several specimens have, by mistake, been assigned barcodes with Specify-numbers already in use. To correct this mistake, we need to:
[x] Identify which barcode numbers have been used at Herbarium C.
[x] Identify which barcode numbers are already in use by others in Specify.
[ ] Locate, re-barcode, re-image, and re-import into Specify the specimens with incorrect barcodes.