NHMDenmark / Herbarium-Sheets-workstation

Workstation and workflows for herbarium sheets for mass digitisation (DaSSCo)

Specify-number mix up: Re-barcode, image, and import to Specify of specimens at Herbarium C #151

Open Gomismis opened 4 weeks ago

Gomismis commented 4 weeks ago

Several specimens have, by mistake, been assigned barcodes with Specify-numbers already in use. To correct this mistake, we need to:

Gomismis commented 4 weeks ago

In this document you will find a list of all our numbers. It shows:

  1. How many rolls of 25000 barcodes we have
  2. Which of them contain our reserved/unreserved numbers.
  3. Which numbers we have already used.

    dont_panic.xlsx

Gomismis commented 4 weeks ago

I have marked the rolls that contain reserved numbers at the Herbarium, so we won't make the same mistake.

Sosannah commented 4 weeks ago

The following barcode numbers are already in use by others in Specify (10.369 records) - need to be redone: cat_numbers_used_twice_since_august.xlsx

These ones were digitized, are in the danger zone, but not used by others, so it's safe to import them (1021 records): reserved_for_dassco_already_digitized_since_august.xlsx

_Not used by others in the danger zone - safe to use the labels (694 records): (don't know if it's worth the bother) safe_to_digitize_in_danger_zone.csv_

The previous ones (694 records) are not available anymore. 500 were imported as DaSSCo records on 1st of November, the other 194 are reserved for Entomology (DiplopodaApr2023). safe_to_digitize_in_danger_zone_updated.csv

Already digitized and imported (7372 records): imported_by_dassco_since_august.xlsx

Labels NOT to use (65.789 records): labels_NOT_for_use.csv

beckerah commented 4 weeks ago

I will take the list of 10369 barcodes used twice (from cat_numbers_used_twice_since_august.xlsx) and pull the data from the DigiApp exports so we have all specimen and storage information for these barcodes.

I will also import the 1021 that are safe to add to Specify.

Sosannah commented 4 weeks ago

The 'safe to import' list has been updated, now final, sorry. It's 1021 in total.

Mass digitization init: 500 records
Reserved for DaSSCo: 521 records

beckerah commented 4 weeks ago

Here is where you can access the full list of specimens that need re-barcoding and re-imaging. All info from the original DigiApp export is included in the spreadsheet.

N:\SCI-SNM-DigitalCollections\DaSSCo\Data\specimensNeedingReBarcoding.csv

RebekkaML commented 4 weeks ago

Maybe I just don't understand Sosannah's list, but shouldn't these numbers add up to 250.000 barcodes?

Gomismis commented 4 weeks ago

One thing I don't understand is that my list has 70.733 used barcodes in total, but Zsuzs's lists on GitHub have 85.245 barcodes in total. How does it add up? Am I missing some numbers?
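For reference, the 85.245 figure is simply the sum of the five lists Sosannah posted above. A minimal check, with the counts copied from the thread; it only adds up the posted counts and says nothing about whether the lists overlap:

```python
# Record counts of the five lists posted by Sosannah (copied from the thread).
LIST_COUNTS = {
    "cat_numbers_used_twice_since_august": 10_369,
    "reserved_for_dassco_already_digitized_since_august": 1_021,
    "safe_to_digitize_in_danger_zone": 694,
    "imported_by_dassco_since_august": 7_372,
    "labels_NOT_for_use": 65_789,
}

total = sum(LIST_COUNTS.values())
print(total)  # 85245
```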

chelseagraham commented 4 weeks ago

This is why I thought maybe the Specify peeps should have their own ticket and then let us know the final outcome. It is very confusing and all these numbers hurt my brain!

Here is what we know:

Specify numbers reserved: 00.924.504 through 01.174.503.
Barcode labels ordered: 00.924.504 through 01.424.504.
Therefore 250.000 more labels were ordered than numbers were reserved.

Using these labels, we have digitized 20.379 specimens with unreserved barcode numbers (the danger zone): numbers 01.177.504 through 01.185.490 and 01.299.504 through 01.308.897.
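Since both ends of each range are included, a range's size is last minus first plus one. A minimal sketch using the numbers quoted above (thousands separators dropped; `inclusive_count` is a hypothetical helper). Counted this way, the reserved block holds exactly 250.000 numbers and the ordered label range 500.001, so 250.001 labels fall outside the reservation. The two danger-zone runs, as written, contain 7.987 and 9.394 numbers, which may be worth comparing with the 20.379 figure:

```python
def inclusive_count(first: int, last: int) -> int:
    """Number of catalog numbers from first to last, both ends included."""
    return last - first + 1

reserved = inclusive_count(924_504, 1_174_503)   # Specify numbers reserved
ordered = inclusive_count(924_504, 1_424_504)    # barcode labels ordered
unreserved_labels = ordered - reserved           # labels outside the reservation

# The two danger-zone runs as quoted in the comment above.
danger_runs = [(1_177_504, 1_185_490), (1_299_504, 1_308_897)]
danger_run_sizes = [inclusive_count(a, b) for a, b in danger_runs]

print(reserved, ordered, unreserved_labels)      # 250000 500001 250001
print(danger_run_sizes, sum(danger_run_sizes))   # [7987, 9394] 17381
```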

So, I think what Zsuzs is saying is that, of these:

10.369 of the numbers correspond to barcode labels connected with specimen records that we have exported that need to be redone because they duplicate already used numbers and therefore cannot be imported.

1.021 of the numbers correspond to barcode labels connected with specimen records that we have exported, where we got lucky because they do not duplicate already used numbers and therefore can be imported.

694 of the numbers correspond to barcode labels that have not been used yet, but could still be used because they are not used by others.

65.789 of the numbers correspond to barcode labels that have not been used and cannot be used because they are already used by others.
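The four groups amount to a two-by-two split on two yes/no questions: has the label already been used on an exported DigiApp record, and is the number already taken by a record from another collection in Specify? A minimal sketch, assuming both are available as sets of integer barcodes; `classify`, `exported`, `used_by_others` and the example barcodes are all hypothetical, not real data from the thread:

```python
from enum import Enum

class LabelStatus(Enum):
    REDO = "exported, number used by others: re-barcode and re-image"
    IMPORT = "exported, number not used by others: safe to import"
    AVAILABLE = "not yet used, number not used by others: could still be used"
    DO_NOT_USE = "not yet used, number used by others: do not use"

def classify(barcode: int, exported: set[int], used_by_others: set[int]) -> LabelStatus:
    """Place an unreserved-range barcode into one of the four groups."""
    in_export = barcode in exported
    taken = barcode in used_by_others
    if in_export and taken:
        return LabelStatus.REDO
    if in_export:
        return LabelStatus.IMPORT
    if taken:
        return LabelStatus.DO_NOT_USE
    return LabelStatus.AVAILABLE

# Hypothetical example data, made up for illustration.
exported = {1_177_504, 1_177_505}
used_by_others = {1_177_504, 1_177_600}
for bc in (1_177_504, 1_177_505, 1_177_600, 1_177_601):
    print(bc, classify(bc, exported, used_by_others).name)
```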

Is this correct, @Sosannah? If not, would you please ELI5?

I don't think the numbers Zsuzs posted should equal the number of labels we have used in total, because they include label numbers that we have not used yet and also exclude label numbers within the reserved range.

But I do wonder why we don't find a combo with the total of 20.379 (the number of barcode labels that we have used at Herbarium C from the unreserved range) and why we don't have a combo with the total of 250.000 (the total unreserved range).

Sosannah commented 4 weeks ago

Sorry for the confusion, I was in a hurry, but wanted to provide an overview before I left.

10.369 of the numbers correspond to barcode labels connected with specimen records that we have exported that need to be redone because they duplicate already used numbers and therefore cannot be imported. - Correct.

1.021 of the numbers correspond to barcode labels connected with specimen records that we have exported, where we got lucky because they do not duplicate already used numbers and therefore can be imported. - Correct.

694 of the numbers correspond to barcode labels that have not been used yet, but could still be used because they are not used by others - Correct, but be aware that they are not reserved for DaSSCo (yet).

65.789 of the numbers correspond to barcode labels that have not been and cannot be used because they are used by others - Correct.

I don't think the numbers Zsuzs posted should equal the number of labels we have used in total, because they include label numbers that we have not used yet and also exclude label numbers within the reserved range. - Correct.

But I do wonder why we don't find a combo with the total of 20.379 (the number of barcode labels that we have used at Herbarium C from the unreserved range) and why we don't have a combo with the total of 250.000 (the total unreserved range).

Just need to ask: A combo with the total of 20.379:

A combo with the total of 250.000 (the total unreserved range):

It's a bit more complicated, but I'm happy to elaborate later. This range (01.177.504 to 01.424.504) only has 72.957 corresponding records in the database. That's because there are serious holes in the numbering due to Specify's auto-incrementing system (https://discourse.specifysoftware.org/t/smarter-catalog-numbering/326). But it's a good catch, because it means the difference (177.043 labels) might also be usable by DaSSCo: those numbers are not used by others, and the last auto-incremented catalogue number is 1.790.685 (way above the DaSSCo labels). However, the free numbers are most likely spread over several fragmented ranges, so physically sorting the available labels from the already-used ones on the label rolls would carry a risk.
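Finding which numbers in a range are still free, and how fragmented they are, comes down to collecting maximal runs of numbers that no record uses. A minimal sketch, assuming the catalog numbers already in the database have been exported as a set of integers; `free_runs` is a hypothetical helper, not a Specify feature, and the example data is made up:

```python
def free_runs(first: int, last: int, used: set[int]) -> list[tuple[int, int]]:
    """Return maximal inclusive runs (start, end) of numbers in [first, last] not in `used`."""
    runs: list[tuple[int, int]] = []
    start = None  # start of the current free run, or None while inside used numbers
    for n in range(first, last + 1):
        if n in used:
            if start is not None:
                runs.append((start, n - 1))  # a used number closes the open run
                start = None
        elif start is None:
            start = n  # first free number after a used one (or at the range start)
    if start is not None:
        runs.append((start, last))  # run still open at the end of the range
    return runs

# Hypothetical example: a small range with a few taken numbers.
print(free_runs(100, 112, {103, 104, 110}))  # [(100, 102), (105, 109), (111, 112)]
```

Sorting the returned runs by length shows whether the free numbers sit in a few long blocks, which are easy to take from the label rolls, or in many short fragments, which are risky to sort physically.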

beckerah commented 4 weeks ago

The 1021 that were safe to import are now in Specify.

beckerah commented 3 weeks ago

I went through all of the DigiApp exports for September and October and found another 1830 specimens that had already been barcoded with numbers from the 'bad range', but whose numbers had not yet been given to specimens in other collections. So I went ahead and imported all of those as well. This should mean that every specimen barcoded within this range of numbers is now accounted for: either it is on the list of specimens needing re-barcoding (the location of this list is referenced in my comment above), or its Specify record has now been created.

RebekkaML commented 3 weeks ago

I created a separate ticket for the rebarcoding and reimaging process: #153

RebekkaML commented 3 weeks ago

Just so it doesn't get lost, this is the quote from @chelseagraham on GitHub about which number ranges were ordered and which were reserved: "It looks like numbers were reserved for NHMD 00924504 through 01174503 in July 2022 and labels were ordered for NHMD 00924504 through 01424504 in October 2022."

Sosannah commented 2 weeks ago

Good news for everyone:

I played around a bit and checked the unused numbers in the 'danger zone'.

There are 174.671 numbers that can easily be used for DaSSCo after all. They are situated in 3 long runs of numbers in the range I call the 'danger zone', are not used by others, and so can be reserved for DaSSCo:

Group 1: 63411 numbers (1236141-1299551)
Group 2: 42260 numbers (1308881-1351140)
Group 3: 69000 numbers (1351142-1420141)
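The group sizes can be cross-checked from their inclusive bounds (last minus first plus one), together with a check that the groups do not overlap. A minimal sketch with the bounds copied from the comment above:

```python
# Bounds of the three groups, copied from the comment above (inclusive).
groups = {
    1: (1_236_141, 1_299_551),
    2: (1_308_881, 1_351_140),
    3: (1_351_142, 1_420_141),
}

sizes = {g: last - first + 1 for g, (first, last) in groups.items()}
total = sum(sizes.values())

# Each group must start after the previous one ends, so no number is counted twice.
bounds = sorted(groups.values())
no_overlap = all(prev_last < next_first
                 for (_, prev_last), (next_first, _) in zip(bounds, bounds[1:]))

print(sizes, total, no_overlap)  # {1: 63411, 2: 42260, 3: 69000} 174671 True
```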

FedorSteeman commented 2 weeks ago

I've made my stored procedure for inserting dummy records smarter. We can now pass it a baseline catalog number, and it will then reserve a given number of catalog numbers from that point, skipping any numbers already taken. This makes it possible to snatch the skipped catalog numbers that are not yet taken. I will use the catalog number groups discovered by @Sosannah.
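The behaviour described (start at a baseline, walk upwards, and reserve the next free numbers while skipping taken ones) can be sketched as follows. This is a Python illustration under assumptions, not the actual stored procedure: `reserve_from` and `taken` are hypothetical names, and creating the dummy records in Specify is left out:

```python
def reserve_from(baseline: int, amount: int, taken: set[int]) -> list[int]:
    """Pick `amount` catalog numbers from `baseline` upward, skipping any in `taken`.

    Hypothetical Python sketch of the logic described for the stored procedure;
    the real procedure runs in the database and its interface is not shown here.
    """
    reserved: list[int] = []
    n = baseline
    while len(reserved) < amount:
        if n not in taken:
            reserved.append(n)  # free number: reserve it
        n += 1                  # taken or not, move on to the next number
    return reserved

# Hypothetical example: 102 and 103 are already taken.
print(reserve_from(100, 4, {102, 103}))  # [100, 101, 104, 105]
```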

In addition to that, I could use the sproc to scan for any remaining numbers in the range of the labels already printed and reserve these too. However, this group would presumably be highly fragmented.

I will reserve these for the vascular plants collection, setting Allison (@beckerah) as cataloger for the time being, and make sure that they're marked with projectName = "Reserved for DaSSCo" and remarks = "Specify-number mix up Group x". This is only to make these dummies easier to locate.

FedorSteeman commented 2 weeks ago

OK, I have reserved the blocks for the three groups. These are the results...

Of the range of catalog numbers between 01174503 & 01424504:

I went ahead and reserved the remaining 44 anyway and you can find the numbers in this file: Specify-numbers mixup Group 4.csv

Let me know, if there's anything else I can do.