scooleman closed this issue 3 months ago
It makes sense. And thank you for being so specific and suggesting solutions.
Renaming the label is easy. Counting organism quantity is also an option, but the organism quantity type could be anything, so it wouldn't give useful results (it might in your dataset, but not generally). There is also the individualCount field. Perhaps that could be used, as it is guaranteed to reflect individuals and not a free text quantity type.
It could also be that the digitized number simply does not make sense, as the 2 numbers cannot be compared. @ManonGros I would like your guidance here.
Thanks @scooleman for the feedback and @MortenHofft for the follow up.
I think using the wording "specimen records" instead of "specimen" makes a lot of sense.
Maybe the "Digitized / total" chart should be removed. I cannot think of an accurate way of calculating such metrics based on the GBIF specimen records. I think the "digitisation" status also becomes a bit complicated when we think of various indicators (MIDs and such).
Thank you @ManonGros and @MortenHofft for confirming that it makes sense to change the wording from 'specimens' to 'specimen records' for the two marked labels on numbers generated automatically from GBIF data records.
Skipping the currently displayed 'Digitized / total' chart in the GRSciColl could thereby indeed solve that additional issue of comparing apples with pears or oranges.
> renaming the label is easy. Counting organism quantity is also an option, but the organism quantity type could be anything, so it wouldn't be useful results (it might in your dataset, but not generally). There is also the individualCount field. Perhaps that could be used as that is guaranteed to reflect individuals and not a free text quantity type
Regarding renaming the label: I agree it seems to be the easy option, so why not, if it can be done smoothly.
About the more difficult option related to the quantity data field: are you suggesting to change the DwC mapping from organismQuantity to individualCount for indicating the number of specimens per record, if the quantityType is 'SpecimensInContainer' (i.e. the situation for (currently nearly) all of the published RBINS specimen records)?
I've changed the translation in Crowdin and removed the digitized/total count.
> About the more difficult option related to the quantity data field: are you suggesting to change the DwC mapping from organismQuantity to individualCount for indicating the number of specimens per record, if the quantityType is 'SpecimensInContainer' (i.e. the situation for (currently nearly) all of the published RBINS specimen records)?
Yes, I would think we could take individual counts into account for that number. For now I have removed it, as per https://github.com/gbif/portal-feedback/issues/5291#issuecomment-2045214664
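For illustration only, here is a minimal sketch of how such a count could prefer individualCount and fall back to organismQuantity only when the quantity type is known to mean specimens. The record layout (plain dicts keyed by the Darwin Core term names `individualCount`, `organismQuantity`, `organismQuantityType`), the helper names, and the fallback rule itself are assumptions made for this example; this is not how the portal computes anything.

```python
from typing import Optional


def specimens_in_record(record: dict) -> Optional[int]:
    """Return the number of specimens a record represents, or None if unknown.

    Hypothetical rule: trust individualCount when present; otherwise accept
    organismQuantity only when its type is 'SpecimensInContainer'.
    """
    count = record.get("individualCount")
    if count is not None:
        return int(count)
    if record.get("organismQuantityType") == "SpecimensInContainer":
        quantity = record.get("organismQuantity")
        if quantity is not None:
            return int(quantity)
    return None  # free-text quantity types are not comparable, so leave unknown


def summarize(records: list) -> dict:
    """Report the record count separately from the known specimen count."""
    counts = [specimens_in_record(r) for r in records]
    return {
        "records": len(records),
        "specimens_known": sum(c for c in counts if c is not None),
        "records_without_count": sum(1 for c in counts if c is None),
    }


# Made-up sample records, not real RBINS data.
sample = [
    {"individualCount": 3},
    {"organismQuantity": 2500, "organismQuantityType": "SpecimensInContainer"},
    {"organismQuantity": 12.5, "organismQuantityType": "% biomass"},
    {},
]
summary = summarize(sample)
print(summary)
```

Keeping the records without a usable count as a separate figure avoids silently treating them as one specimen each.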
On the institutions’ overview page, the displayed number for specimens in GBIF currently appears to be an underestimate.
The real number of RBINS specimens shared with GBIF is much higher than the currently displayed number:
https://scientific-collections.gbif.org/institution/c2bfdeef-9c03-435e-8465-c483dadd6995
655,110 is the number of RBINS specimen records in GBIF (as aggregated in GRSciColl), not the number of RBINS specimens; the label on that number in GRSciColl is misleading.
Here follows the inductive derivation of this finding:
Filtering all the RBINS specimens in GRSciColl by organism quantity, for instance 2500, returns 2 results: https://scientific-collections.gbif.org/institution/c2bfdeef-9c03-435e-8465-c483dadd6995/specimens?organismQuantity=2500
Those 2 data records represent 5000 specimens in total, since each of those records refers to a container holding 2500 specimens of small invertebrates.
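The difference between the two counts can be sketched in a few lines. The two values come from the filter result above; representing each record as a dict with an `organismQuantity` key is an assumption made for the example.

```python
# The two records returned by the organismQuantity=2500 filter,
# reduced to the one field that matters here (dict layout is assumed).
records = [
    {"organismQuantity": 2500},
    {"organismQuantity": 2500},
]

record_count = len(records)  # what the results counter shows
specimen_count = sum(r["organismQuantity"] for r in records)  # what the containers hold

print(record_count)    # 2
print(specimen_count)  # 5000
```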
Thus, the number of results on the GRSciColl Specimens page is equal to the number of data records of specimens, in short the number of specimen records.
Consequently, the total number of results (i.e. 655,110) displayed (without filter) on the RBINS Specimens page in GRSciColl: https://scientific-collections.gbif.org/institution/c2bfdeef-9c03-435e-8465-c483dadd6995/specimens is equal to the number of RBINS specimen records aggregated in GRSciColl so far.
That same number (i.e. 655,110) is also displayed on the RBINS overview page in GRSciColl, but those 655,110 data records represent many more than 655,110 specimens shared with GBIF.
Hence, our technical request to improve the name of the two data fields representing that number of ‘specimen records’ (instead of ‘specimens’):
Besides, the displayed percentage ‘Digitized / total’ is a rounded underestimate of the proportion of digitized specimens among the 38M specimens in total at RBINS.
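As a rough check, the ratio can be recomputed from the two figures quoted in this issue. This sketch assumes the chart simply divides the record count by the institution total, which is a guess and not confirmed anywhere here.

```python
specimen_records = 655_110   # RBINS specimen records in GBIF (from this issue)
total_specimens = 38_000_000  # total specimens at RBINS (from this issue)

# Each record counts as one specimen here, so this is only a lower bound.
percent_lower_bound = 100 * specimen_records / total_specimens

print(f"{percent_lower_bound:.2f}%")  # 1.72%
```

Since a single record can stand for a container of many specimens, the true proportion is higher than this figure.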
Taking the values of the data field ‘organism quantity’ into account for the institutions' total number or percentage of specimens in GBIF could technically be another solution, if those calculations can be run smoothly.