gbif / portal-feedback

User feedback for the GBIF API, website and published data. You can ask questions here. 🗨❓

GRSciColl institutions’ # specimen( record)s in GBIF #5291

Closed scooleman closed 3 months ago

scooleman commented 6 months ago

On the institutions’ overview page, the displayed number for specimens in GBIF currently appears to be an underestimate.

The actual number of RBINS specimens shared with GBIF is much higher than the currently displayed number:

https://scientific-collections.gbif.org/institution/c2bfdeef-9c03-435e-8465-c483dadd6995

655,110 is the number of RBINS specimen records in GBIF (as aggregated in GRSciColl), not the number of RBINS specimens, so the GRSciColl label for that number is misleading.

Here follows the inductive derivation of this finding:

Hence, our technical request to improve the name of the two data fields representing that number of ‘specimen records’ (instead of ‘specimens’):

GRSciColl-RBINS#specimen-records

GRSciColl-RBINS#specimen-records+%+

Besides, the displayed percentage ‘Digitized / total’ is a rounded underestimation for the proportion of digitized specimens of the 38M specimens in total at RBINS.

Taking the values of the data field ‘organism quantity’ into account for the institutions' total number or percentage of specimens in GBIF could technically be another solution, if those calculations can be run smoothly.

MortenHofft commented 6 months ago

It makes sense. And thank you for being so specific and suggesting solutions.

Renaming the label is easy. Counting organism quantity is also an option, but the organism quantity type could be anything, so the results wouldn't be useful (they might be for your dataset, but not in general). There is also the individualCount field. Perhaps that could be used, as it is guaranteed to reflect individuals rather than a free-text quantity type.
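
To illustrate the difference between the two figures discussed here, the following is a minimal sketch that contrasts counting records with summing individualCount. It assumes the records are available as plain dictionaries keyed by Darwin Core term names; the function names, the sample data, and the fallback of one individual per record when individualCount is missing or unparsable are all assumptions made for illustration, not how GBIF or GRSciColl computes its numbers.

```python
from typing import Mapping, Sequence


def count_records(records: Sequence[Mapping]) -> int:
    """Number of specimen records, which is what the label currently reflects."""
    return len(records)


def count_individuals(records: Sequence[Mapping], default: int = 1) -> int:
    """Sum individualCount over all records.

    Assumption: a record with a missing, non-numeric, or negative
    individualCount is counted as `default` individuals.
    """
    total = 0
    for rec in records:
        try:
            n = int(rec.get("individualCount"))
        except (TypeError, ValueError):
            n = default
        total += n if n >= 0 else default
    return total


# Hypothetical sample data, not real RBINS records.
sample = [
    {"catalogNumber": "A-1", "individualCount": 12},
    {"catalogNumber": "A-2", "individualCount": "3"},
    {"catalogNumber": "A-3"},  # no count given, treated as one individual
]

print(count_records(sample))      # 3 records
print(count_individuals(sample))  # 12 + 3 + 1 = 16 individuals
```

With data like this, the two numbers diverge as soon as a single record describes a container holding several specimens.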

It could also be that the digitized number simply does not make sense, as the two numbers cannot be compared. @ManonGros, I would like your guidance here.

ManonGros commented 6 months ago

Thanks @scooleman for the feedback and @MortenHofft for the follow up. I think using the wording specimen records instead of specimen makes a lot of sense.

Maybe the Digitized / total chart should be removed. I cannot think of an accurate way of calculating such metrics based on the GBIF specimen records. I think the "digitisation" status also becomes a bit complicated when we think of various indicators (MIDs and such).

scooleman commented 6 months ago

Thank you @ManonGros and @MortenHofft for confirming that it makes sense to change the wording from 'specimens' to 'specimen records' for those two marked labels, whose numbers are generated automatically from GBIF data records.

Dropping the currently displayed 'Digitized / total' chart in GRSciColl would indeed also solve that additional issue of comparing apples with pears or oranges.

scooleman commented 3 months ago

> renaming the label is easy. Counting organism quantity is also an option, but the organism quantity type could be anything, so it wouldn't be useful results (it might in your dataset, but not generally). There is also the individualCount field. Perhaps that could be used as that is guaranteed to reflect individuals and not a free text quantity type

Regarding the label suggested for renaming: I agree it seems to be an easy option, so why not, if it can be done smoothly.

About the more difficult option related to the quantity data field: are you suggesting to change the DwC mapping from organismQuantity to individualCount for indicating the number of specimens per record, if the quantityType is 'SpecimensInContainer' (i.e. the situation for (currently nearly) all of the published RBINS specimen records)?
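
To make the question concrete, such a remapping could be sketched as follows. This is only an illustration under stated assumptions: records are plain dictionaries keyed by Darwin Core term names, the quantity type is stored under organismQuantityType, and the helper name `remap_quantity` is hypothetical. It does not describe an existing GBIF or RBINS pipeline.

```python
from typing import Any, Dict, Mapping


def remap_quantity(
    record: Mapping[str, Any],
    container_type: str = "SpecimensInContainer",
) -> Dict[str, Any]:
    """Return a copy of `record` with individualCount derived from organismQuantity.

    The value is copied only when organismQuantityType equals `container_type`,
    individualCount is not already set, and organismQuantity is a whole number.
    The original organismQuantity fields are left in place (an assumption).
    """
    out = dict(record)
    if out.get("organismQuantityType") != container_type:
        return out
    if "individualCount" in out:
        return out
    try:
        out["individualCount"] = int(out["organismQuantity"])
    except (KeyError, TypeError, ValueError):
        pass  # leave the record unchanged if the quantity is missing or not an integer
    return out


# Hypothetical example records.
jar = {"organismQuantity": "25", "organismQuantityType": "SpecimensInContainer"}
cover = {"organismQuantity": "40", "organismQuantityType": "% cover"}

print(remap_quantity(jar))    # gains individualCount = 25
print(remap_quantity(cover))  # unchanged, since the type is not a specimen count
```

Restricting the copy to a single quantity type avoids the problem raised earlier in the thread, namely that organismQuantity values of other types are not counts of individuals.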

MortenHofft commented 3 months ago

I've changed the translation in Crowdin and removed the digitized/total count.

MortenHofft commented 3 months ago

> About the more difficult option related to the quantity data field: are you suggesting to change the DwC mapping from organismQuantity to individualCount for indicating the number of specimens per record, if the quantityType is 'SpecimensInContainer' (i.e. the situation for (currently nearly) all of the published RBINS specimen records)?

Yes, I would think we could take individual counts into account for that number. For now I have removed it, as per https://github.com/gbif/portal-feedback/issues/5291#issuecomment-2045214664