NHMDenmark / DanSpecify

Important files regarding the Danish instance of the Specify database system for collections digitisation and management, plus placeholder for issue tracking. Guidelines, manuals and other kinds of documentation will be gathered on the wiki.

Statistics from Specify for DaSSCo Annual Report #264

Closed PipBrewer closed 8 months ago

PipBrewer commented 9 months ago

I need to write the annual report for DaSSCo. Hence, would it be possible to have some statistics from Specify?

I would like to know for NHMD, AU and NHMA (separately):

Ideally, I would like these by the end of February so that I have time to generate a narrative around them (comparing them to previous years and projections) and to have others (such as the DaSSCo Steering Group) check the report before I submit it at the end of March 2024.

FedorSteeman commented 9 months ago

I'm looking forward to upgrading to the latest version of Specify7 so we can have a lot of these stats handy:

image

But it's on our radar now so we'll get on to it.

FedorSteeman commented 8 months ago

@PipBrewer Here are the statistics you asked for:

| Period | NHMD | NHMA | AUH |
|---|---|---|---|
| Total as of all 2023 | 1.091.257 | 428.871 | 0 |
| Published as of 2023 | 793.267 | 226.028 | 0 |
| Total of all time | 1.116.723 | 430.896 | 0 |
| Published all Time | 802.492 | 228.037 | 0 |
| Added/Updated 2023 | 379.386 | 7.509 | 0 |

Also in the attached spreadsheet: DaSSCo Statistics 2023.xlsx

Please review and ask any followup questions you may have.

FedorSteeman commented 8 months ago

BTW I got a bit curious about the high number of unpublished records for NHMD, and it turns out that not only are these actively set to "false", but most of them come from Vascular Plants.

| Published | Count | Collection name |
|---|---|---|
| False | 3 | NHMD Vertebrate Paleontology |
| False | 49.563 | NHMD Entomology |
| False | 7.503 | NHMD Invertebrate Zoology |
| False | 88 | NHMD Mammalogy |
| False | 253.542 | NHMD Vascular Plants |
| False | 1 | NHMD Danekrae |
| False | 5 | NHMD Amber |

Is this deliberate and part of a DaSSCo strategy?

Sosannah commented 8 months ago

Regarding the high number of unpublished records for NHMD: 24.560 of them belonged to the type database of Vascular Plants, and they have been published today.

The remaining ones are the "dummy records" reserved for DaSSCo, I guess (same case for Entomology).

FedorSteeman commented 8 months ago

Ooooh Good catch! I will need to redo the statistics then. I forgot all about the dummy records!

FedorSteeman commented 8 months ago

Here are the adjusted numbers:

NHMD

| Period | Record_Count |
|---|---|
| Total prior to 2024 | 857.005 |
| Published prior to 2024 | 817.153 |
| Total of all time | 874.603 |
| Published all Time | 827.142 |
| Added/Updated 2023 | 373.590 |

Only NHMD needed to have dummy records subtracted.

Redone spreadsheet: DaSSCo Statistics 2023.xlsx

FYI I used this SQL: DaSSCo Statistics.sql.txt
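
The SQL attachment itself is not reproduced in this thread. As a rough illustration of how the five figures above relate to one another, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Record` fields (`created`, `modified`, `published`, `is_dummy`), the way "Added/Updated" is defined, and the sample data. None of it reflects the actual Specify schema or the query that was used.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    created: date     # hypothetical: date the record was added
    modified: date    # hypothetical: date the record was last updated
    published: bool   # hypothetical: record is flagged for publication
    is_dummy: bool    # hypothetical: placeholder record reserved for DaSSCo


def yearly_statistics(records, year):
    """Count records per reporting period, leaving out dummy records."""
    start = date(year, 1, 1)
    cutoff = date(year + 1, 1, 1)  # e.g. 2024-01-01 for the 2023 report
    real = [r for r in records if not r.is_dummy]
    before = [r for r in real if r.created < cutoff]
    return {
        f"Total prior to {year + 1}": len(before),
        f"Published prior to {year + 1}": sum(r.published for r in before),
        "Total of all time": len(real),
        "Published all time": sum(r.published for r in real),
        # Assumed definition: created or modified within the reporting year.
        f"Added/Updated {year}": sum(
            start <= r.created < cutoff or start <= r.modified < cutoff
            for r in real
        ),
    }


# Made-up sample data, only to exercise the function.
records = [
    Record(date(2022, 5, 1), date(2022, 5, 1), True, False),    # older, published
    Record(date(2022, 6, 1), date(2023, 3, 1), False, False),   # updated in 2023
    Record(date(2023, 7, 1), date(2023, 7, 1), True, False),    # added in 2023
    Record(date(2023, 8, 1), date(2023, 8, 1), False, True),    # dummy, ignored
    Record(date(2024, 1, 15), date(2024, 1, 15), True, False),  # added after cut-off
]
stats = yearly_statistics(records, 2023)
for period, count in stats.items():
    print(f"{period}: {count}")
```

Filtering out the dummy records first, before any counting, keeps all five figures consistent with each other.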

PipBrewer commented 8 months ago

@FedorSteeman Thank you for this. I'm sorry that I didn't ask this originally. Do you have the total number of published specimens for all Danish institutions prior to 2024?

Sosannah commented 8 months ago

As far as I know, we only publish the occurrences of NHMA and NHMD, not the others. So the total number of published specimens for the rest of the Danish institutions prior to 2024 is 0.

The numbers for the smaller institutions (also added to the spreadsheet DaSSCo.Statistics.2023_all.xlsx):

| Period | MSJN | MUSERUM | Naturama | OESM | FIMUS |
|---|---|---|---|---|---|
| Total as of all 2023 | 5.339 | 23.091 | 10.543 | 12.787 | 2.137 |
| Published as of 2023 | 0 | 0 | 0 | 0 | 0 |
| Total of all time | 5.367 | 23.116 | 10.543 | 12.851 | 2.139 |
| Published all Time | 0 | 0 | 0 | 0 | 0 |
| Added/Updated 2023 | 312 | 5.847 | 0 | 1.826 | 190 |

FedorSteeman commented 8 months ago

Hi @PipBrewer ! The only institutions publishing to GBIF currently are NHMD and NHMA so you can just add those numbers up.
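
As a worked example of that addition, the sketch below sums the adjusted NHMD figure ("Published prior to 2024") and the NHMA figure ("Published as of 2023") quoted earlier in this thread. The helper `parse_dk` is hypothetical and only strips the "." thousands separator. That the two figures share exactly the same cut-off date is an assumption.

```python
def parse_dk(number: str) -> int:
    """Parse an integer written with '.' as thousands separator, e.g. '817.153'."""
    return int(number.replace(".", ""))


published_prior_to_2024 = {
    "NHMD": parse_dk("817.153"),  # adjusted figure, dummy records subtracted
    "NHMA": parse_dk("226.028"),  # "Published as of 2023"
}

total = sum(published_prior_to_2024.values())
print(f"{total:,}".replace(",", "."))  # 1.043.181
```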

FedorSteeman commented 8 months ago

Oh the response by @Sosannah wasn't visible when I finally started replying, but thank you, Zsuzsanna!

FedorSteeman commented 8 months ago

Presumed done. Will reopen if needed.

jlegind commented 8 months ago

Quick question: can you separate out the number of records where NHMD is the publisher, as opposed to those where NHMD is 'hosting' the records on behalf of other institutions?

FedorSteeman commented 8 months ago

@jlegind The records published by Specify are the ones hosted by NHMD. NHMA is the only other institution which has its records published.

PipBrewer commented 8 months ago

@FedorSteeman @Sosannah Thank you for these numbers. After adjusting for dummy records, Fedor quotes the NHMD figures as 857,005 in Specify, of which 817,153 are published to GBIF. Zsuzs quotes (in her updated spreadsheet) 1,091,257 in Specify, of which 793,267 are pushed to GBIF. Zsuzs, I'm guessing that you didn't update the NHMD figures in your spreadsheet, only the numbers for the other institutions, so I should use Fedor's figures for NHMD and yours for the rest?

Do we have reasons why there are unpublished records for NHMD? Is there a reason why we don't publish records for other institutions?

FedorSteeman commented 8 months ago

@PipBrewer We may need to run these numbers through another iteration to make them as accurate as possible and perhaps compare to GBIF.

The reasons for unpublished records vary on a case-by-case basis. It is best to ask the responsible curators. For many records, there may no longer be any reason for keeping them from the public. Typical reasons are waiting for a paper to be published, or the material or its locality being of a sensitive nature.

The reason that other institutions are not publishing is simply that we have not initiated it for them, and the ones that were asked have not been interested in it yet.

Sosannah commented 8 months ago

Oops, you're right, I added my numbers with the small institutes to the wrong sheet. And also right, I didn't update the NHMD/NHMA figures - you can use Fedor's figures for NHMD/NHMA and mine for the rest. Sorry about that!

But these numbers are growing constantly: since then the 24.560 type specimen records have also been published, and Jen is importing new sheets quite regularly. So if the cut-off point is not strictly defined, a new calculation could make sense, as Fedor suggested.

Other reasons for unpublished records at NHMD:

PipBrewer commented 8 months ago

Thanks. I'm mostly interested in figures as of 01/01/2024 for now, so not worried about growing numbers. Cheers!