Open gbif-portal opened 5 years ago
From my Skype discussion with Thomas: the proposal is to create country-type annual reports (https://www.gbif.org/sites/default/files/gbif_analytics/country/US/GBIF_CountryReport_US.pdf) for individual data providers. Thomas has a use case for this, and generating them is a quick way to demonstrate the value proposition (AKA why is GBIF important to my institution?). @dschigel replied that citations are available for the datasets and publishers, and metrics + activity for the datasets. Thomas specifies that he is looking more for data on publications and downloads per month and per year, in a sort of dashboard with stats for individual data publishers that shows how the data is being used: for example, number of publications and downloads, faceted by day, week, month, year. Much like what we have for an individual publisher's occurrence data (https://www.gbif.org/dataset/821cc27a-e3bb-4bc5-ac34-89ada245069d/metrics). The reason and need are to show and track at the institutional level how and why contributions to GBIF are relevant.
@orrellt is Thomas Orrell
Rolling up for institution is important and the new database hardware at GBIF should now allow those queries to be added.
At a minimum I expect we should do charts with number of downloads, records downloaded per month and citations per month for datasets, and then rolled up as counts shown on the owning institution as well.
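To make the roll-up concrete, here is a minimal sketch of summing per-dataset monthly counts into per-publisher counts. The input shapes (`dataset_stats`, `dataset_publisher`) and the metric names are assumptions made up for illustration; they are not an existing GBIF API or table layout.

```python
from collections import defaultdict


def roll_up_by_publisher(dataset_stats, dataset_publisher):
    """Sum per-dataset monthly metrics into per-publisher monthly metrics.

    dataset_stats (hypothetical shape):
        {dataset_key: {"YYYY-MM": {"downloads": int, "records": int, ...}}}
    dataset_publisher (hypothetical shape):
        {dataset_key: publisher_key}
    """
    totals = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for dataset_key, months in dataset_stats.items():
        publisher = dataset_publisher[dataset_key]
        for month, metrics in months.items():
            for name, value in metrics.items():
                totals[publisher][month][name] += value
    # Convert the nested defaultdicts to plain dicts for easier serialisation.
    return {
        publisher: {month: dict(metrics) for month, metrics in months.items()}
        for publisher, months in totals.items()
    }
```

One caveat with this naive approach: a single download can include records from several datasets of the same publisher, so summing the per-dataset `downloads` counts may overcount at the institution level. Summed `records` do not have that problem. A real implementation would probably need to count distinct downloads per publisher instead.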
In a subsequent step any of the ideas for improved metrics for datasets (https://github.com/gbif/portal16/issues/761) that are adopted should also be rolled up for institutions
Until March, on the old database servers, we had very slow queries to this data.
Therefore, there's an undocumented API for the country reports to use, for example http://api.gbif.org/v1/occurrence/download/statistics/downloadedRecordsByDataset?publishingCountry=AT&datasetKey=06d3504c-a3bb-11e2-95b8-00145eb45e9a and http://api.gbif.org/v1/occurrence/download/statistics/downloadsByUserCountry?userCountry=DK&fromDate=2017
It might be useful, but it will be worth investigating if the new database servers can support these queries without pre-generating statistics tables.
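For anyone who wants to experiment with the two endpoints quoted above, a small sketch for building the request URLs follows. Only the endpoint names and query parameters that appear in those example URLs are taken from this thread. The response shape handled by `flatten_year_month` (a `{year: {month: count}}` mapping) is an assumption, since the API is undocumented; check a real response before relying on it.

```python
from urllib.parse import urlencode

BASE = "http://api.gbif.org/v1/occurrence/download/statistics"


def statistics_url(endpoint, **params):
    """Build a URL for one of the statistics endpoints quoted in this thread."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE}/{endpoint}?{query}" if query else f"{BASE}/{endpoint}"


def flatten_year_month(payload):
    """Flatten an ASSUMED {year: {month: count}} payload into sorted rows.

    Returns a list of ("YYYY-MM", count) tuples, oldest first.
    """
    rows = []
    for year, months in payload.items():
        for month, count in months.items():
            rows.append((f"{int(year):04d}-{int(month):02d}", count))
    return sorted(rows)
```

For example, `statistics_url("downloadsByUserCountry", userCountry="DK", fromDate="2017")` reproduces the second URL above. Fetching is left out on purpose, so the helpers can be tried without network access.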
Also, I've been asked if these data (rolled up) can be downloaded and added to monthly and other types of NMNH metrics. So there is interest by institutions to track how data are being used in initiatives like GBIF.
I think institutional-level metrics matter a lot in Canada; I heard this from David Shorthouse or Mark Graham, or perhaps both of them.
Fwiw, a recent Twitter rant has had me thinking the past few days about a graph that could build on one information element that appears to be part of GRSciColl's collections and institutions: a simple horizontal thermometer bar that shows a) the total number of specimens in a collection/institution b) total number of digitized records.
This could serve as a quick visual guide to progress on digitization. If possible, we could add a temporal element to b) that indicates any significant pulses or milestones as progress over time.
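As a rough illustration of the thermometer idea, here is a minimal text-only sketch. The function name and the two inputs are hypothetical; where the numbers would come from in GRSciColl is not settled in this thread.

```python
def digitization_bar(total_specimens, digitized_records, width=20):
    """Render a horizontal 'thermometer' of digitization progress as text.

    total_specimens: estimated number of specimens in the collection/institution
    digitized_records: number of digitized records
    width: number of characters inside the bar
    """
    if total_specimens <= 0:
        raise ValueError("total_specimens must be positive")
    # Clamp to [0, 1] in case digitized records exceed the specimen estimate.
    fraction = min(max(digitized_records / total_specimens, 0.0), 1.0)
    filled = round(fraction * width)
    return "[" + "#" * filled + "-" * (width - filled) + f"] {fraction:.0%}"


print(digitization_bar(1000, 250))
# [#####---------------] 25%
```

The clamping is a design choice for the sketch: specimen totals are often estimates, so the digitized count can exceed them, and a real chart might prefer to flag that case rather than hide it.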
I think a level of digitization stat would be a useful metric to come from GRSciColl in conjunction with collection-level metadata derived from both GRSciColl and the TDWG effort to reinvent the collection descriptions standard (formerly known as NCD). However, this stat is really different than the above thread regarding publications and downloads of institutional content in GBIF. Maybe what Kyle is proposing is a fork to a new thread?
I think you are right @orrellt that this is a separate, GRSciColl-related opportunity, but it is somewhat dependent on the dataset metrics that you requested, Thomas. For one of the activities on mobilisation prioritization I work on, the "thermometer" would be most welcome as a numeric measure of the mobilisation potential of a collection, but it won't be applicable to the institutional metrics of the online data, which would be a summary of the dataset metrics as published through GBIF. But, having the information on potential and the metrics of access and use, one can project the impact of complete or fuller digitization. That is how collections can fight for better resources and funding: by projecting the impact of digitization efforts based on the actual metrics of what is online already. Good thread here! Someone skilled in GitHub might know how to fork it into a new issue with the Idea tag.
Occurrence metrics are now available directly on publisher pages (before, you had to go to the occurrence metrics page): https://www.gbif.org/publisher/bc092ff0-02e4-11dc-991f-b8a03c50a862
Download breakdowns (for datasets and publishers) require API changes and have been captured in https://github.com/gbif/registry/issues/117
Hi Morten, I like the easy access to the occurrence data. I hope to see similar graphical stats on the number of publications and downloads that include this data source (publisher) soon.
Citations are already mentioned on publisher pages in the header with a count. They are now also displayed alongside the metrics, along with a yearly breakdown.
Notice that it is also possible to drive more detailed custom charts/stats off the public APIs, as well as to see more details and breakdowns on the literature search page.
Without the API changes in gbif/registry#117 this is about as far as I can take it for now.
FWIW, I've been toying with how to reveal institutional reach beyond downloads and pubs - perhaps more closely aligned with internal funding and fiscal calendars - by using the collecting and determination activities of past/present staff (via ORCID) as revealed from digitized specimens curated at other institutions. It's not (yet) a very pretty view, but here's the idea: https://bloodhound-tracker.net/organization/Q131626/metrics. And another one, where staff have been more engaged/informed in the process and the numbers are a tad more convincing: https://bloodhound-tracker.net/organization/Q1032232/metrics (disclosure: this was my past employer). I am presently toiling with how to use wikidata to churn institutionCode/collectionCode back into other institutional identifiers (either Ringgold, GRID, or wikidata's own, because that's what I get from ORCID) such that the presentation of these numbers remains at an institutional level and is not somewhat buried as incomprehensible lists of collection codes. That's where GRSciColl may be important.
Metrics for publishers
Opening an issue on request from Thomas Orrell to consider adding summary and time-sliced metrics for the publisher pages. Will add more content to this issue.
User provided contact info: @dschigel System: Firefox 60.0.0 / Windows 10.0.0 Referer: https://www.gbif.org/publisher/bc092ff0-02e4-11dc-991f-b8a03c50a862 Window size: width 1922 - height 1073 System health at time of feedback: OPERATIONAL