gbif / portal-feedback

User feedback for the GBIF API, website and published data. You can ask questions here. 🗨❓

Metrics for publishers #1912

Open gbif-portal opened 5 years ago

gbif-portal commented 5 years ago

Metrics for publishers

Opening an issue at the request of Thomas Orrell to consider adding summary and time-sliced metrics to the publisher pages. I will add more content to this issue.


User provided contact info: @dschigel
System: Firefox 60.0.0 / Windows 10.0.0
User: See in registry
Referer: https://www.gbif.org/publisher/bc092ff0-02e4-11dc-991f-b8a03c50a862
Window size: width 1922 - height 1073
System health at time of feedback: OPERATIONAL

dschigel commented 5 years ago

From my Skype discussion with Thomas: the proposal is to create annual reports like the country reports (https://www.gbif.org/sites/default/files/gbif_analytics/country/US/GBIF_CountryReport_US.pdf) for individual data providers. Thomas has a use case for this, and generating such reports is a quick way to make the value proposition (AKA why is GBIF important to my institution?). @dschigel replied that citations are available for datasets and publishers, and metrics and activity for datasets. Thomas specified that he is looking more for data on publications and downloads per month and per year: a dashboard for individual data publishers that shows how their data are being used, for example the number of publications and downloads, faceted by day, week, month and year, much like the metrics we already have for the occurrence data of individual datasets (https://www.gbif.org/dataset/821cc27a-e3bb-4bc5-ac34-89ada245069d/metrics). The reason and need are to show and track, at the institutional level, how and why contributions to GBIF are relevant.
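The faceting Thomas describes (counts per day, week, month or year) amounts to labelling each usage event with a period and counting per label. A minimal Python sketch, assuming download dates are already available as `datetime.date` values; the function names and sample dates are hypothetical and not part of any GBIF API:

```python
from collections import Counter
from datetime import date


def period_key(d: date, granularity: str) -> str:
    """Map a date to a bucket label for the requested granularity."""
    if granularity == "day":
        return d.isoformat()
    if granularity == "week":
        # ISO week numbering: the ISO year can differ from the calendar year.
        iso_year, iso_week, _ = d.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    if granularity == "month":
        return f"{d.year}-{d.month:02d}"
    if granularity == "year":
        return str(d.year)
    raise ValueError(f"unknown granularity: {granularity}")


def facet_counts(event_dates, granularity: str) -> dict:
    """Count events (e.g. downloads) per period, ordered by period label."""
    counts = Counter(period_key(d, granularity) for d in event_dates)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Hypothetical download dates for one publisher's data.
    downloads = [date(2019, 1, 3), date(2019, 1, 28), date(2019, 2, 14), date(2020, 6, 1)]
    print(facet_counts(downloads, "month"))
    print(facet_counts(downloads, "year"))
```

Because weeks use ISO numbering, 31 December 2018 falls in bucket `2019-W01`; a dashboard would need to state which week convention it uses.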

dschigel commented 5 years ago

@orrellt is Thomas Orrell

timrobertson100 commented 5 years ago

Rolling metrics up to the institution level is important, and the new database hardware at GBIF should now allow those queries to be added.

At a minimum, I expect we should chart the number of downloads, the number of records downloaded per month, and citations per month for datasets, and then roll these up as counts shown on the owning institution as well.
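One subtlety when rolling dataset counts up to the owning institution: a single download can include several datasets from the same publisher, so summing per-dataset download counts would overcount. A minimal sketch, assuming per-dataset rows carry the download key (the record shape and all keys here are hypothetical), counts distinct downloads per publisher and month while summing records:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetDownload:
    """One dataset's share of one download (hypothetical record shape)."""
    download_key: str
    dataset_key: str
    month: str    # "YYYY-MM"
    records: int  # records from this dataset included in this download


def roll_up(rows, dataset_owner):
    """Aggregate per-dataset download rows to (publisher, month) totals.

    dataset_owner maps dataset_key -> publisher (institution) key.
    A download spanning several datasets of one publisher is counted once;
    record counts are summed across datasets.
    """
    download_keys = defaultdict(set)
    records = defaultdict(int)
    for row in rows:
        bucket = (dataset_owner[row.dataset_key], row.month)
        download_keys[bucket].add(row.download_key)
        records[bucket] += row.records
    return {
        bucket: {"downloads": len(download_keys[bucket]),
                 "records_downloaded": records[bucket]}
        for bucket in sorted(download_keys)
    }


if __name__ == "__main__":
    owners = {"ds1": "pubA", "ds2": "pubA", "ds3": "pubB"}
    rows = [
        DatasetDownload("dl1", "ds1", "2019-03", 10),
        DatasetDownload("dl1", "ds2", "2019-03", 5),  # same download, second dataset
        DatasetDownload("dl2", "ds1", "2019-03", 2),
        DatasetDownload("dl3", "ds3", "2019-04", 7),
    ]
    print(roll_up(rows, owners))
```

Citations could be rolled up the same way, counting distinct citing publications per institution rather than summing per-dataset citation counts.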

In a subsequent step, any of the ideas for improved dataset metrics (https://github.com/gbif/portal16/issues/761) that are adopted should also be rolled up for institutions.

MattBlissett commented 5 years ago

Until March, on the old database servers, queries against this data were very slow.

Therefore, there is an undocumented API for the country reports to use, for example http://api.gbif.org/v1/occurrence/download/statistics/downloadedRecordsByDataset?publishingCountry=AT&datasetKey=06d3504c-a3bb-11e2-95b8-00145eb45e9a and http://api.gbif.org/v1/occurrence/download/statistics/downloadsByUserCountry?userCountry=DK&fromDate=2017
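A minimal sketch for building and fetching these statistics URLs with the Python standard library. The base path and parameter names are copied from the example URLs above; since the endpoints are described as undocumented, they may have changed since, and no particular JSON response shape is assumed here:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base path taken from the example URLs in this thread (undocumented endpoints).
STATS_BASE = "http://api.gbif.org/v1/occurrence/download/statistics"


def stats_url(endpoint: str, **params: str) -> str:
    """Build a statistics URL, e.g. for 'downloadedRecordsByDataset'."""
    return f"{STATS_BASE}/{endpoint}?{urlencode(params)}"


def fetch_stats(endpoint: str, **params: str):
    """Fetch and decode the JSON response; its structure is not assumed."""
    with urlopen(stats_url(endpoint, **params), timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only prints the URL; call fetch_stats(...) to query the API.
    print(stats_url("downloadedRecordsByDataset",
                    publishingCountry="AT",
                    datasetKey="06d3504c-a3bb-11e2-95b8-00145eb45e9a"))
```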

It might be useful, but it will be worth investigating if the new database servers can support these queries without pre-generating statistics tables.

orrellt commented 5 years ago

Also, I've been asked whether these (rolled-up) data can be downloaded and added to monthly and other types of NMNH metrics. So there is interest from institutions in tracking how their data are being used in initiatives like GBIF.
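If rolled-up figures were available, appending them to an institution's own monthly reporting could be as simple as a CSV export. A minimal sketch, assuming per-month totals are held in a dict of the hypothetical shape `{month: {"downloads": n, "records_downloaded": m}}`:

```python
import csv
import io


def to_csv(monthly: dict) -> str:
    """Serialise per-month totals as CSV text, one row per month in order."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["month", "downloads", "records_downloaded"])
    for month in sorted(monthly):
        row = monthly[month]
        writer.writerow([month, row["downloads"], row["records_downloaded"]])
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical rolled-up totals for one institution.
    print(to_csv({
        "2019-02": {"downloads": 3, "records_downloaded": 120},
        "2019-01": {"downloads": 1, "records_downloaded": 40},
    }), end="")
```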

dschigel commented 5 years ago

I think institution-level metrics matter a lot in Canada; perhaps it was David Shorthouse or Mark Graham, or both of them.

kcopas commented 5 years ago

FWIW, a recent Twitter rant has had me thinking over the past few days about a graph that could build on one information element that appears to be part of GRSciColl's collection and institution records: a simple horizontal thermometer bar that shows a) the total number of specimens in a collection/institution and b) the total number of digitized records.

This could serve as a quick visual guide to progress on digitization. If possible, we could add a temporal element to b) that indicates any significant pulses or milestones as progress over time.
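A text mock-up of the proposed thermometer bar, assuming a collection's total specimen count and its digitized record count are both known. The function and its `width` parameter are hypothetical illustrations, not a GRSciColl feature:

```python
def thermometer(total_specimens: int, digitized: int, width: int = 20) -> str:
    """Render a horizontal text bar whose filled share is digitized / total."""
    if total_specimens <= 0:
        raise ValueError("total_specimens must be positive")
    # Clamp so a digitized count above a (possibly stale) total caps at 100%.
    share = min(digitized, total_specimens) / total_specimens
    filled = round(share * width)
    bar = "#" * filled + "-" * (width - filled)
    return f"[{bar}] {share:.0%} ({digitized:,}/{total_specimens:,})"


if __name__ == "__main__":
    # Hypothetical collection: 1,000 specimens, 250 digitized records.
    print(thermometer(1000, 250))
```

The temporal element could be shown by rendering one bar per year from a series of cumulative digitized counts.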

orrellt commented 5 years ago

I think a level-of-digitization stat would be a useful metric to come from GRSciColl, in conjunction with collection-level metadata derived from both GRSciColl and the TDWG effort to reinvent the collection descriptions standard (formerly known as NCD). However, this stat is quite different from the topic of this thread, which is publications and downloads of institutional content in GBIF. Maybe what Kyle is proposing should be forked into a new thread?

dschigel commented 5 years ago

I think you are right, @orrellt, that this is a separate, GRSciColl-related opportunity, but it is somewhat dependent on the dataset metrics that you requested, Thomas. For one of the activities on mobilisation prioritization that I work on, the "thermometer" would be most welcome as a numeric measure of a collection's mobilisation potential. However, it won't be applicable to the institutional metrics of the online data, which would be a summary of the dataset metrics as published through GBIF. But with information on both potential and metrics of access and use, one can project the impact of complete or fuller digitization. This is how collections can fight for better resources and funding: by projecting the impact of digitization efforts based on actual metrics of what is already online. Good thread here! Someone skilled in GitHub might know how to fork it into a new issue with the Idea tag.

MortenHofft commented 5 years ago

Occurrence metrics are now available directly on publisher pages (previously you had to go to the occurrence metrics page): https://www.gbif.org/publisher/bc092ff0-02e4-11dc-991f-b8a03c50a862

Download breakdowns (for datasets and publishers) require API changes and have been captured in https://github.com/gbif/registry/issues/117

orrellt commented 5 years ago

Hi Morten, I like the easy access to the occurrence data. I hope to see similar graphical stats on the number of publications and downloads that include this data source (publisher) soon.

MortenHofft commented 5 years ago

Citations are already shown as a count in the header of publisher pages. They are now also displayed alongside the metrics, with a yearly breakdown.

[Screenshot, 2019-04-30 15:15:06]

Note that it is also possible to build more detailed custom charts and statistics from the public APIs, and to see more details and breakdowns on the literature search page.
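As one example of driving custom stats from the public APIs, the occurrence search API can return facet counts without returning records. A minimal sketch, assuming the `publishingOrg`, `facet`, `facetLimit` and `limit` parameters and a response whose `facets` list holds `{"name", "count"}` entries; check the current API documentation before relying on either:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://api.gbif.org/v1/occurrence/search"


def year_facet_url(publisher_key: str, facet_limit: int = 300) -> str:
    """URL for occurrence counts per year for one publishing organization."""
    params = {
        "publishingOrg": publisher_key,
        "facet": "year",
        "facetLimit": facet_limit,
        "limit": 0,  # only facet counts are wanted, not the records themselves
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def flatten_facet(response: dict) -> dict:
    """Turn the first facet of a search response into {value: count}."""
    facets = response.get("facets", [])
    if not facets:
        return {}
    return {entry["name"]: entry["count"] for entry in facets[0]["counts"]}


def fetch_year_counts(publisher_key: str) -> dict:
    """Query the API and return {year: occurrence count}."""
    with urlopen(year_facet_url(publisher_key), timeout=30) as resp:
        return flatten_facet(json.load(resp))


if __name__ == "__main__":
    # Only prints the URL; call fetch_year_counts(...) to query the API.
    print(year_facet_url("bc092ff0-02e4-11dc-991f-b8a03c50a862"))
```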

Without the API changes in gbif/registry#117 this is about as far as I can take it for now.

dshorthouse commented 5 years ago

FWIW, I've been toying with how to reveal institutional reach beyond downloads and publications, perhaps more closely aligned with internal funding and fiscal calendars, by using the collecting and determination activities of past and present staff (via ORCID) as revealed by digitized specimens curated at other institutions. It's not (yet) a very pretty view, but here's the idea: https://bloodhound-tracker.net/organization/Q131626/metrics. And here is another, where staff have been more engaged and informed in the process and the numbers are a tad more convincing: https://bloodhound-tracker.net/organization/Q1032232/metrics (disclosure: my past employer). I am presently toiling with how to use Wikidata to resolve institutionCode/collectionCode back into other institutional identifiers (Ringgold, GRID, or Wikidata's own, because that's what I get from ORCID), so that these numbers are presented at the institutional level rather than buried in incomprehensible lists of collection codes. That's where GRSciColl may be important.
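The roll-up described here can be sketched as a lookup from (institutionCode, collectionCode) pairs to an institutional identifier, with an institution-only fallback and a separate bucket for codes that could not be resolved, since these codes are not globally unique. A minimal sketch; the codes and the `Q_INST_*` identifiers are hypothetical placeholders, not real Wikidata items:

```python
from collections import Counter


def roll_up_by_institution(code_counts: dict, code_to_institution: dict):
    """Sum counts keyed by (institutionCode, collectionCode) per institution.

    code_to_institution maps (institutionCode, collectionCode) or
    (institutionCode, None) to an identifier. Exact pairs win; the
    institution-only entry is the fallback. Unmatched pairs are returned
    separately so they stay visible instead of being silently dropped.
    """
    totals = Counter()
    unresolved = Counter()
    for (inst_code, coll_code), n in code_counts.items():
        ident = code_to_institution.get((inst_code, coll_code))
        if ident is None:
            ident = code_to_institution.get((inst_code, None))
        if ident is None:
            unresolved[(inst_code, coll_code)] += n
        else:
            totals[ident] += n
    return dict(totals), dict(unresolved)


if __name__ == "__main__":
    mapping = {("ABC", None): "Q_INST_1", ("XYZ", "HERB"): "Q_INST_2"}
    counts = {("ABC", "ENT"): 4, ("ABC", "BOT"): 6, ("XYZ", "HERB"): 3, ("XYZ", "ZOO"): 2}
    print(roll_up_by_institution(counts, mapping))
```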