gigascience / gigadb-website

Source code for running GigaDB
http://gigadb.org
GNU General Public License v3.0

Data citation metrics #137

Open only1chunts opened 6 years ago

only1chunts commented 6 years ago

User Story

Multiple user stories for this depending on your point of view:

As an author/submitter, I want to be able to see the usage stats of my datasets (how many views, downloads, tweets, citations, etc.) so that I can show my employer/funder that the work was worthwhile. See #161.

As a website user, I want to be able to see the usage stats of a dataset (how many views, downloads, tweets, citations, etc.) so that I can see how impactful it has been. See #832.

As an admin, I want to be able to see the usage stats of a set of datasets so that I can see average trends over time and between various groups of datasets, e.g. all datasets, a dataset range, or a (CSV) list of individual datasets related in some way.

Additional information

We need to think about improving the metrics for usage of our datasets. The COUNTER Code of Practice for Research Data addresses how this should be done: https://docs.google.com/document/d/1n1LsS3suFNnnYfqltf3Qjaup0taKu-q54Kico_IHXdY/edit# We should be a part of that project as early adopters.

This is also linked to work on #161 and #14.

only1chunts commented 6 years ago

See also the comments on issue #161.
Metrics to consider: the number of clicks (Google Analytics) on a dataset page, the number of direct downloads from GigaDB, citations, and any social media mentions. We should also look into ways of tracking downloads from the FTP server(s), as well as the number of times a dataset is discovered by searches.

only1chunts commented 6 years ago

Perhaps look at DataCite's analytics to see how they compare to Google etc.

ScottBGI commented 6 years ago

See the DataCite stats page for their analytics: https://stats.datacite.org/

only1chunts commented 6 years ago

Make Data Count have released version 1:

https://github.com/CDLUC3/Make-Data-Count/blob/master/getting-started.md

there is also a blog about it: https://makedatacount.org/2018/06/05/its-time-to-make-your-data-count/

ScottBGI commented 5 years ago

Make Data Count is now utilising Event Data, so we can pull citation metrics using the API, e.g. see https://api.datacite.org/events?doi=10.5524/100008

As you can see, very few datasets are getting correctly cited, but you might want to display both true citations and the Google Scholar/Europe PMC full-text workarounds, especially for the older citations from before the Data Citation Principles even existed. Event Data and the API let you retrospectively add the missing data citations in the references, so I don't know if updating retrospective metadata is a useful thing to try to automate or curate (student project?).
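A minimal sketch of how the events returned for one DOI could be tallied, for illustration only. It assumes the DataCite response is a JSON:API-style document with a top-level `data` list whose items carry an `attributes` object with a `relation-type-id` key; that layout and the function names below are assumptions, so check them against a live response before relying on them.

```python
from collections import Counter
from urllib.parse import urlencode

# Endpoint taken from the example URL in this comment.
DATACITE_EVENTS_URL = "https://api.datacite.org/events"


def datacite_events_url(doi):
    """Build the events query URL for a single DOI (hypothetical helper)."""
    # safe="/" keeps the DOI readable, matching the example URL above.
    return f"{DATACITE_EVENTS_URL}?{urlencode({'doi': doi}, safe='/')}"


def tally_by_relation(payload):
    """Count events per relation type.

    Assumes payload["data"][i]["attributes"]["relation-type-id"] exists;
    records without it are counted under "unknown".
    """
    counts = Counter()
    for record in payload.get("data", []):
        attrs = record.get("attributes", {})
        counts[attrs.get("relation-type-id", "unknown")] += 1
    return dict(counts)
```

Separating the tally from the HTTP call keeps the counting logic testable against a saved response, which would help if the retrospective curation idea is ever tried.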

ScottBGI commented 5 years ago

A new open source approach to tackle this is also CiteAs: http://citeas.org/

only1chunts commented 5 years ago

Just had a look at Event Data and it looks like we could make use of that. It's a simple API that we can call for "events" involving GigaDB DOIs. It can be called for ALL GigaDB DOIs by the prefix 10.5524:

curl "https://api.eventdata.crossref.org/v1/events?mailto=YOUR_EMAIL_HERE&rows=10000&obj-id.prefix=10.5524" > gigadb.json

The issue with that might be that it will include reviews, which we currently mint DOIs for with the format 10.5524/review.nnnnnn. An alternative could be to call each dataset individually:

curl "https://api.eventdata.crossref.org/v1/events?mailto=YOUR_EMAIL_HERE&rows=10000&obj-id=10.5524/100001" > gigadb-100001.json

This might get too heavy unless it's done on demand, when someone tries to look at the event data for a particular DOI from the website. To do that we would need an HTML page that can call and parse the JSON from Event Data and display it in a useful way.
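As a rough sketch of the prefix approach, the review DOIs could be filtered out after the fetch rather than avoided by calling each dataset separately. The code below assumes the Event Data response wraps its results in `message.events` and that each event has `obj_id` and `source_id` fields; those field names, and all function names here, are assumptions to verify against a real response.

```python
import json
import re
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

EVENT_DATA_URL = "https://api.eventdata.crossref.org/v1/events"
# Dataset DOIs look like 10.5524/100001; review DOIs (10.5524/review.nnnnnn)
# do not match because "review." is not numeric.
DATASET_DOI = re.compile(r"10\.5524/\d+$")


def events_url(mailto, prefix=None, doi=None, rows=10000):
    """Build a query URL for either a whole DOI prefix or a single DOI."""
    params = {"mailto": mailto, "rows": rows}
    if doi is not None:
        params["obj-id"] = doi
    elif prefix is not None:
        params["obj-id.prefix"] = prefix
    else:
        raise ValueError("give either prefix or doi")
    return f"{EVENT_DATA_URL}?{urlencode(params, safe='/@')}"


def fetch_json(url):
    """Fetch and decode a JSON document (network call, not cached)."""
    with urlopen(url) as response:
        return json.load(response)


def dataset_events(payload):
    """Keep only events whose object is a dataset DOI, dropping reviews."""
    events = payload.get("message", {}).get("events", [])
    return [e for e in events if DATASET_DOI.search(e.get("obj_id", ""))]


def count_by_source(events):
    """Count events per source (e.g. twitter, wikipedia)."""
    return dict(Counter(e.get("source_id", "unknown") for e in events))
```

For on-demand use, a dataset page could call `count_by_source(dataset_events(fetch_json(events_url("YOUR_EMAIL_HERE", doi="10.5524/100001"))))` and render the resulting counts.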

ScottBGI commented 5 years ago

You should check out what the Impact Story people are building on this data. On top of CiteAs that I mentioned (https://github.com/Impactstory/citeas-api) see also PaperBuzz (https://github.com/Impactstory/paperbuzz-api).

only1chunts commented 5 years ago

Thanks Scott, it might be worth keeping an eye on PaperBuzz, but at present they don't seem to do anything?! It looks like it's meant to list the "hits" but actually gives a blank page, e.g. https://paperbuzz.org/details/10.5524/100001 CiteAs is just a tool to provide the citation of something in a variety of formats, so not what we need for this ticket.

only1chunts commented 4 years ago

There is now a documented guide to becoming COUNTER compliant: https://www.projectcounter.org/code-practice-research-data/

ScottBGI commented 4 years ago

GBIF have now built a citation widget, which we could use if we also have data in GBIF (one for the future if we manage to integrate). It could also potentially be adapted into a tool for our data (I assume it's DataCite Event Data): https://www.gbif.org/article/1E6v02SFQyhupvB7JqDXPN/citation-widget

only1chunts commented 3 years ago

As an author/submitter, I want to be able to see the usage stats of my dataset (how many views, downloads, tweets, citations, etc.) so that I can show my employer/funder that the work was worthwhile

As a GigaDB admin, I want to be able to demonstrate the advantages of open data sharing so that we can encourage more people to do it

As a funder I want to be able to see data that I have funded that has been published and shared so that I can verify that the funding was put to good use

only1chunts commented 3 years ago

Cobalt Metrics might be another option to look at for how to get hold of metrics for our datasets: https://cobaltmetrics.com/digests It's a paid-for service, but if you believe their website they are better and cheaper than Altmetrics.

only1chunts commented 3 years ago

Also worth throwing into the mix, PlumX Metrics: https://plumanalytics.com/learn/about-metrics/ NB - This is part of the Elsevier group of companies.

only1chunts commented 1 year ago

DataCite are looking for beta testers of their DataCite usage tracker tool. It has two parts:

  1. view counter - insert a bit of JavaScript into each landing page and it reports to DataCite, who track the number of views of a dataset.
  2. download counter - Similarly, this is a piece of JavaScript that gets embedded into the download confirmation page to track the number of downloads. This might be a problem for us, as we do not have a single "download this dataset" button, so we need to work out how to report individual file downloads instead of whole dataset downloads?! FYI - the COUNTER Code of Practice does include the requirement for a "download confirmation" page upon successful download of a dataset.
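One possible way to think about the file-versus-dataset problem is to log each file download with its dataset DOI and roll the log up afterwards. The sketch below is only an illustration of that idea: the `(session_id, dataset_doi, file_path)` tuple layout and the function name are hypothetical, and the counts it produces are not the DataCite tracker's payload or a COUNTER-compliant calculation.

```python
from collections import defaultdict


def rollup_downloads(file_events):
    """Aggregate per-file download events to per-dataset counts.

    file_events: iterable of (session_id, dataset_doi, file_path) tuples
    (a hypothetical log format). Returns, per dataset DOI, the total number
    of file downloads and the number of distinct sessions that downloaded
    at least one file.
    """
    totals = defaultdict(int)
    sessions = defaultdict(set)
    for session_id, doi, _file_path in file_events:
        totals[doi] += 1
        sessions[doi].add(session_id)
    return {
        doi: {
            "file_downloads": totals[doi],
            "unique_sessions": len(sessions[doi]),
        }
        for doi in totals
    }
```

Counting distinct sessions per dataset gives a rough stand-in for "whole dataset downloads" when a user fetches several files from the same dataset in one visit.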