Closed mreekie closed 10 months ago
Aim 5: Standardize usage metrics with the repository and across other repositories
The Make Data Count project has given repositories an opportunity to standardize usage metrics (e.g., number of views, downloads, and citations). There are still many inconsistencies in the way generalist repositories (and domain-specific repositories) count these metrics, making it difficult to assess and compare the popularity of various datasets and give proper credit to data authors. Based on the Make Data Count initiative, we propose to provide full metrics on dataset usage for the Harvard Dataverse repositories and coordinate the implementation of the metrics with other repositories (e.g., Dryad, with which we have already been in conversation on this area) so the values and assumptions are comparable.
Updates from Phil on slack
pdurbin
[41 minutes ago](https://iqss.slack.com/archives/C03R1E7T4KA/p1677873545656879)
For the [Feb GREI monthly technical status report](https://docs.google.com/document/d/1vHHVz2Vlo2-2vGOHqRdboBhF6odeVvMI/edit?usp=sharing&ouid=117275479921759507378&rtpof=true&sd=true) I’m writing “We watched the video from Matt Buys about the new MDC Usage Tracker Service and summarized for the team” and this thread is that summary.
pdurbin
[41 minutes ago](https://iqss.slack.com/archives/C03R1E7T4KA/p1677873575913639?thread_ts=1677873545.656879&cid=C03R1E7T4KA)
This is the repo on GitHub: https://github.com/datacite/datacite-tracker
pdurbin
[40 minutes ago](https://iqss.slack.com/archives/C03R1E7T4KA/p1677873617811179?thread_ts=1677873545.656879&cid=C03R1E7T4KA)
Here’s an overview. Note that as with our log processing implementation, SUSHI reports are still sent over the wire.
Screen Shot 2023-03-03 at 2.44.35 PM.png
pdurbin
[38 minutes ago](https://iqss.slack.com/archives/C03R1E7T4KA/p1677873772131449?thread_ts=1677873545.656879&cid=C03R1E7T4KA)
The DataCite Usage Tracker presentation section of the [Feb 1 agenda](https://docs.google.com/document/d/1nBZWAZiJmo6d-A1Ol6u1sbjjicFszvvRb9amw8BP82M/edit?usp=sharing) has notes from the presentation. The relevant part of [the recording](https://drive.google.com/file/d/1kAdSHFaLq_Z4fWwmYlbfEsD-Il06X3j4/view?usp=share_link) is about 10 minutes long, from ~8:00 to ~19:00. (It would be nice to make these 10 minutes public.)
pdurbin
[36 minutes ago](https://iqss.slack.com/archives/C03R1E7T4KA/p1677873848580659?thread_ts=1677873545.656879&cid=C03R1E7T4KA)
Here’s the official summary from the notes:
DataCite Usage Tracker presentation
Tracker overview: A hosted service with a JavaScript tracker that you embed in your landing pages. It doesn’t store PII; it takes the usage info, generates SUSHI reports, and aggregates usage data via DataCite API & interfaces.
What Do You Need to Do?
You contact Kelly at DataCite, get a tracking ID, and add the snippet into your landing page.
DataCite would like to work with some of you during this implementation.
If you have Dublin Core or [Schema.org](http://schema.org/) metadata in your landing page, there is no need to add the DOI name; the DataCite snippet will grab the DOI automatically.
Accessing usage stats will be possible via API & Interfaces
DataCite doesn’t expect all repositories to implement right now in Feb.
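To make the "grab the DOI automatically" point concrete, here is a minimal sketch of how a DOI could be pulled out of landing-page metadata. It is not the DataCite tracker's code (the real tracker is a JavaScript snippet whose detection logic is not described in these notes). The meta tag names (`DC.identifier`, `DCTERMS.identifier`), the JSON-LD keys checked (`@id`, `identifier`), the DOI regular expression, and the function name `find_doi` are all assumptions made for illustration.

```python
import json
import re
from html.parser import HTMLParser
from typing import Optional

# Loose DOI pattern for illustration only (assumption, not an official grammar).
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


class _MetadataParser(HTMLParser):
    """Collects strings that might contain a DOI from Dublin Core meta tags
    and from schema.org JSON-LD script blocks."""

    def __init__(self) -> None:
        super().__init__()
        self.candidates = []
        self._in_jsonld = False
        self._jsonld_chunks = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if tag == "meta" and name in ("dc.identifier", "dcterms.identifier"):
            # Hypothetical Dublin Core convention: <meta name="DC.identifier" content="...">
            self.candidates.append(attr.get("content") or "")
        elif tag == "script" and (attr.get("type") or "").lower() == "application/ld+json":
            self._in_jsonld = True
            self._jsonld_chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._jsonld_chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                doc = json.loads("".join(self._jsonld_chunks))
            except json.JSONDecodeError:
                return  # Ignore malformed JSON-LD rather than fail.
            if isinstance(doc, dict):
                for key in ("@id", "identifier"):
                    value = doc.get(key)
                    if isinstance(value, str):
                        self.candidates.append(value)


def find_doi(html: str) -> Optional[str]:
    """Return the first DOI-looking string found in the page metadata, or None."""
    parser = _MetadataParser()
    parser.feed(html)
    parser.close()
    for candidate in parser.candidates:
        match = DOI_PATTERN.search(candidate)
        if match:
            return match.group(0)
    return None


# Usage with a made-up DOI (10.1234/EXAMPLE is a placeholder, not a real dataset).
page = '<html><head><meta name="DC.identifier" content="https://doi.org/10.1234/EXAMPLE"></head></html>'
print(find_doi(page))
```

A landing page that exposes neither kind of metadata would return `None` here, which corresponds to the case where the DOI would have to be supplied to the snippet explicitly.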
pdurbin
[35 minutes ago](https://iqss.slack.com/archives/C03R1E7T4KA/p1677873906215759?thread_ts=1677873545.656879&cid=C03R1E7T4KA)
Also relevant to us (from Q&A):
Stan: If we have been collecting historical data is there an ability to load this data into this?
Answer: the Usage report API has this ability
pdurbin
[34 minutes ago](https://iqss.slack.com/archives/C03R1E7T4KA/p1677873969808389?thread_ts=1677873545.656879&cid=C03R1E7T4KA)
I got a pretty strong impression that all GREI repositories (any DataCite customer) are expected to move from the old log processing approach to this new browser-based JavaScript approach.
pdurbin
[34 minutes ago](https://iqss.slack.com/archives/C03R1E7T4KA/p1677873998704639?thread_ts=1677873545.656879&cid=C03R1E7T4KA)
They named Figshare, Zenodo, and Dryad as using the old approach but of course we do too (QDR and others use it).
pdurbin
[33 minutes ago](https://iqss.slack.com/archives/C03R1E7T4KA/p1677874066773959?thread_ts=1677873545.656879&cid=C03R1E7T4KA)
As a GREI repository, we are being asked to reach out to DataCite (Matt Buys or Kelly Stathis) with a timeline of when we expect to implement the new tracker.
pdurbin
[32 minutes ago](https://iqss.slack.com/archives/C03R1E7T4KA/p1677874120826199?thread_ts=1677873545.656879&cid=C03R1E7T4KA)
The open metrics subcommittee hasn’t met in months but Matt is scheduling a meeting (doodle poll closes next week). I expect we’ll be asked at the meeting for a timeline.
This came up at the daily today. Jim/Phil discussed next steps. We don't have a formal issue on it yet, but they want to get together with datacite folks and get the conversation going with them. The people involved need to include Phil, Leonid and Jim on the dataverse side. Leonid is out with covid right now so Phil is going to just ping datacite and let them know we are interested in talking.
Yeah, I just sent @KellyStathis a DM on the GREI Slack. Let's wait for Leonid to get back and figure out internally a good date for us (I'm polling internally to see who wants to be there). Then we'll see what day works for Kelly and/or Matt.
Just a quick update that a few of us met with Kelly and afterward @qqmyers created a great doc to explain how the new tracker probably won't "just work" with Dataverse without some future development. I got approval from Matt and Jim to link to the doc during a GREI Open Metrics subcommittee meeting so I imagine no one will mind if I link it here: https://docs.google.com/document/d/1cSgRVdOKkY6ouBZyqbDonZtFPNHjvBpu6DOyXHtnL8Y/edit?usp=sharing
March update:
(2.5.1) On March 22, we met with Kelly and Sarla from DataCite about the new Usage Tracker (notes) and are engaging in a more technical followup with their development team in a Google doc. We are concerned that the tracker won't "just work" without further development. We also started a thread on the tracker to get input from the community. On March 31 we participated in the GREI Metrics Sub-Committee Meeting (notes), where we discussed year 2 plans.
Update
Action
2024/01/03: Closing, work will be tracked here: https://github.com/IQSS/dataverse-pm/issues/118
NIH wants each repository to share dataset usage data for datasets that were funded by the NIH. The NIH wants to see how the datasets that they funded are being used and they want the reporting methodology and statistics to be done in a standardized way across repositories.
The deliverable for this Aim is to begin providing usage statistics for datasets funded by the NIH. The prerequisites are that we have agreed with the other repositories on what we are collecting in a standard fashion and that we have implemented the code to make the data collection happen.