Closed: pdurbin closed this 1 year ago
Linking the issue IQSS/dataverse#9025 in the main project - a lot of the more recent discussion concerning this issue was happening there. As we revisit it during this spike, let's make sure to take any potentially useful information there into consideration.
I'm listing some of the collections in the Harvard Dataverse whose admins either rely on download counts now or have told us that they are very interested in being able to rely on them, such as for measuring the impact of their data and data sharing efforts. We (or I) can talk to the admins of these collections so that our implementation of Make Data Count in Harvard Dataverse is informed by a better understanding of the needs of its users:
It does sound to me like it has been established that there are local users/collections who value their existing download counts. While it may have some value to further investigate their needs, I'm not sure it's really necessary for the purposes of deciding how to proceed with the dev. plan. (We already know we can't afford to drop the existing counts.) It does sound like we have a degree of consensus that we want to implement what we've been referring to as the "QDR solution": an option to display both the old-style counts, collected prior to the start of the MDC records, and the MDC metrics. This will, of course, still be optional; an installation will still be able to stick with the "classic", non-MDC counts, or to use the MDC counts exclusively. This is also explained in some detail in the linked issue https://github.com/IQSS/dataverse/issues/9025. So let's "finalize" the plan by prioritizing either merging the existing IQSS/dataverse#6543 or, if that one is too old, pulling in the QDR changes via a new PR.
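To make the three display options concrete, here is a minimal sketch of the selection logic. It is not Dataverse code: the function name, the mode strings, and the `mdc_start_date` parameter are all hypothetical, and the inputs are plain in-memory values rather than database queries.

```python
from datetime import date


def displayed_counts(classic_download_dates, mdc_monthly_downloads, mode,
                     mdc_start_date=None):
    """Return the download counts an installation would display.

    Hypothetical inputs (assumptions, not the real Dataverse schema):
      classic_download_dates: list of date objects, one per "classic" download
      mdc_monthly_downloads:  dict mapping "YYYY-MM" to an MDC download total
      mode: "classic", "mdc", or "both"
      mdc_start_date: first day covered by MDC records (needed for "both")
    """
    if mode == "classic":
        # Classic counts only: every recorded download is shown.
        return {"classic": len(classic_download_dates)}
    if mode == "mdc":
        # MDC counts only: classic downloads are ignored.
        return {"mdc": sum(mdc_monthly_downloads.values())}
    if mode == "both":
        if mdc_start_date is None:
            raise ValueError("mode 'both' needs mdc_start_date")
        # Only classic downloads from before MDC records began are kept,
        # so the two numbers cover non-overlapping periods.
        classic_before = sum(1 for d in classic_download_dates
                             if d < mdc_start_date)
        return {"classic": classic_before,
                "mdc": sum(mdc_monthly_downloads.values())}
    raise ValueError(f"unknown mode: {mode}")


classic = [date(2019, 1, 5), date(2019, 6, 1), date(2020, 3, 1)]
mdc = {"2020-01": 4, "2020-02": 6}
both = displayed_counts(classic, mdc, "both", mdc_start_date=date(2020, 1, 1))
```

In "both" mode the classic download dated after the assumed MDC start is left out of the classic number, which is the point of cutting the old-style counts off at the start of the MDC records.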
Discussed at standup. No objection to showing both counts.
The next step is probably to see if we can update and merge Jim's pull request:
We gave the PR a 10
For the sake of posterity, I should say that although I wrote that I'd like to learn from admins of collections in Harvard Dataverse who generally rely on metrics, I wasn't able to talk with them about this. One of the things I wanted to learn was if it was necessary to show both counts.
Unfortunately, @landreev let me know that my comment was taken to mean that these users would like both counts to show, which may or may not be true, and that it was read as supporting the idea of a solution where both counts are shown.
Established Dataverse installations that have been operating for years might be reluctant to turn on Make Data Count (MDC) because the download counts will be reset to zero unless something is done to copy the "classic" download counts into the new "datasetmetrics" database table that powers MDC download metrics. For example, Harvard Dataverse has over 10 million "classic" downloads:
Many Dataverse installations probably don't have all the Apache (or Glassfish or whatever) access logs from years ago lying around, but the database table "filedownload" could be used as a source for timestamps of downloads from the "classic" system. After standup on 2020-02-05, @djbrooke and @kcondon talked about this, and I made the following diagram (best to open it in a new window since the text is so small):
Source for the image above: make-data-count.uml.txt
This is what I added to the diagram, which is based on http://guides.dataverse.org/en/4.19/admin/make-data-count.html#architecture
This is a bit hand-wavy because we'd still use SUSHI as indicated by the "Log Processing" part of the diagram. Roughly, the idea is this:
See also pull request IQSS/dataverse#6543
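As an illustration of using "filedownload" timestamps as a source for per-month metrics, the following is a minimal sketch under stated assumptions. It works on in-memory `(dataset_id, timestamp)` pairs; that row shape and the output field names (`dataset_id`, `monthyear`, `downloads_total`) are hypothetical stand-ins and are not confirmed to match the real "filedownload" or "datasetmetrics" columns. It also does none of the filtering that MDC log processing would apply, so it only shows the grouping step.

```python
from collections import Counter
from datetime import datetime


def monthly_totals(download_rows):
    """Group classic downloads into per-dataset, per-month totals.

    download_rows: iterable of (dataset_id, datetime) pairs, a hypothetical
    stand-in for rows read from the "filedownload" table.
    """
    counts = Counter()
    for dataset_id, timestamp in download_rows:
        # Bucket each download by dataset and calendar month ("YYYY-MM").
        counts[(dataset_id, timestamp.strftime("%Y-%m"))] += 1
    # Emit one record per (dataset, month), sorted for stable output.
    return [
        {"dataset_id": dataset_id, "monthyear": month, "downloads_total": n}
        for (dataset_id, month), n in sorted(counts.items())
    ]


rows = [
    (1, datetime(2019, 1, 5, 10, 30)),
    (1, datetime(2019, 1, 20, 8, 0)),
    (1, datetime(2019, 2, 1, 12, 15)),
    (2, datetime(2019, 1, 7, 9, 45)),
]
totals = monthly_totals(rows)
```

Records like these could then be written to whatever table or report format the installation uses for MDC metrics; that loading step is left out here because it depends on details not covered in this thread.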