IQSS / dataverse

Open source research data repository software
http://dataverse.org

Tweaks and bug fixes for Make Data Count #6979

Status: Closed (landreev closed 3 weeks ago)

landreev commented 4 years ago

(Working title; could be changed later.) We have started collecting MDC logs in production, and have since discovered that some entries in them are breaking Counter Processor.

I may add more as things are discovered.

landreev commented 4 years ago

Also, I believe that all this MDC-related development effort should be made on a strict need-to-do basis. Specifically with the problems above: yes, we can address them for these datasets with something uncommon/exotic in the metadata, with some simple extra logic. But I would like to know, do we even need to bother logging accurate information in these "author" and "title" fields? Are they being used for any practical purpose whatsoever? The values in these fields are definitely discarded when the resulting SUSHI report is imported back into Dataverse (and what would be the point anyway? We do know the titles and authors of our datasets!). Are these fields used when the reports are sent to DataCite? Again, I don't see the point, seeing how DataCite already has this information for the datasets registered with them. So I'm wondering if we might as well just populate these fields with "-"s or placeholder values?
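To make the two options concrete, here is a minimal sketch of both: cleaning a metadata value before it is logged, and replacing selected fields with "-" outright. It assumes the MDC log is a tab-delimited text file and that the breakage comes from tabs or line breaks embedded in metadata values; neither is confirmed in this thread. The function names, field names, and the example identifier are hypothetical and are not Dataverse or Counter Processor code.

```python
PLACEHOLDER = "-"


def clean_field(value):
    """Return a value that is safe to put in one tab-delimited column.

    Assumption: embedded tabs/newlines are what break the log parser.
    Any run of whitespace (including tabs and line breaks) is collapsed
    into a single space; empty or missing values become the placeholder.
    """
    if value is None:
        return PLACEHOLDER
    cleaned = " ".join(str(value).split())
    return cleaned if cleaned else PLACEHOLDER


def format_log_line(fields, use_placeholders_for=()):
    """Join fields into one tab-delimited line, in dict insertion order.

    Fields named in use_placeholders_for are written as "-" regardless of
    their value (the "just populate with placeholders" option).
    """
    columns = []
    for name, value in fields.items():
        if name in use_placeholders_for:
            columns.append(PLACEHOLDER)
        else:
            columns.append(clean_field(value))
    return "\t".join(columns)


# Hypothetical entry with "exotic" metadata: a tab and a newline in the title.
entry = {
    "identifier": "doi:10.5072/FK2/EXAMPLE",
    "title": "A title\twith a tab\nand a line break",
    "authors": None,
}

sanitized_line = format_log_line(entry)
placeholder_line = format_log_line(entry, use_placeholders_for=("title", "authors"))
```

Either variant keeps the number of columns per line constant, which is the property a tab-delimited parser depends on.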

landreev commented 4 years ago

Edit: the issue below is already addressed in #6629; thanks @pdurbin for linking it.

Another small-ish thing, about installing Counter Processor. Our instructions (and even the instructions on the CP GitHub site) are still telling people to download the geolocation bundle from MaxMind like this: wget https://geolite.maxmind.com/download/geoip/database/GeoLite2-Country.tar.gz - but it's no longer available. It appears that MaxMind has changed the licensing and/or access terms: https://forum.matomo.org/t/maxmind-is-changing-access-to-free-geolite2-databases/35439 I didn't fully get how one is supposed to install it now: do we have to pay for it? Or can we open an account and sign some educational use statement? I was able to install it using the bundle left on our test servers from some earlier experiments. But this should be addressed going forward, for other installations, and for our own use, if we want to do it by the book or need to upgrade.
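For reference, as far as I know GeoLite2 is still free of charge, but since the end of 2019 MaxMind requires a (free) account and a license key to download it. The sketch below only builds a download URL; it does not fetch anything. The URL pattern is the one MaxMind documented around the time of the change and may have been replaced since, so treat it as an assumption and check MaxMind's current documentation. The function name is hypothetical and YOUR_LICENSE_KEY is a stand-in.

```python
from urllib.parse import urlencode

# Assumption: the permalink endpoint MaxMind documented after the 2019 change.
GEOIP_DOWNLOAD_ENDPOINT = "https://download.maxmind.com/app/geoip_download"


def geolite2_download_url(license_key, edition_id="GeoLite2-Country", suffix="tar.gz"):
    """Build the download URL for a GeoLite2 database (no network access)."""
    if not license_key:
        raise ValueError("a MaxMind license key is required")
    query = urlencode(
        {"edition_id": edition_id, "license_key": license_key, "suffix": suffix}
    )
    return GEOIP_DOWNLOAD_ENDPOINT + "?" + query


# The resulting URL could then be passed to wget or curl in place of the old one.
url = geolite2_download_url("YOUR_LICENSE_KEY")
```

Keeping the key out of the install instructions (for example, reading it from an environment variable) would avoid each installation publishing its own key by accident.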

pdurbin commented 4 years ago

> It appears that Maxmind has changed the licensing and/or access terms

I haven't investigated this but I thought I'd at least point to issues that @qqmyers has opened:

qqmyers commented 4 years ago

W.r.t. - nice message for no stats: there's the :DisplayMDCMetrics setting to use: http://guides.dataverse.org/en/latest/installation/config.html#id218

landreev commented 4 years ago

@qqmyers

> W.r.t. - nice message for no stats: there's the :DisplayMDCMetrics setting to use: http://guides.dataverse.org/en/latest/installation/config.html#id218

I thought the setting was for the display on pages only, no? It must be: it is currently set to false in our production, but I can still get to the APIs and get some stats.

So yes, I was talking about the APIs specifically. And that was based on inquiries from real users. Apparently we had users who read the section of the guide about MDC support, tried to use the API to get the metrics for their datasets, and then contacted support asking why they were getting zeroes.

qqmyers commented 4 years ago

Got it - you're right. I guess if that's addressed (APIs), it could be one UseDatasetMetrics setting, or both should key off of the metrics not being empty? (Thinking there's no need to handle the UI and API differently from each other.)

landreev commented 4 years ago

@qqmyers Agree that there's no need to handle the UI and API differently... in theory? Because ours is just such a case right now: we are not ready to start showing the numbers on pages, but we need to have the APIs open and reporting something, for a collaborative dev. project. (But yes, this is asking for more user confusion, since the numbers shown are half-baked and incomplete at the moment.) Well, we'll try to address this by fully populating the metrics asap, and then it would not be a problem. So going forward, we should be able to adopt this "one setting for both things" idea.

I really like the other idea, to have both the API and UI perform the same "empty table" check.
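A minimal sketch of that shared check, assuming the metrics live in a single table and that "no stats yet" simply means the table has no rows. It uses an in-memory SQLite database so it runs on its own; the table layout, column names, function names, and response shape are all hypothetical and are not the actual Dataverse schema or API.

```python
import sqlite3


def has_any_metrics(conn):
    """True if the (hypothetical) metrics table contains at least one row."""
    row = conn.execute("SELECT EXISTS (SELECT 1 FROM datasetmetrics)").fetchone()
    return bool(row[0])


def ui_should_show_metrics(conn):
    """The dataset page asks the same question as the API."""
    return has_any_metrics(conn)


def api_response(conn, dataset_id):
    """Return a 'not available' message instead of misleading zeroes."""
    if not has_any_metrics(conn):
        return {"status": "OK", "message": "MDC metrics are not available yet."}
    row = conn.execute(
        "SELECT COALESCE(SUM(views), 0) FROM datasetmetrics WHERE dataset_id = ?",
        (dataset_id,),
    ).fetchone()
    return {"status": "OK", "data": {"viewsTotal": row[0]}}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datasetmetrics (dataset_id INTEGER, views INTEGER)")

before = api_response(conn, 1)          # table is empty: message, no numbers
conn.execute("INSERT INTO datasetmetrics VALUES (1, 5)")
after = api_response(conn, 1)           # table is populated: real total
other = api_response(conn, 2)           # populated table, dataset with no rows
```

With this shape, a zero only appears once the table has been populated, so a zero means "no views recorded" rather than "metrics not processed yet".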

cmbz commented 3 weeks ago

To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'.

If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment.