landreev closed this 3 weeks ago
Also, I believe that all this MDC-related development effort should be made on a strict need-to-do basis. Specifically, with the problems above - yes, we can address them for these datasets with something uncommon/exotic in the metadata, with some simple extra logic. But I would like to know: do we even need to bother logging accurate information in these "author" and "title" fields? Are they being used for any practical purpose whatsoever? The values in these fields are definitely discarded when the resulting SUSHI report is imported back into Dataverse (and what would be the point anyway - we do know the titles and authors of our datasets!). Are these fields used when the reports are sent to DataCite? Again, I don't see the point, seeing how DataCite already has this information for the datasets registered with them. So I'm wondering if we might as well just populate these fields with "-"s or placeholder values?
Edit: the issue below is already addressed in #6629; thanks @pdurbin for linking it.
Another small-ish thing, about installing Counter Processor. Our instructions (and even the instructions on the CP GitHub site) still tell people to download the geolocation bundle from MaxMind like this: wget https://geolite.maxmind.com/download/geoip/database/GeoLite2-Country.tar.gz
- but it's no longer available.
It appears that Maxmind has changed the licensing and/or access terms: https://forum.matomo.org/t/maxmind-is-changing-access-to-free-geolite2-databases/35439
I didn't fully get how one is supposed to install it now - do we have to pay for it? Or can we open an account and sign some educational use statement? I was able to install it using the bundle left on our test servers from some earlier experiments. But this should be addressed going forward, for other installations, and for our own use if we want to do it by the book or need to upgrade.
I haven't investigated this but I thought I'd at least point to issues that @qqmyers has opened:
W.r.t. - nice message for no stats: there's the :DisplayMDCMetrics setting to use: http://guides.dataverse.org/en/latest/installation/config.html#id218
@qqmyers
I thought the setting was for the display on pages only, no? It must be - it is currently set to false in our prod., but I can get to the APIs and get some stats.
So yes, I was talking about the APIs specifically. And that was based on inquiries from real users. Apparently we had users who read the section of the guide about MDC support and tried to use the API to get the metrics for their datasets. And then contacted support asking why they were getting zeroes.
Got it - you're right. I guess if that's addressed (APIs), it could be one UseDatasetMetrics setting, or both should key off of the metrics not being empty? (Thinking there's no need to handle the UI and API differently from each other.)
@qqmyers Agree that there's no need to handle the UI and API differently... in theory? - Because ours is just such a case right now: we are not ready to start showing the numbers on pages. But need to have the APIs open and report something, to a collab. dev. project. (But, yes - this is asking for more user confusion; since the numbers shown are half-baked and incomplete at the moment). Well, we'll try to address this by trying to fully populate the metrics asap. And then it would not be a problem. So going forward, we should be able to adopt this "one setting for both things" idea.
I really like the other idea, to have both the API and UI perform the same "empty table" check.
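To make the "same empty table check" idea concrete, here is a minimal sketch. It is not Dataverse code (Dataverse is Java); every name and the response shape below are hypothetical, and the row count is assumed to come from whatever table holds the imported MDC metrics. The only point is that the UI and the API both call one shared check.

```python
# Hypothetical sketch: one shared "have any MDC metrics been imported?" check
# used by both the API and the UI. None of these names exist in Dataverse.

NOT_IMPORTED_MESSAGE = (
    "This Dataverse installation has not yet started importing MDC metrics"
)


def mdc_metrics_available(metrics_row_count: int) -> bool:
    """Return True once at least one row of MDC metrics has been imported."""
    return metrics_row_count > 0


def api_response(metrics_row_count: int, value: int) -> dict:
    """Build an illustrative API payload: a message instead of a misleading zero."""
    if not mdc_metrics_available(metrics_row_count):
        return {"status": "OK", "message": NOT_IMPORTED_MESSAGE}
    return {"status": "OK", "data": {"count": value}}


def ui_should_display(metrics_row_count: int) -> bool:
    """The UI keys off the same check, so the two can never disagree."""
    return mdc_metrics_available(metrics_row_count)
```

With something like this, a separate display setting would only be needed for cases like ours, where the API has to stay open while the pages stay hidden.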
To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'.
If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment.
(working title; could be changed later). We have started collecting MDC logs in production, and have since discovered that some entries in them are breaking Counter Processor.
Some counter_*.log entries have the "authors" field empty. Example:
2020-06-05T00:48:17-0400 10.137.170.203 2cee68ddfdc78885d0bb9069876d - :guest /dataset.xhtml?persistentId=doi:10.7910/DVN/05BMJY doi:10.7910/DVN/05BMJY - - Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36 Mexico, Evaluation of PROGRESA grid tbd 2012-07-09T00:00:00Z 1 - /dataset.xhtml?persistentId=doi:10.7910/DVN/05BMJY 2012
("authors" is the 14th tab-delimited field; it should appear after the "tbd" in the example above.) It looks like this happens for older datasets that legitimately don't have authors specified. The problem with the entry above is that the field is literally an empty space, instead of a "-" to indicate a missing value.
The following entry bombs (and stops a Counter Processor run) with the following exception:
...
File ".../counter-processor-0.0.1/counter-env/lib/python3.6/site-packages/peewee.py", line 3593, in db_value
return value if value is None else self.coerce(value)
ValueError: invalid literal for int() with base 10: ' Digital Map Database of China'
Below is what appears to be the offending entry. It obviously has some UTF8 in it; but the problem with it appears to be not the unicode characters, but the fact that there are extra TABs in the line - on account of there being TAB characters in the title?
2020-06-05T04:35:32-0400 10.137.169.47 39a8b5540ecc16e4394b33c4cf13 - :guest /dataset.xhtml?persistentId=doi:10.7910/DVN/J1UD6S doi:10.7910/DVN/J1UD6S - - Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0 Main Cities grid tbd 00_metadata 00_metadata 100% 10 Digital Map Database of China 已启用屏幕阅读器支持。 Digital Map Database of China 2020-02-15T21:35:14Z 1 - /dataset.xhtml?persistentId=doi:10.7910/DVN/J1UD6S 2020
So the fix would be to replace the TABs with spaces.
Not a bug, but a simple feature request - on a Dataverse that hasn't started importing any Sushi data, maybe the MDC APIs should be displaying some user-friendly message, instead of zeros? - Just that "This Dataverse installation has not yet started importing MDC metrics".
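For the two log problems above (a blank "authors" field, and TABs inside a title), a rough sketch of the "simple extra logic" could look like this. It is only an illustration in Python, not the actual Dataverse logging code; sanitize_field and build_log_line are hypothetical names. Note that it collapses any run of whitespace into a single space, which goes slightly beyond just replacing TABs.

```python
# Hypothetical sketch of per-field cleanup before a log line is written.
# Function names are made up for illustration.

MISSING = "-"  # the placeholder the log format uses for a missing value


def sanitize_field(value) -> str:
    """Return a value that is safe to place in one tab-delimited field."""
    if value is None:
        return MISSING
    # str.split() with no arguments splits on any whitespace, including TABs
    # and newlines, so joining the pieces with single spaces removes them all.
    cleaned = " ".join(str(value).split())
    # Empty or whitespace-only values become the missing-value marker.
    return cleaned if cleaned else MISSING


def build_log_line(fields) -> str:
    """Join already-ordered field values into one tab-delimited log line."""
    return "\t".join(sanitize_field(field) for field in fields)
```

For example, sanitize_field(" ") gives "-", and a title containing embedded TABs comes out as a single field with spaces, so the number of columns in the line stays fixed.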
I may be adding more as things are being discovered.
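Until the logging itself is fixed, logs that were already collected could be screened before a Counter Processor run. The sketch below is an assumption-heavy illustration: find_bad_lines is a hypothetical helper, the expected count of 19 fields is my own count from the well-formed parts of the examples above (check it against your log header), and lines starting with "#" are assumed to be header lines.

```python
# Hypothetical pre-flight check for counter_*.log files.
# EXPECTED_FIELDS is an assumption; verify it against your own logs.

EXPECTED_FIELDS = 19


def find_bad_lines(lines, expected_fields=EXPECTED_FIELDS):
    """Return (line_number, reason) pairs for entries likely to break a run."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip empty lines and (assumed) header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != expected_fields:
            # Extra TABs inside a value shift every later column.
            reason = f"expected {expected_fields} fields, found {len(fields)}"
            problems.append((number, reason))
        elif any(not field.strip() for field in fields):
            # A blank or whitespace-only field where "-" should be.
            problems.append((number, "blank field (should be '-')"))
    return problems
```

Usage would be along the lines of opening a log with open(path, encoding="utf-8") and passing the file object to find_bad_lines, then fixing or dropping the reported lines.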