GlobalDataverseCommunityConsortium / dataverse-previewers

A collection of Datafile Previewers that can be configured to work with Dataverse
MIT License

Request for Help with "Make Data Count" #62

Closed: ghost closed this issue 2 years ago

ghost commented 2 years ago

Hi

I have set up Dataverse 5.5 using this docker-compose file: https://github.com/IQSS/dataverse-docker/blob/master/docker-compose.yml

However, I can't get the Dataset Metrics to work. It still shows 0 today, even though I viewed and downloaded the dataset a few times yesterday, both before and after I ran the steps below to enable the dataset metrics. [image]

I followed these steps to enable the dataset metrics yesterday, 2 Nov 2021:

curl -X PUT -d '/usr/local/payara5/glassfish/domains/domain1/logs/mdc' http://localhost:8080/api/admin/settings/:MDCLogPath
curl -X PUT -d 'true' http://localhost:8080/api/admin/settings/:DisplayMDCMetrics

cd /usr/local
wget https://github.com/CDLUC3/counter-processor/archive/v0.0.1.tar.gz
tar xvfz v0.0.1.tar.gz
(download GeoLite2-Country_20211026.tar.gz from maxmind.com)
tar xvfz GeoLite2-Country_20211026.tar.gz
cp GeoLite2-Country*/GeoLite2-Country.mmdb maxmind_geoip

wget https://guides.dataverse.org/en/latest/_downloads/f99910a3cc45e4f68cc047f7c033c7f0/counter-processor-config.yaml
(set year_month to 2021-11 in counter-processor-config.yaml)

touch /usr/local/payara5/glassfish/domains/domain1/logs/mdc/counter_2021-11-01.log

CONFIG_FILE=counter-processor-config.yaml python3 main.py

curl -X POST "http://localhost:8080/api/admin/makeDataCount/addUsageMetricsFromSushiReport?reportOnDisk=/tmp/make-data-count-report.json"

Set up cron to run counter_daily.sh:

crontab -e
01 00 * * * sh /usr/local/counter-processor-0.0.1/counter_daily.sh

Can anyone advise whether I did any step wrong or missed anything?

Thank you very much.

qqmyers commented 2 years ago

Has this resolved itself? With the MDC metrics/counter processor, running the cron job will only process activity through the prior day, which would be consistent with you not seeing any updates to the MDC metrics display on the same day you were accessing the dataset. If you're still not seeing metrics results, we can start walking through the sequence of events and output files to see where things have gone wrong.

ghost commented 2 years ago

No, it's not resolved yet; the dataset metrics are still showing 0. In addition, there are also no log files in /usr/local/payara5/glassfish/domains/domain1/logs/mdc/ produced by the cron job. The only file is counter_2021-11-01.log, which I created manually.

For the sequence of events, I only ran the steps mentioned in my previous post. I didn't encounter any error messages when running these steps. For the output files, which ones are you referring to?

qqmyers commented 2 years ago

If the :MDCLogPath is set, Dataverse should write into that directory any time views/downloads occur. That's independent of whether you have counter installed/have set up the cron job, etc. If you don't see log files there, check the main server.log file for errors. A typo in the :MDCLogPath, permission issues on the directory you've specified, etc. might all prevent Dataverse from writing those logs. In normal operation, there will only be log files on days when there is activity on the server though, so dev servers in particular may not have a new file for each day.
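The directory checks described above can be sketched as a small shell function. This is only an illustration: `check_mdc_dir` and the probe file name are hypothetical, and it assumes you run it as the same user that runs Dataverse, inside the same container or host where Dataverse runs.

```shell
#!/bin/sh
# check_mdc_dir DIR
# Returns 0 if DIR exists and the current user can create a file in it.
# Run this as the user that runs Dataverse, where Dataverse itself runs.
check_mdc_dir() {
  dir=$1
  if [ ! -d "$dir" ]; then
    echo "missing directory: $dir" >&2
    return 1
  fi
  # Hypothetical probe file; created and removed again immediately.
  probe="$dir/.mdc_write_test.$$"
  if ! touch "$probe" 2>/dev/null; then
    echo "cannot write to: $dir" >&2
    return 2
  fi
  rm -f "$probe"
  # Show any per-day logs Dataverse has written so far.
  ls -l "$dir"/counter_*.log 2>/dev/null || echo "no counter_*.log files in $dir"
  return 0
}
```

For the path used in this thread, the call would be check_mdc_dir /usr/local/payara5/glassfish/domains/domain1/logs/mdc. A return code of 1 or 2 points at a path typo or a permission problem; a return code of 0 with no counter_*.log files listed means the directory looks usable but Dataverse has not written to it.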

With the logs in place, the cron job will run counter processor which, if it is configured correctly, will read those logs, create its own output files, and update its local db. In a second step, the daily cron job then sends that output file back to Dataverse via an API call, which is what results in the internal metrics table being populated and, from that, the metrics being displayed.

If needed, I can provide more details about these later steps, but it sounds like your first issue is just in getting the original logs written by Dataverse and, hopefully, if you resolve that, the rest will already be working.
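The two steps of the daily job can be sketched roughly as follows. This is not the counter_daily.sh referenced in the thread, only a minimal outline built from the commands quoted in the original post. The function name `mdc_daily`, the variable names, and the `DRY_RUN` switch are assumptions for illustration; the default paths and URL are the ones used earlier in this thread.

```shell
#!/bin/sh
# Sketch only: NOT the real counter_daily.sh. Defaults come from this thread.
COUNTER_DIR="${COUNTER_DIR:-/usr/local/counter-processor-0.0.1}"
MDC_CONFIG="${MDC_CONFIG:-counter-processor-config.yaml}"
MDC_REPORT="${MDC_REPORT:-/tmp/make-data-count-report.json}"
DATAVERSE_URL="${DATAVERSE_URL:-http://localhost:8080}"

mdc_daily() {
  url="$DATAVERSE_URL/api/admin/makeDataCount/addUsageMetricsFromSushiReport?reportOnDisk=$MDC_REPORT"
  # With DRY_RUN=1, only print what would be run.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "cd $COUNTER_DIR && CONFIG_FILE=$MDC_CONFIG python3 main.py"
    echo "curl -X POST $url"
    return 0
  fi
  # Step 1: counter processor reads the counter_*.log files and writes its report.
  (cd "$COUNTER_DIR" && CONFIG_FILE="$MDC_CONFIG" python3 main.py) || return 1
  # Step 2: send the report back to Dataverse, which populates the metrics table.
  curl -X POST "$url"
}
```

Running DRY_RUN=1 mdc_daily prints both commands without touching the server, which is a cheap way to confirm the paths before wiring anything into cron. Step 1 has nothing to process until Dataverse itself has written counter_*.log files.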

ghost commented 2 years ago

I found server.log in /opt/payara/appserver/glassfish/domains/domain1/logs (I didn't find any other server.log in the Dataverse container). But no additional log entries appear when I view the dataset page or download the dataset. I tried increasing the logging level with ./asadmin set-log-levels edu.harvard.iq.dataverse.api.Datasets=FINE. Again, no new log entries are added when I view the dataset page or download the dataset.

Here is a snapshot of the last few lines of server.log. Am I looking at the correct log file? [image]

The permissions on server.log are:
-rw-r--r-- 1 payara payara 4093 Oct 16 03:14 server.log

Even after I changed server.log with chmod 777, I still don't see any new log entries when I view or download the dataset.

I also ran this command again:
curl -X PUT -d '/usr/local/payara5/glassfish/domains/domain1/logs/mdc' http://localhost:8080/api/admin/settings/:MDCLogPath
and changed the mdc folder with chmod 777 -R. But I still don't see any logs in the mdc folder or additional entries in server.log.

I'm not sure what else to check from here. I would appreciate further advice. Thank you very much!

qqmyers commented 2 years ago

I can't think of much either, except carefully checking for typos. Once you set :MDCLogPath, you should be able to retrieve its value:

curl http://localhost:8080/api/admin/settings/:MDCLogPath

As the user running Dataverse, you should be able to cd to the returned directory and create a file there.

Internally, methods that represent a view/download of a dataset all call https://github.com/IQSS/dataverse/blob/b117a31076d5a69cb8d447821c3222c39923851c/src/main/java/edu/harvard/iq/dataverse/makedatacount/MakeDataCountLoggingServiceBean.java#L40 which, if the MDCLogPath is set, calls the logging utility to write to a file named according to "counter_"+new SimpleDateFormat("yyyy-MM-dd").format(new Timestamp(new Date().getTime()))+".log", e.g. counter_2021-11-09.log. If you follow the code, you'll see that the LoggingUtil class (at https://github.com/IQSS/dataverse/blob/b117a31076d5a69cb8d447821c3222c39923851c/src/main/java/edu/harvard/iq/dataverse/batch/util/LoggingUtil.java#L88) writes messages to server.log if/when there are exceptions in finding/creating the MDC directory and/or if there's a problem writing the file.

Using tail -f server.log while you're trying to view/download a dataset may make it easier to find the relevant portion of the log file, but if the MDCLogPath is set, the directory is OK, and there are no errors in the log, I'm not sure what to suggest.
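To know which file to look for after viewing or downloading a dataset, the naming pattern above can be reproduced in the shell. The helper names `mdc_log_name` and `has_mdc_log` are hypothetical. One assumption to keep in mind: `date` uses the shell's time zone, which may differ from the JVM's time zone around midnight.

```shell
#!/bin/sh
# mdc_log_name [YYYY-MM-DD]
# Prints the per-day file name in the counter_yyyy-MM-dd.log pattern.
# Defaults to today's date in the shell's time zone.
mdc_log_name() {
  day=${1:-$(date +%Y-%m-%d)}
  echo "counter_${day}.log"
}

# has_mdc_log DIR [YYYY-MM-DD]
# Returns 0 if the expected per-day file exists in DIR.
has_mdc_log() {
  [ -f "$1/$(mdc_log_name "${2:-}")" ]
}
```

After triggering a view or download, has_mdc_log with the directory returned by the settings call should succeed; if it does not, that is the moment to look at the tail of server.log.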

ghost commented 2 years ago

As the user running Dataverse, I'm able to create a file in the returned directory. But no log file is produced, and no errors are appended to server.log, when I view or download a dataset. I also tried running the docker-compose file on another server and faced the same issue. Nevertheless, thank you for your help. If anyone has faced this issue and found the cause, please let me know. Thank you very much!

qqmyers commented 2 years ago

Hmm - I don't know enough about docker/the Dataverse docker setup to help, but I could imagine a possibility that the mdc logs are being written (no error) and are just not showing up in the same volume as server.log. I'm not sure how visible an MDC issue will be to those who do know, so I might suggest trying an issue in the repo for the docker setup you used, focusing on the logs not appearing as/where expected.
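One way to test that possibility is to look at the directory from inside the container and compare it with the container's mounts. The sketch below is an assumption-laden outline: the container name `dataverse` is hypothetical (use whatever `docker ps` shows), and `ls_in_container`, `show_mounts`, and `is_under` are illustrative helper names. The `docker exec` and `docker inspect --format` commands are standard Docker CLI.

```shell
#!/bin/sh
# CONTAINER is a hypothetical name; use the one shown by `docker ps`.
CONTAINER="${CONTAINER:-dataverse}"

# List a directory as seen from inside the container.
ls_in_container() {
  docker exec "$CONTAINER" ls -l "$1"
}

# Print the container's mounts (host source and container destination) as JSON.
show_mounts() {
  docker inspect --format '{{ json .Mounts }}' "$CONTAINER"
}

# is_under PATH PREFIX: returns 0 if PATH is PREFIX or lies below it.
is_under() {
  prefix=${2%/}
  case "$1/" in
    "$prefix"/*) return 0 ;;
    *) return 1 ;;
  esac
}
```

With the paths from this thread, is_under /usr/local/payara5/glassfish/domains/domain1/logs/mdc /opt/payara/appserver/glassfish/domains/domain1/logs returns non-zero, so the configured MDC directory is not inside the directory where server.log was found. Whether either path sits on a mounted volume depends on the compose file, which show_mounts would reveal.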