CDLUC3 / ezid

MIT License

Investigate API logging with Matomo #548

Closed. sfisher closed this issue 7 months ago.

sfisher commented 9 months ago

I found failures like this in the EZID logs:

2024-01-19 16:06:48,994     INFO ce6d5ccab72711eea31d06a8887cec41 BEGIN getMetadata ark:/88122/kgjw0052 anonymous anonymous anonymous anonymous False
2024-01-19 16:06:49,047    DEBUG Checking if user can view identifier. user="<class 'ezidapp.models.user.AnonymousUser'>" identifier="Identifier(pk=21302993, id=ark:/88122/kgjw0052, isArk=True, isDOI=False, isDataCite=False, isCrossref=False, target=https://www.industrydocuments.ucsf.edu/docs/kgjw0052, ownerId=432)"
2024-01-19 16:06:49,047    DEBUG is_authorized="True"
2024-01-19 16:06:49,075     INFO ce6d5ccab72711eea31d06a8887cec41 END SUCCESS
2024-01-19 16:06:55,086  WARNING cannot send google analytic tracking post: [Errno 111] Connection refused

I was finally able to see what the middleware is sending by temporarily adding logging to the library's code in the .pyenv environment. The same requests work when I send them with curl, so I'm not sure what's going wrong.

Example requests it creates:

2024-01-22 16:24:26,871     INFO sending tracking request: https://matomo.cdlib.org//matomo.php?&apiv=1&idsite=37&rec=1&rand=1925231925&_id=8be63d1fe534d348&urlref=&url=http%3A%2F%2Fezid-dev.cdlib.org%2Fid%2Fark%253A%2F88122%2Fkgjw0052&token_auth=<omitted>&cip=128.48.67.17
2024-01-22 16:24:26,872     INFO headers: {'User-Agent': 'curl/8.4.0', 'Accept-Language': 'en'}
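To replay a logged request outside the middleware, the way curl does, it can help to build it by hand. The sketch below assembles a Matomo HTTP Tracking API request from the same parameters that appear in the log (`idsite`, `rec`, `apiv`, `url`, `_id`, `cip`, `token_auth`). The function name `build_matomo_request` and its signature are hypothetical, not part of the tracking library. The logged URL also contains `//matomo.php`, which might come from a trailing slash in the configured base URL. The sketch assumes that and strips it, but this is a guess, not a confirmed cause.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_matomo_request(base_url: str, site_id: int, page_url: str,
                         visitor_id: str, client_ip: str, token_auth: str,
                         user_agent: str = "curl/8.4.0") -> Request:
    """Assemble a Matomo tracking request like the one in the log (hypothetical helper)."""
    params = {
        "idsite": site_id,
        "rec": 1,                 # required by Matomo for the hit to be recorded
        "apiv": 1,
        "url": page_url,          # urlencode() percent-encodes the page URL
        "_id": visitor_id,        # visitor ID: a 16-character hex string
        "cip": client_ip,         # overriding the visitor IP requires token_auth
        "token_auth": token_auth,
    }
    # Drop any trailing slash so the path is "/matomo.php", not "//matomo.php".
    endpoint = base_url.rstrip("/") + "/matomo.php"
    return Request(
        f"{endpoint}?{urlencode(params)}",
        headers={"User-Agent": user_agent, "Accept-Language": "en"},
    )
```

Passing the result to `urllib.request.urlopen(req, timeout=10)` sends it. If that succeeds from the EZID host, as curl does, the tracking endpoint itself is probably not the problem.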

It turns out that this library depends on Celery, which must be installed and running for tracking to work. This isn't documented anywhere on the library's GitHub page. I only found it by reading the code while trying to figure out what was wrong. I'll add a separate task for the Celery setup, since that is a major piece of work in its own right.
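The `[Errno 111] Connection refused` warning (errno 111 is `ECONNREFUSED` on Linux) may mean the library cannot reach its Celery broker, rather than Matomo itself, since the same Matomo requests succeed with curl. That is an assumption, not something the logs confirm. A quick way to test it is a plain TCP connect to wherever the broker is expected to be. The host and ports below are hypothetical: 5672 and 6379 are the stock defaults for RabbitMQ (AMQP) and Redis, and the broker URL the library actually uses may differ.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    A refused connection (ECONNREFUSED) or a timeout returns False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # ConnectionRefusedError and timeouts are both OSError subclasses
        return False


if __name__ == "__main__":
    # Hypothetical broker locations using default ports; adjust to the real config.
    for name, port in [("RabbitMQ", 5672), ("Redis", 6379)]:
        state = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} on localhost:{port} is {state}")
```

If neither broker is reachable, that would be consistent with the tracking calls failing until Celery and its broker are set up.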