Record requests in a structured format

SarahSidders commented 1 year ago

ERDDAP currently records requests in a plain text format. We would like to propose changes to be made so that ERDDAP records these details in a structured format that can be used by the ERDDAP community to more easily capture this data for reporting purposes.

We have created a branch that details what the changes could be to achieve this structure.

• RequestDetails.java - This new class is used to create an object for the request data
• UsageMetrics.java - This new class is used to process the request details object into JSON
• Erddap.java - Updated this class to include the RequestDetails object and call the UsageMetrics sendUsageMetrics method
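To make the idea above concrete, here is a minimal sketch of a request-details object that serialises itself to a single JSON line. It is not the code from the pull request: the class name RequestDetailsSketch and the fields (timestamp, requestUrl, status, loggedInAs) are assumptions chosen for illustration, and the hand-written serialiser only stands in for whatever JSON handling the real classes use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: the class and field names are illustrative assumptions,
// not the ones used in the pull request.
final class RequestDetailsSketch {
    final String timestamp;   // ISO 8601 time of the request
    final String requestUrl;  // path and query string that was requested
    final int status;         // HTTP status code returned
    final String loggedInAs;  // user name, or null for anonymous requests

    RequestDetailsSketch(String timestamp, String requestUrl, int status, String loggedInAs) {
        this.timestamp = timestamp;
        this.requestUrl = requestUrl;
        this.status = status;
        this.loggedInAs = loggedInAs;
    }

    /** Serialises the request to one JSON object on a single line. */
    String toJson() {
        // LinkedHashMap keeps the fields in a stable, predictable order.
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("timestamp", timestamp);
        fields.put("requestUrl", requestUrl);
        fields.put("status", status);
        fields.put("loggedInAs", loggedInAs);

        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append('"').append(e.getKey()).append("\":");
            Object v = e.getValue();
            if (v == null) sb.append("null");
            else if (v instanceof Number) sb.append(v);
            else sb.append('"').append(escape(v.toString())).append('"');
        }
        return sb.append('}').toString();
    }

    /** Escapes the characters that JSON does not allow unescaped inside a string. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Writing one JSON object per line would let downstream tools parse the log record by record rather than matching patterns in free text.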

This is a proposed idea, and we're happy for any different approaches/changes that can be made to improve this for community use.

Please see the pull request: https://github.com/ERDDAP/erddap/pull/117

ChrisJohnNOAA commented 1 year ago

Hi, thanks for the issue and pull request. Could you explain how you want to use this so I can make sure I understand the intentions?

After a quick look at the pull request, I have a few questions.

1. Why are you only logging status===200 requests?
2. Should this new log be an option that can be enabled by a flag?
3. Is there a reason to log both the old and new logs?
4. Any reason "loggedInAs" is not included in the new log?
5. Especially since a structured log format implies there will be dependencies on these logs, can you add tests to verify the logs are in the proper format?

BobSimons commented 1 year ago

Just curious: how is this different/better than the Tomcat request log which is also in a standard format?

thogar-computer commented 1 year ago

A little background: we needed to see what had been downloaded, going back as far as we could. Our email setup hadn't been working due to some internal IT changes, so we fell back on ERDDAP's logging system. A grep did fetch the information required, but only back about a year.

All of that work led us to realise we needed a good way to capture what was being requested. Currently we are investigating a way to duplicate traffic (i.e. monitor requests before they make it to ERDDAP); this is more of a challenge once we add in HTTPS connections rather than HTTP, and response tracking.

Given this is being driven by our remit to deliver data as a national data center (plus funders want to know), we thought that other institutes might want a simple way to see what ERDDAP was delivering to their communities.

@BobSimons I see this as better because it isn't dependent on Tomcat's setup. Without knowing every institute's setup, it seems plausible that some places might have access to the ERDDAP directories but not to Tomcat's logging.

@ChrisJohnNOAA These are all very good points, and most of my answers resolve to "because we need that info". More details below:

1. Agreed, this log should include all requests, and it is up to other systems to visualise what might be required.
   i. We are using this system to track and report on downloaded/accessed data requests, hence only 200s.
2. Happy to default to ERDDAP's way of doing this.
   i. I would be interested in seeing whether other users would want a way to filter this type of logging within the setup.xml.
3. Interesting. We don't want to remove logging from other places, as we felt that other institutes might have designed ways/systems to pick this info out of the current logs.
4. No, that is a good point and one we can look to include.
5. We can look to add tests. Given the current work underway around the way tests (and Maven) are set up, how would you like new tests to be added?

As stated, this work is very much something we wanted to discuss with the community to see if it is of interest to anyone else. As we require this information for reporting purposes, we will be continuing our investigation into traffic routing. If the idea in this issue is of use to the wider community, then we'll swap to this way of collecting the information in the future.

ChrisJohnNOAA commented 1 year ago

Even if you only want successfully downloaded/accessed data, you probably want to include status codes besides 200. Specifically, 206 (partial content) and redirects (300-308) might be useful. On a broader note, I'd be interested in feedback from others on whether things like error requests would be useful here.
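As a rough illustration of that suggestion, the check below treats 200, 206, and the 300-308 range as requests worth recording. The class and method names are hypothetical, and the exact set of codes is only the one floated in this comment, not a decision.

```java
// Hypothetical helper: names are illustrative and not part of ERDDAP.
final class LoggableStatus {
    /** Returns true for the status codes suggested above: 200, 206 and 300-308. */
    static boolean isDeliveredOrRedirected(int status) {
        return status == 200
            || status == 206                      // partial content (range requests)
            || (status >= 300 && status <= 308);  // redirects
    }
}
```

If error requests turn out to be useful too, the predicate could be dropped and the filtering left to whatever consumes the log.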

As for changes to the existing logging, my recommendation would be to have a setting that allows configuration of which logs should be generated. Most likely it should default to the current logging implementation, with options for the new structured format and for both together. I'd also appreciate feedback from others running ERDDAP servers on what they'd like to see here.
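One way such a setting could be modelled is a three-valued mode that falls back to the current plain-text behaviour when the value is missing or unrecognised. This is only a sketch: the enum name, its values, and the idea of reading it from a string are assumptions, and no such setup.xml option exists in this thread.

```java
import java.util.Locale;

// Hypothetical setting: the enum and its values are illustrative assumptions.
enum RequestLogMode {
    PLAIN_TEXT,  // current behaviour, used as the default
    STRUCTURED,  // only the new structured log
    BOTH;        // write both logs

    /** Parses a configuration value, defaulting to PLAIN_TEXT when absent or unknown. */
    static RequestLogMode fromSetting(String value) {
        if (value == null || value.trim().isEmpty()) return PLAIN_TEXT;
        try {
            return valueOf(value.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return PLAIN_TEXT;
        }
    }

    boolean writesPlainText()  { return this != STRUCTURED; }
    boolean writesStructured() { return this != PLAIN_TEXT; }
}
```

Defaulting to the existing log means servers that never set the option would see no change in behaviour.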

As for testing, my preference would be a JUnit-style unit test that specifically tests the logging capabilities. However, since I don't currently have an example JUnit test, I'd be happy with any test.

Record requests in a structured format #118