Open SarahSidders opened 1 year ago
Hi, thanks for the issue and pull request. Could you explain how you want to use this so I can make sure I understand the intentions?
After a quick look at the pull request, I have a few questions.
Just curious: how is this different/better than the Tomcat request log which is also in a standard format?
A little background: we needed to see what had been downloaded, going back as far as we could. Our email setup hadn't been working due to some internal IT changes, so we fell back on ERDDAP's logging system. A grep did fetch the required information, but only going back about a year.
All of that work led us to realise we needed a good way to capture what was being requested. Currently we are investigating a way to duplicate traffic (i.e. monitor requests before they reach ERDDAP). This is more of a challenge once we add in HTTPS connections rather than HTTP, along with the response tracking.
Given this is being driven by our remit to deliver data as a national data center (plus funders want to know), we thought that other institutes might want a simple way to see what ERDDAP was delivering to their communities.
@BobSimons I see this as better because it isn't dependent on Tomcat's setup. Without knowing every institute's setup, it seems plausible that some places have access to the ERDDAP directories but not to Tomcat's logging.
@ChrisJohnNOAA These are all very good points, and most of my answers resolve to "because we need that info". More details below:
filter
this type of logging within the setup.xml
As stated, this work is very much something we wanted to discuss with the community to see if it is of interest to anyone else. As we require this information for reporting purposes, we will continue investigating the traffic routing. If the idea in this issue is of use to the wider community, then we'll switch to this way of collecting the information in the future.
Even if you only want successfully downloaded/accessed data, you probably want to include status codes besides 200. Specifically, 206 (partial content) and redirects (300-308) might be useful. On a broader note, I'd be interested in feedback from others on whether things like error requests would be useful here.
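To make the status code suggestion concrete, here is a minimal sketch of such a check. The class and method names (`StatusFilter`, `isLoggableStatus`) are hypothetical and do not come from ERDDAP or the pull request; the accepted codes are just the ones mentioned above.

```java
import java.util.List;

public class StatusFilter {
    // Hypothetical helper (not an ERDDAP API): true when a response status
    // should be recorded as delivered data.
    // 200 = OK, 206 = Partial Content, 300-308 = the redirect range mentioned above.
    static boolean isLoggableStatus(int status) {
        return status == 200 || status == 206 || (status >= 300 && status <= 308);
    }

    public static void main(String[] args) {
        // Print the decision for a few sample status codes.
        for (int status : List.of(200, 206, 302, 404, 500)) {
            System.out.println(status + " -> " + isLoggableStatus(status));
        }
    }
}
```

Whether error responses (4xx/5xx) should also be recorded is still an open question, so this sketch simply excludes them.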
As for changes to the existing logging, my recommendation would be to have a setting that controls which logs are generated. Most likely it should default to the current logging implementation, with options for the new structured logging, or both together. Though I'd also appreciate feedback from others running ERDDAP servers on what they'd like to see here.
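One possible shape for such a setting, as a sketch only: the class name `RequestLogMode`, the three values, and the idea of parsing them from a string are all assumptions for illustration, not an existing ERDDAP setting. Unknown or missing values fall back to the current logging, matching the suggested default.

```java
import java.util.Locale;

public class RequestLogMode {
    // Hypothetical modes: current plain-text log, new structured log, or both.
    enum Mode { LEGACY, STRUCTURED, BOTH }

    // Parse a setting value; null, blank, or unrecognised input keeps the
    // current behaviour (LEGACY) so existing installations are unaffected.
    static Mode parse(String setting) {
        if (setting == null || setting.trim().isEmpty()) {
            return Mode.LEGACY;
        }
        try {
            return Mode.valueOf(setting.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return Mode.LEGACY;
        }
    }

    // True when the existing plain-text log should still be written.
    static boolean writesLegacy(Mode mode) {
        return mode != Mode.STRUCTURED;
    }

    // True when the new structured log should be written.
    static boolean writesStructured(Mode mode) {
        return mode != Mode.LEGACY;
    }
}
```

Defaulting to the legacy mode on bad input is a design choice in this sketch; a real implementation might prefer to log a warning or fail at startup instead.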
As for testing, my preference would be a JUnit-style unit test, specifically testing the logging capabilities. However, since I don't currently have an example JUnit test, I'd be happy with any test.
ERDDAP currently records requests in a plain text format. We would like to propose changes to be made so that ERDDAP records these details in a structured format that can be used by the ERDDAP community to more easily capture this data for reporting purposes.
We have created a branch that details what the changes could be to achieve this structure.
This is a proposed idea, and we're happy for any different approaches/changes that can be made to improve this for community use.
Please see the pull request: https://github.com/ERDDAP/erddap/pull/117
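For readers unfamiliar with the idea, a structured record could be as simple as one JSON object per request. The sketch below is illustrative only: the field names (`time`, `request`, `status`) and the class `StructuredLogLine` are assumptions, and the format actually used in the pull request may differ.

```java
public class StructuredLogLine {
    // Minimal JSON string escaping for quotes, backslashes, and control characters.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Build one JSON object per request; the three fields are hypothetical.
    static String toJsonLine(String isoTime, String requestUrl, int status) {
        return "{\"time\":\"" + escape(isoTime) + "\","
            + "\"request\":\"" + escape(requestUrl) + "\","
            + "\"status\":" + status + "}";
    }

    public static void main(String[] args) {
        // Example values are made up for illustration.
        System.out.println(
            toJsonLine("2024-01-01T00:00:00Z", "/erddap/tabledap/example.csv", 200));
    }
}
```

A line-per-record format like this can be parsed by standard tooling instead of grep patterns tailored to the plain text log.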