Open chryslovelace opened 1 year ago
Hello, and sorry for the delayed response. Those lines are indeed ignored by ezpaarse by default. You could set up ezpaarse not to ignore lines with a 302 status, but that is a global parameter; see: https://ezpaarse-project.github.io/ezpaarse/configuration/parametres.html#ezpaarse-filter-status

We are thinking about allowing that feature on a per-parser basis (instead of as a global parameter) to keep the processing load as low as possible (in a typical log file, we filter out 90-95% of the log lines).
As for your second question about combining multiple lines to determine an access event, which is obviously linked to the 302 situation, we are also thinking about either:
NB: The only use case where we keep a memory of a previous access event is the COUNTER deduplication algorithm, where we filter out access events if the same resource is accessed by the same user-session or user-id within a short timespan (10 to 30 seconds, depending on the resource format).
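To illustrate the kind of time-window deduplication described in the NB above, here is a minimal sketch. It is not ezpaarse's actual implementation: the function name `deduplicate`, the event fields (`sessionId`, `userId`, `resourceId`, `format`, `timestamp` in seconds) and the per-format windows are all assumptions made for the example.

```javascript
// Hypothetical dedup windows in seconds. The thread only states
// "10 to 30 seconds, depending on the resource format"; the exact
// mapping below is an assumption.
const WINDOWS = { HTML: 10, PDF: 30 };
const DEFAULT_WINDOW = 10;

// Drops an event when the same user (session id, else user id) accessed
// the same resource in the same format within the window.
// Events are assumed to be sorted by timestamp.
function deduplicate(events) {
  const lastSeen = new Map(); // key -> timestamp of the previous access
  const kept = [];
  for (const ev of events) {
    const user = ev.sessionId || ev.userId;
    const key = `${user}|${ev.resourceId}|${ev.format}`;
    const windowSec = WINDOWS[ev.format] ?? DEFAULT_WINDOW;
    const prev = lastSeen.get(key);
    if (prev === undefined || ev.timestamp - prev > windowSec) {
      kept.push(ev);
    }
    // Remember every access, kept or not, so rapid repeated clicks
    // keep extending the window.
    lastSeen.set(key, ev.timestamp);
  }
  return kept;
}
```

Note that this is the only state carried from one event to the next: a small map keyed by user and resource, which is why it stays cheap even on large log files.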
We have an issue where access events for Proquest are not being registered due to their use of redirects. Here are some snippets of sessions that demonstrate this issue:
The URL in each of the first lines here includes the document id, and proquest/parser.js seems like it should be picking up this URL format, but these lines are presumably being ignored due to the 302 redirect and/or empty content. The actual content is delivered in the third request, but the id is no longer present in the URL to be extracted, so the access event can't be properly registered.

In some previous correspondence, our organization had asked whether multiple lines could be combined to make a determination of an access event, and the response was that it was not possible. Is this still the case given this issue? If not, is there a way that the initial request here can count as the access event, so those identifiers can be extracted?
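To make the request concrete, here is a rough sketch of what combining lines could look like. It is purely illustrative and not something ezpaarse provides: `correlateRedirects`, the line fields (`session`, `status`, `url`) and the `/doc/<id>` URL pattern in the usage example are hypothetical.

```javascript
// Hypothetical sketch: remember the document id seen on a 302 line and
// attach it to the next 200 line of the same session.
// `extractId` is a caller-supplied function returning an id or null.
function correlateRedirects(lines, extractId) {
  const pending = new Map(); // session -> document id awaiting content
  const events = [];
  for (const line of lines) {
    if (line.status === 302) {
      const id = extractId(line.url);
      // Intermediate redirects without an id leave the pending id untouched.
      if (id) pending.set(line.session, id);
    } else if (line.status === 200 && pending.has(line.session)) {
      events.push({
        session: line.session,
        docId: pending.get(line.session),
        url: line.url,
      });
      pending.delete(line.session);
    }
  }
  return events;
}

// Usage with a made-up URL pattern (not the real ProQuest format):
const extractId = (url) => {
  const match = /\/doc\/(\d+)/.exec(url);
  return match ? match[1] : null;
};
```

The cost of this approach is that redirect lines can no longer be discarded up front, since each one has to be inspected for an id.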