engineersamuel / logreaper

https://access.redhat.com/labs/logreaper/

Add support for custom logging format of access logging #3

Open msfm opened 8 years ago

msfm commented 8 years ago

It looks like logreaper is unable to parse non-standard access log formats. As far as I checked, common (%h %l %u %t "%r" %s %b) and combined (%h %l %u %t "%r" %s %b "%{Referer}i" "%{User-Agent}i") work fine, but parsing fails when I add response time (%T or %D) to the log format.

We often add custom information like response time (%T or %D), sessionid (%{JSESSIONID}C in httpd and %S in JBoss), thread name (%I in JBoss) and process id (%P in httpd) in order to investigate various issues.

It may not be easy to implement, but it would be quite useful to have a feature for specifying a custom log format for the parser.

ryran commented 8 years ago

Indeed! Seconded! I can only imagine how complicated it would be, but ignoring all that for a moment, I feel the ideal case would be to paste in the format (e.g., what is in httpd's CustomLog directive) and then logreaper would just handle it.
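
To make the "paste in the format" idea concrete, here is a minimal Python sketch of how a LogFormat string could be turned into a regex with named groups. It is not logreaper's implementation: the function name `logformat_to_regex`, the `DIRECTIVE_PATTERNS` table, and the group naming scheme are all made up for illustration, and only a small subset of httpd directives is covered.

```python
import re

# Regex fragment per directive letter. Illustrative subset only (assumption):
# a real tool would need to cover many more directives and edge cases.
DIRECTIVE_PATTERNS = {
    "h": r"\S+",         # remote host
    "l": r"\S+",         # remote logname
    "u": r"\S+",         # remote user
    "t": r"\[[^\]]+\]",  # timestamp in square brackets
    "r": r'[^"]*',       # request line, assumed to be wrapped in quotes
    "s": r"\d{3}",       # status code
    "b": r"\d+|-",       # response size, or "-" when empty
    "T": r"\d+",         # response time in seconds
    "D": r"\d+",         # response time in microseconds
    "P": r"\d+",         # process id
    "i": r'[^"]*',       # request header, assumed to be wrapped in quotes
    "o": r'[^"]*',       # response header, assumed to be wrapped in quotes
}

# Matches %h, %>s, %{Referer}i, %{JSESSIONID}C and similar tokens.
TOKEN = re.compile(r"%[<>]?(?:\{([^}]+)\})?([A-Za-z])")


def logformat_to_regex(fmt):
    """Compile a LogFormat string into a regex with one named group per directive."""
    parts, pos, used = [], 0, set()
    for m in TOKEN.finditer(fmt):
        # Literal text between directives (spaces, quotes) is matched verbatim.
        parts.append(re.escape(fmt[pos:m.start()]))
        arg, letter = m.group(1), m.group(2)
        # Group name: "h" for %h, "i_User_Agent" for %{User-Agent}i.
        name = letter if arg is None else letter + "_" + re.sub(r"\W", "_", arg)
        if name in used:
            raise ValueError("directive appears twice: " + m.group(0))
        used.add(name)
        # Unknown directives fall back to "one run of non-space characters".
        pattern = DIRECTIVE_PATTERNS.get(letter, r"\S+")
        parts.append("(?P<%s>%s)" % (name, pattern))
        pos = m.end()
    parts.append(re.escape(fmt[pos:]))
    return re.compile("^" + "".join(parts) + "$")
```

With this sketch, appending %D to the common format just adds one more named group (`D`), so `logformat_to_regex('%h %l %u %t "%r" %s %b %D')` yields a pattern whose match exposes the response time alongside the standard fields. Values that contain the delimiter outside of quotes would still break it.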

engineersamuel commented 8 years ago

@msfm If you have one or more custom formats that you commonly use, copy and paste them here with 5-10 sample lines for each format. Adding more regex formats isn't too tricky, and once they are in, they are in. If there are just a handful of common custom formats you use, I think we can make this happen pretty quickly.

lywang-rh commented 6 years ago

Vote for this enhancement!

I was looking for a tool like this: https://www.apacheviewer.com/ I want something that takes a log file along with a custom LogFormat (the particular line from the httpd config file) and gives me the converted data, so that I can sort by a specific column or filter with conditions to find the useful data. Basically what @ryran said.

As @msfm said: "We often add custom information like response time (%T or %D), sessionid (%{JSESSIONID}C in httpd and %S in JBoss), thread name (%I in JBoss) and process id (%P in httpd) in order to investigate various issues." Very true. Once this information can be parsed correctly, the next step would be sorting and filtering the logs by particular columns or values.

The challenges here: 1) Delimiters: Different people use access logs in different ways and need different data. A custom format can mix several delimiters and quoting styles: blank spaces, single or double quotes, dashes, even commas. Unfortunately, these characters also appear inside the logged values themselves. Apart from regular expressions, I'm not sure whether any modern tool can parse this in a smart way, for example smart enough to tell that the spaces in "04 June 2018" should not be treated as delimiters, whereas in "52 June 2018" the 52 probably belongs to the previous column and only the month and year are printed. This is just an example; nobody would print that in an access log. (This sounds like an AI-powered log parser, lol.)

2) Size of the log: As a support engineer, I often get a ticket reporting "intermittent slow responses from httpd since last Friday" along with a 2GB+ access log containing a few dozen million lines. We can write a script (some experts know how to write complicated scripts for this, though I'm not one of them), filter the data with tools like awk, or process the log semi-manually or fully manually. But because of challenge 1) above, it is very difficult to verify that these scripts, tools, and manual methods are always reliable, so they may lead to an inaccurate or totally wrong conclusion.
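
One way to ease the reliability concern is to stream the file line by line and count every line the pattern fails to match, so a wrong format shows up as a high "unparsed" count instead of a silently wrong result. The Python sketch below assumes the common log format with %D (microseconds) appended; the `LINE` regex, the `scan` function, and the threshold are illustrative assumptions, not part of logreaper.

```python
import re
from collections import Counter

# Assumed format: %h %l %u %t "%r" %s %b %D (common format plus microseconds).
LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) (?P<micros>\d+)$'
)


def scan(lines, threshold_micros):
    """Return (slow requests, parse statistics) for an iterable of log lines."""
    stats = Counter()
    slow = []
    for line in lines:  # one line at a time, so a 2GB file is never loaded whole
        m = LINE.match(line.rstrip("\r\n"))
        if m is None:
            stats["unparsed"] += 1  # visible signal that the format is off
            continue
        stats["parsed"] += 1
        micros = int(m.group("micros"))
        if micros >= threshold_micros:
            slow.append((m.group("time"), m.group("request"), micros))
    return slow, stats
```

Because a file object is itself an iterable of lines, it can be passed straight in, e.g. `with open(path) as fh: slow, stats = scan(fh, 2_000_000)`, where `path` is whatever log file is under investigation. If `stats["unparsed"]` is more than a tiny fraction of the total, the conclusions drawn from `slow` should not be trusted.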

engineersamuel commented 6 years ago

In terms of parsing new logs or different formats, I took a regex approach, so I'll gladly accept PRs for new regexes that parse the logs you are looking for. It should just be a matter of copying one of the existing regexes for the file type in question, tweaking it in www.regex101.com, and then adding it to public/formats/<name>.yml

Brijmohansingh commented 3 years ago

Hi, I typically use the pattern below. Is it possible to set a custom format so that the tool identifies the file?

Brijmohansingh commented 3 years ago

"ConversionPattern" value="| %d{yyyy-MM-dd H:mm:ss:SSS z} | %-5p | %-t | %c{2} | %X{dmDiagnosticContextProvidersHolder} | %m%n"