allinurl / goaccess

GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.
https://goaccess.io
MIT License

Help with custom log file format #1756

Open kaplandani opened 4 years ago

kaplandani commented 4 years ago

Here is how my log line starts:

[2020-05-05T14:51:03-04:00] || STATUS: 200 || HOST: (omitting the rest)

Here are the settings I use:

time-format %T
date-format %d/%b/%Y
log_format [%^8601] || STATUS: %s || HOST: %v

This is the output from the command:

Parsed 1 lines producing the following errors:

Token '0]' doesn't match specifier '%s'

Format Errors - Verify your log/date/time format

What am I doing wrong?

freephile commented 4 years ago

%^8601 does not match your sample log line. %^ means "ignore this field", and it is followed by a literal '8601'.

kaplandani commented 4 years ago

This was automatically produced by the script. So how do I set the 8601 date-time format?

allinurl commented 4 years ago

@kaplandani GoAccess requires the following fields:

a valid IPv4/6 %h
a valid date %d
the request %r
freephile commented 4 years ago

For date-format, I believe you should have %Y-%m-%d%z

What does a full log line look like?
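One way to sanity-check candidate date and time formats against the timestamp in the sample line is to try them with strptime, which uses the same style of specifiers. The sketch below is Python, not goaccess, so it only shows how the ISO 8601 timestamp splits into a date part, a time part and a UTC offset; the variable names are illustrative assumptions.

```python
from datetime import datetime

# Timestamp copied from the sample log line (without the brackets).
stamp = "2020-05-05T14:51:03-04:00"

# Split at the literal 'T': date on the left, time plus UTC offset on the right.
date_part, rest = stamp.split("T")
time_part = rest[:8]     # "14:51:03"
offset_part = rest[8:]   # "-04:00"

# The date part matches %Y-%m-%d.
parsed_date = datetime.strptime(date_part, "%Y-%m-%d").date()

# The time part matches %H:%M:%S.
parsed_time = datetime.strptime(time_part, "%H:%M:%S").time()

# The date-format from the original settings does not match this date.
try:
    datetime.strptime(date_part, "%d/%b/%Y")
    old_format_matches = True
except ValueError:
    old_format_matches = False

print(parsed_date, parsed_time, offset_part, old_format_matches)
```

If this carries over to goaccess, a date-format of %Y-%m-%d together with a log-format that treats the 'T' as a literal and skips the offset would be the thing to try; that combination is untested here.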

kaplandani commented 4 years ago

This is the log configuration. I fed it to the script and got what I posted above:

'[$time_iso8601] ||  STATUS: $status || HOST: $host ||  CONN#: $connection || X-Forw: $http_x_forwarded_for || Proxy: $proxy_add_x_forwarded_for ||  CF: $http_cf_connecting_ip || $remote_addr --> $server_addr || REQ_TIME: $request_time || UPSTR_RESP_TIME: $upstream_response_time ||  USER_AGENT: "$http_user_agent" ||  URI: $request ||  POST: $request_body ||  RESPONSE: "$upstream_cookie_name" || '
allinurl commented 4 years ago

> [2020-05-05T14:51:03-04:00] || STATUS: 200 || HOST: (omitting the rest)

Please post the full line and ideally multiple lines straight from the log. Thanks

gps3dx commented 4 years ago

Another request for "Help with custom log file format".

Some example log lines:

10.180.0.18 "-" - - [20/May/2020:12:03:35 +0300] GET /mng/images/icon_arrow_sort.png HTTP/1.1 304 - 3 371B1F5C27BED6EEC4FCBC45BB22D08A.mysite.com:1801 0.003 https-jsse-nio-8443-exec-5 - "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36"
10.180.0.18 "-" - - [20/May/2020:12:03:35 +0300] GET /mng/images/icon_arrow_sort_up.png HTTP/1.1 304 - 3 371B1F5C27BED6EEC4FCBC45BB22D08A.mysite.com:1801 0.003 https-jsse-nio-8443-exec-4 - "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36"
10.180.0.18 "-" - - [20/May/2020:12:03:35 +0300] GET /mng/fonts/glyphicons-halflings-regular.woff2 HTTP/1.1 304 - 4 371B1F5C27BED6EEC4FCBC45BB22D08A.mysite.com:1801 0.004 https-jsse-nio-8443-exec-2 - "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36"
10.180.0.18 "-" - - [20/May/2020:12:03:55 +0300] POST /mng/action/pageAction.page_xml.page_quartz_job_details.xml.do HTTP/1.1 200 60569 283 371B1F5C27BED6EEC4FCBC45BB22D08A.mysite.com:1801 0.283 https-jsse-nio-8443-exec-9 - "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36"

My Tomcat conf/server.xml log "pattern" value is: "%h "%{X-Forwarded-For}i" %l %u %t %r %s %b %D %S %T %I %{institute}c "%{User-Agent}i""

I call goaccess with:

goaccess --log-format='%h "^" %^[%d:%t %^] %m %r %^ %s %b %D %R %T "u"' --date-format=%d/%B/%Y --time-format=%H:%M:%S

but the log is only partially parsed.

Thanks for the help, and thanks @allinurl for the wonderful app!

allinurl commented 4 years ago

@gps3dx

goaccess access.log --log-format='%h %^[%d:%t %^] %m %U %H %s %b %^ %^ %T %v %^ "%u"' --date-format=%d/%b/%Y --time-format=%T
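To see which part of a sample line each specifier in this log-format lines up with, the field order can be mirrored in a regular expression. This is a hand-written approximation for illustration only, not goaccess's parser; the group names are assumptions chosen to echo the specifiers.

```python
import re

# Regex that follows the field order of the log-format above.
# It is an approximation for illustration, not goaccess's own parser.
LINE_RE = re.compile(
    r'(?P<host>\S+) '                       # %h
    r'.*?\[(?P<date>[^:]+):(?P<time>\S+) '  # %^[%d:%t
    r'[^\]]+\] '                            # %^]
    r'(?P<method>\S+) '                     # %m
    r'(?P<path>\S+) '                       # %U
    r'(?P<protocol>\S+) '                   # %H
    r'(?P<status>\d{3}) '                   # %s
    r'(?P<size>\S+) '                       # %b
    r'\S+ \S+ '                             # %^ %^
    r'(?P<serve_time>\S+) '                 # %T
    r'(?P<vhost>\S+) '                      # %v
    r'\S+ '                                 # %^
    r'"(?P<agent>[^"]*)"'                   # "%u"
)

# First sample line from the thread.
sample = (
    '10.180.0.18 "-" - - [20/May/2020:12:03:35 +0300] '
    'GET /mng/images/icon_arrow_sort.png HTTP/1.1 304 - 3 '
    '371B1F5C27BED6EEC4FCBC45BB22D08A.mysite.com:1801 0.003 '
    'https-jsse-nio-8443-exec-5 - '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36"'
)

match = LINE_RE.match(sample)
fields = match.groupdict() if match else {}
for name, value in fields.items():
    print(f"{name:>10}: {value}")
```

Every specifier consumes exactly one space-separated token here, which is why a line with a missing token (for example, no protocol) would stop matching.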
gps3dx commented 4 years ago

@allinurl - thanks. With the new log-format I have an issue, since my logs contain some problematic lines that differ from the rest:

192.1.1.1 "-" - - [13/May/2020:00:00:03 +0300] GET / 200 230 0 - 0.000 http-nio-1801-exec-244 - "-"
127.0.0.1 "192.1.1.1" - - [13/May/2020:00:00:03 +0300] GET / HTTP/1.1 200 230 0 - 0.000 http-nio-1801-exec-18 - "-"
127.0.0.1 "192.1.1.1" - - [13/May/2020:00:00:03 +0300] GET / HTTP/1.1 200 230 1 - 0.001 http-nio-1801-exec-232 - "-"
192.1.1.1 "-" - - [13/May/2020:00:00:04 +0300] GET / 200 230 0 - 0.000 http-nio-1801-exec-193 - "-"
192.1.1.1 "-" - - [13/May/2020:00:00:08 +0300] GET / 200 230 0 - 0.000 http-nio-1801-exec-283 - "-"
10.180.0.18 "-" - - [20/May/2020:12:03:35 +0300] GET /mng/images/icon_arrow_sort.png HTTP/1.1 304 - 3 371B1F5C27BED6EEC4FCBC45BB22D08A.mysite.com:1801 0.003 https-jsse-nio-8443-exec-5 - "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36"
10.180.0.18 "-" - - [20/May/2020:12:03:35 +0300] GET /mng/images/icon_arrow_sort_up.png HTTP/1.1 304 - 3 371B1F5C27BED6EEC4FCBC45BB22D08A.mysite.com:1801 0.003 https-jsse-nio-8443-exec-4 - "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36"
10.180.0.18 "-" - - [20/May/2020:12:03:35 +0300] GET /mng/fonts/glyphicons-halflings-regular.woff2 HTTP/1.1 304 - 4 371B1F5C27BED6EEC4FCBC45BB22D08A.mysite.com:1801 0.004 https-jsse-nio-8443-exec-2 - "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36"
10.180.0.18 "-" - - [20/May/2020:12:03:55 +0300] POST /mng/action/pageAction.page_xml.page_quartz_job_details.xml.do HTTP/1.1 200 60569 283 371B1F5C27BED6EEC4FCBC45BB22D08A.mysite.com:1801 0.283 https-jsse-nio-8443-exec-9 - "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36"

goaccess fails with the following message:

Parsed 1 lines producing the following errors:

Token '' doesn't match specifier '%H'

Format Errors - Verify your log/date/time format

Possible solutions vary (depending, of course, on goaccess capabilities):

  1. Can goaccess exclude problematic log lines? Please note that I don't want to block one or more IPs: even though they appear in the problematic lines, the same IPs appear "normal" in the rest of the log.
  2. Can goaccess parse multiple log formats in a single run?

UPDATE: I found a workaround by doing a regex replace with perl before piping the log to goaccess: perl -pe 's|(?<=GET /)\ \ (?=\d)| HTTP/1.1 |' <LOG_FILENAME> | goaccess
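The same substitution can be mirrored in Python, which may be easier to extend with further rules. The sketch assumes, as the perl pattern does, that the raw log has two spaces where the protocol is missing (the lines as rendered in this thread show only one). `fix_missing_protocol` is a hypothetical helper name.

```python
import re

# Same lookbehind/lookahead idea as the perl one-liner: two spaces between
# "GET /" and a digit mark the spot where the protocol is missing.
MISSING_PROTOCOL = re.compile(r'(?<=GET /)  (?=\d)')


def fix_missing_protocol(line: str) -> str:
    """Insert a placeholder protocol so the line has the same fields as the rest."""
    return MISSING_PROTOCOL.sub(' HTTP/1.1 ', line)


# A problematic line (with the assumed double space) and a normal one.
broken = ('192.1.1.1 "-" - - [13/May/2020:00:00:03 +0300] GET /  200 230 0 - '
          '0.000 http-nio-1801-exec-244 - "-"')
normal = ('127.0.0.1 "192.1.1.1" - - [13/May/2020:00:00:03 +0300] GET / HTTP/1.1 '
          '200 230 0 - 0.000 http-nio-1801-exec-18 - "-"')

fixed = fix_missing_protocol(broken)
unchanged = fix_missing_protocol(normal)
print(fixed)
print(unchanged)
```

To use it as a filter, the function could be applied to each line read from standard input and the result written to standard output, then piped into goaccess the same way as the perl version.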

kaplandani commented 4 years ago

> [2020-05-05T14:51:03-04:00] || STATUS: 200 || HOST: (omitting the rest)
>
> Please post the full line and ideally multiple lines straight from the log. Thanks

Here are 3 full lines from the log:

[2020-05-26T07:43:03-04:00] || STATUS: 200 || CACHE: - || HOST: www.myhost.com || CONN#: 200116875 || X-Forw: 34.34.232.240 || Proxy: 34.34.232.240, 34.34.232.240 || CF: 34.34.232.240 || 34.34.232.240 --> 34.34.232.12 || SENT TO: 127.0.0.1:41800 || REQ_TIME: 0.050 || UPSTR_RESP_TIME: 0.052 || USER_AGENT: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36" || URI: POST /iamges/view/get HTTP/1.1 || POST: ignore=575794
[2020-05-26T07:43:03-04:00] || STATUS: 200 || CACHE: - || HOST: www.myhost.com || CONN#: 200113789 || X-Forw: 34.34.232.248 || Proxy: 34.34.232.248, 34.34.232.248 || CF: 34.34.232.248 || 34.34.232.248 --> 34.34.232.12 || SENT TO: 127.0.0.1:41800 || REQ_TIME: 0.042 || UPSTR_RESP_TIME: 0.044 || USER_AGENT: "Mozilla/5.0 (iPhone; CPU iPhone OS 13_4_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Mobile/15E148 Safari/604.1" || URI: GET /images/view/get?a=3 HTTP/1.1 || POST: -
[2020-05-26T07:43:03-04:00] || STATUS: 200 || CACHE: - || HOST: www.myhost.com || CONN#: 200117549 || X-Forw: 34.34.232.250 || Proxy: 34.34.232.250, 34.34.232.250 || CF: 34.34.232.250 || 34.34.232.250 --> 34.34.232.12 || SENT TO: 127.0.0.1:41800 || REQ_TIME: 0.011 || UPSTR_RESP_TIME: 0.012 || USER_AGENT: "Mozilla/5.0 (iPhone; CPU iPhone OS 13_3_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.5 Mobile/15E148 Safari/604.1" || URI: POST /images/views/put HTTP/1.1 || POST: []
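The layout of these lines can be made explicit by splitting on the "||" separator, which helps when deciding which pieces a log-format has to capture and which it can skip. The sketch below is plain Python, not a goaccess format; `split_fields` and the keys `timestamp`, `remote_addr` and `server_addr` are hypothetical names, and it assumes "||" never occurs inside a value such as the POST body.

```python
def split_fields(line: str) -> dict:
    """Split one '||'-delimited log line into a dict of named fields."""
    parts = [p.strip() for p in line.split("||")]
    # The first part is the bracketed ISO 8601 timestamp.
    fields = {"timestamp": parts[0].strip("[]")}
    for part in parts[1:]:
        if "-->" in part:
            # The unlabeled "remote --> server" address pair.
            remote, server = (s.strip() for s in part.split("-->"))
            fields["remote_addr"] = remote
            fields["server_addr"] = server
        elif ": " in part:
            # Labeled fields such as "STATUS: 200".
            key, value = part.split(": ", 1)
            fields[key] = value
    return fields


# Second sample line from the thread.
sample = (
    '[2020-05-26T07:43:03-04:00] || STATUS: 200 || CACHE: - || HOST: www.myhost.com '
    '|| CONN#: 200113789 || X-Forw: 34.34.232.248 || Proxy: 34.34.232.248, 34.34.232.248 '
    '|| CF: 34.34.232.248 || 34.34.232.248 --> 34.34.232.12 || SENT TO: 127.0.0.1:41800 '
    '|| REQ_TIME: 0.042 || UPSTR_RESP_TIME: 0.044 || USER_AGENT: "Mozilla/5.0 (iPhone; '
    'CPU iPhone OS 13_4_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) '
    'Version/13.1 Mobile/15E148 Safari/604.1" || URI: GET /images/view/get?a=3 HTTP/1.1 '
    '|| POST: -'
)

fields = split_fields(sample)
for key, value in fields.items():
    print(f"{key}: {value}")
```

Seen this way, the line carries a client address (the part before "-->"), a date and time (inside the timestamp) and a request (the URI field), which are the three pieces listed earlier in the thread as required.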
gps3dx commented 4 years ago

My Tomcat log format includes %h "%{X-Forwarded-For}i", as I mentioned above. I want goaccess to show statistics based on the "true" requester. Sometimes that is %h (when X-Forwarded-For is "-", i.e. empty); other times %h is the forwarder and X-Forwarded-For holds the real original requester.

What should my log-format look like for this case? Is it %h "%R" ...?

For example, my log looks like this (as above):

127.0.0.1 "192.1.1.1" - - [13/May/2020:00:00:03 +0300] GET / HTTP/1.1 200 230 0 - 0.000 http-nio-1801-exec-18 - "-"
192.1.1.1 "-" - - [13/May/2020:00:00:03 +0300] GET / HTTP/1.1 200 230 0 - 0.000 http-nio-1801-exec-18 - "-"
allinurl commented 4 years ago

@gps3dx so which IP are you trying to report?

gps3dx commented 4 years ago

> @gps3dx so which IP are you trying to report?

  1. I think I'm trying to get the 2nd field as a "Referring Site", but when I use "%R" the goaccess referrers panel is empty.
  2. Another case: if I want to use the 2nd field as 'host' only when I see 127.0.0.1 in the first field, can goaccess deal with it? Something like what "/etc/hosts" does?
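
One way to approach this outside goaccess is to rewrite each line before parsing, so that the first field always holds the address to report. The sketch below rests on an assumption about what "true requester" means here: use the quoted X-Forwarded-For value when it is not "-", otherwise keep %h. `promote_forwarded_ip` is a hypothetical helper name, and taking the first entry of a comma-separated X-Forwarded-For list is also an assumption.

```python
import re

# First field (%h), then the quoted X-Forwarded-For value, then the rest.
HEAD_RE = re.compile(r'^(?P<host>\S+) "(?P<xff>[^"]*)"(?P<rest>.*)', re.DOTALL)


def promote_forwarded_ip(line: str) -> str:
    """Put the forwarded client address first when one is present."""
    m = HEAD_RE.match(line)
    if m is None or m["xff"] in ("-", ""):
        return line  # nothing forwarded, or unexpected shape: leave as is
    client = m["xff"].split(",")[0].strip()
    # Swap: the client goes first, the forwarder moves into the quoted slot.
    return f'{client} "{m["host"]}"{m["rest"]}'


# The two example lines from the comment above.
forwarded = ('127.0.0.1 "192.1.1.1" - - [13/May/2020:00:00:03 +0300] GET / HTTP/1.1 '
             '200 230 0 - 0.000 http-nio-1801-exec-18 - "-"')
direct = ('192.1.1.1 "-" - - [13/May/2020:00:00:03 +0300] GET / HTTP/1.1 '
          '200 230 0 - 0.000 http-nio-1801-exec-18 - "-"')

rewritten = promote_forwarded_ip(forwarded)
untouched = promote_forwarded_ip(direct)
print(rewritten)
print(untouched)
```

After this rewrite both lines start with 192.1.1.1, so a log-format that reads %h from the first field would count them under the same address. Restricting the swap to lines whose first field is 127.0.0.1 would only need one extra condition on `m["host"]`.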