allinurl / goaccess

GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.
https://goaccess.io
MIT License

Please help with log-format for this logline from Squid #988

Closed. zakhooi closed this issue 3 years ago.

zakhooi commented 6 years ago

Hi,

I'm using GoAccess for Squid's access.log, but somehow I can't get it to work. Can someone please help with the proper log-format string for processing log lines such as these:

1515588009.814  12482 [IPADDRESSMASKEDFORPRIVACY] TCP_TUNNEL/200 1729 CONNECT api.github.com:443 mauce HIER_DIRECT/192.30.253.117 -
1515588009.911  13628 [IPADDRESSMASKEDFORPRIVACY] TCP_TUNNEL/200 37198 CONNECT github.com:443 mauce HIER_DIRECT/192.30.253.112 -
1515588011.062    210 [IPADDRESSMASKEDFORPRIVACY] TCP_TUNNEL/200 1088 CONNECT graph.facebook.com:443 mauce HIER_DIRECT/31.13.91.2 -
1515588012.123  11703 [IPADDRESSMASKEDFORPRIVACY] TCP_TUNNEL/200 4760 CONNECT docs.google.com:443 mauce HIER_DIRECT/172.217.17.142 -
1515588012.123  14879 [IPADDRESSMASKEDFORPRIVACY] TCP_TUNNEL/200 145 CONNECT collector.githubapp.com:443 mauce HIER_DIRECT/52.87.124.197 -
1515588012.123  11701 [IPADDRESSMASKEDFORPRIVACY] TCP_TUNNEL/200 229 CONNECT drive.google.com:443 mauce HIER_DIRECT/172.217.17.142 -
1515588012.125  12586 [IPADDRESSMASKEDFORPRIVACY] TCP_TUNNEL/200 156 CONNECT github.com:443 mauce HIER_DIRECT/192.30.253.112 -
1515588012.125  12511 [IPADDRESSMASKEDFORPRIVACY] TCP_TUNNEL/200 156 CONNECT github.com:443 mauce HIER_DIRECT/192.30.253.112 -
1515588022.625  10453 [IPADDRESSMASKEDFORPRIVACY] TCP_TUNNEL/200 104707 CONNECT github.com:443 mauce HIER_DIRECT/192.30.253.112 -
1515588060.770  78446 [IPADDRESSMASKEDFORPRIVACY] TCP_TUNNEL/200 6127 CONNECT collector.githubapp.com:443 mauce HIER_DIRECT/52.87.124.197 -

I tried and tried but got stuck at this log-format:

log-format %d.%^ %^ %h %H/%s %^ %m %~%U %h %^/%s %^

but get the error:

Token '' doesn't match specifier '%h'

I'm sure %h is correct.

Can anyone help me out here, please?

Thanks in advance.

allinurl commented 6 years ago

This should do it:

goaccess access.log --log-format='%x.%^ %~%L [%h] %^/%s %b %m %U %^' --date-format=%s --time-format=%s --http-protocol=no --ignore-panel=BROWSERS --ignore-panel=OS --ignore-panel=REFERRING_SITES
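To see how that format lines up with a Squid native record, the sketch below splits one sample line on runs of whitespace and prints the field each specifier would consume. It only illustrates the mapping; it is not how GoAccess parses internally. The function name `squid_fields` is hypothetical, 10.0.0.1 is a placeholder for the masked client address, and the brackets in `[%h]` above only exist because the sample addresses were masked as `[...]`.

```shell
#!/bin/sh
# Sketch: print which field of one Squid native access.log line each
# GoAccess specifier in '%x.%^ %~%L %h %^/%s %b %m %U %^' corresponds to.
# awk's default splitting treats runs of spaces as a single separator,
# much like %~ skips the padding in front of the elapsed-time column.
squid_fields() {
  awk '{
    split($1, ts, ".")   # "%x.%^": epoch seconds, then milliseconds (skipped)
    split($4, rc, "/")   # "%^/%s": Squid result code (skipped), HTTP status
    printf "%%x  %s\n", ts[1]
    printf "%%L  %s\n", $2   # elapsed time in milliseconds
    printf "%%h  %s\n", $3   # client address
    printf "%%s  %s\n", rc[2]
    printf "%%b  %s\n", $5   # reply size in bytes
    printf "%%m  %s\n", $6   # request method
    printf "%%U  %s\n", $7   # URL (host:port for CONNECT)
    # Fields 8 and later (user, hierarchy/peer, MIME type) are not used.
  }'
}

# 10.0.0.1 is a placeholder for the masked client address.
printf '%s\n' '1515588009.814  12482 10.0.0.1 TCP_TUNNEL/200 1729 CONNECT api.github.com:443 mauce HIER_DIRECT/192.30.253.117 -' | squid_fields
```

For this line it prints `%x  1515588009`, `%L  12482`, `%h  10.0.0.1`, `%s  200` and so on, one specifier per line.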

zakhooi commented 6 years ago

Thanks, but then I get this error: Parsed 10 lines producing the following errors:

Token for '%h' specifier is NULL.
Token for '%h' specifier is NULL.
Token for '%h' specifier is NULL.
Token for '%h' specifier is NULL.
Token for '%h' specifier is NULL.
Token for '%h' specifier is NULL.
Token for '%h' specifier is NULL.
Token for '%h' specifier is NULL.
Token for '%h' specifier is NULL.
Token for '%h' specifier is NULL.

Format Errors - Verify your log/date/time format

allinurl commented 6 years ago

For the log you posted above, it works. Please make sure that the log you are parsing looks exactly like the one you posted above. Also, try adding --no-global-config.

zakhooi commented 6 years ago

I did. The only change I made was masking the IP addresses. I have a feeling that some lines in the full log file must have a different format. However, the error messages don't say on which line the error occurs. Any idea how to find the lines that are causing the errors?

allinurl commented 6 years ago

They are in the first 10 lines. Feel free to attach the actual log, or part of it, and make sure the log is space-delimited as in your sample lines.
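Since the error output above does not include line numbers, one way to hunt for odd records is to pre-scan the log yourself. A minimal sketch, assuming a Squid native log in which every valid record has exactly 10 whitespace-separated fields and an IPv4 client address in field 3. `find_odd_squid_lines` is a hypothetical helper; IPv6 clients and masked addresses such as the bracketed ones in this thread would be reported as well.

```shell
#!/bin/sh
# Hypothetical helper: report records that do not look like a 10-field
# Squid native line with an IPv4 client address in field 3.
# Reads the files given as arguments, or stdin if there are none.
find_odd_squid_lines() {
  awk '
    NF != 10 {
      printf "line %d: expected 10 fields, got %d\n", NR, NF
      next
    }
    $3 !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
      printf "line %d: client field is %s\n", NR, $3
    }
  ' "$@"
}

# Example (the path is an assumption, adjust to your setup):
#   find_odd_squid_lines /var/log/squid/access.log
```

Lines that pass both checks produce no output, so an empty result means every record at least has the expected shape.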

zakhooi commented 6 years ago

These are the first 10 lines:

1515734740.494      1 [MASKEDIPADDRESS] TCP_DENIED/407 3922 CONNECT d.dropbox.com:443 - HIER_NONE/- text/html
1515734801.274  60719 [MASKEDIPADDRESS] TCP_TUNNEL/200 3790 CONNECT d.dropbox.com:443 erik HIER_DIRECT/162.125.34.6 -
1515734943.397      1 [MASKEDIPADDRESS] TCP_DENIED/407 3954 CONNECT client-cf.dropbox.com:443 - HIER_NONE/- text/html
1515734943.889 1257872 [MASKEDIPADDRESS] TCP_TUNNEL/200 9033 CONNECT bolt.dropbox.com:443 erik HIER_DIRECT/162.125.18.133 -
1515734943.951      1 [MASKEDIPADDRESS] TCP_DENIED/407 3934 CONNECT bolt.dropbox.com:443 - HIER_NONE/- text/html
1515735003.743  60277 [MASKEDIPADDRESS] TCP_TUNNEL/200 4059 CONNECT client-cf.dropbox.com:443 erik HIER_DIRECT/162.125.65.3 -
1515735033.399 5897754 [MASKEDIPADDRESS] TCP_TUNNEL/200 29063 CONNECT bolt.dropbox.com:443 erik HIER_DIRECT/162.125.18.133 -
1515735033.465      1 [MASKEDIPADDRESS] TCP_DENIED/407 3934 CONNECT bolt.dropbox.com:443 - HIER_NONE/- text/html
1515735036.690      1 [MASKEDIPADDRESS] TCP_DENIED/407 428 HEAD http://www.nu.nl/feeds/rss/algemeen.rss - HIER_NONE/- text/html
1515735036.738      2 [MASKEDIPADDRESS] TCP_DENIED/407 4353 GET http://www.tubantia.nl/cmlink/1.3294177 - HIER_NONE/- text/html

allinurl commented 6 years ago

Please add them as an attachment to this post straight from your log.

zakhooi commented 6 years ago

Here you are: access.log (attached). Thanks in advance.

allinurl commented 6 years ago

I see the issue now. The following partially works, but it fails to fully parse some of the records due to the period after the timestamp, the single space and %~.

goaccess s.log --log-format='%x.%^ %~ %L %h %^/%s %b %m %U %^' --date-format=%s --time-format=%s --http-protocol=no --ignore-panel=BROWSERS --ignore-panel=OS --ignore-panel=REFERRING_SITES

Let me take a look at this and I'll post back as soon as I have a fix.
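The padding is easy to see if you split one of the attached lines on every single space instead of on runs of spaces. Squid right-aligns the elapsed-time column, so the number of spaces after the timestamp varies from line to line (six before `1`, but only one before `1257872` in the attached lines), and each extra space shows up as an empty token. That is consistent with the `Token ''` errors in this thread, although the exact behavior depends on the GoAccess version's parser. `show_tokens` is a hypothetical helper, and 10.0.0.1 stands in for the masked address.

```shell
#!/bin/sh
# Hypothetical helper: split each input line on every single space
# (unlike awk's default, which collapses runs of spaces) and print the
# tokens in brackets so that empty ones are visible.
show_tokens() {
  awk -F'[ ]' '{
    for (i = 1; i <= NF; i++)
      printf "%d:[%s]\n", i, $i
  }'
}

# First attached record, with 10.0.0.1 in place of the masked address.
printf '%s\n' '1515734740.494      1 10.0.0.1 TCP_DENIED/407' | show_tokens
```

Tokens 2 through 6 come out empty and the elapsed time only appears as token 7; the `1257872` line would produce no empty tokens at all, which is why a fixed number of literal spaces in the format cannot match every record.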

cpt-kernel commented 5 years ago

Any update on this? Same issue here.

allinurl commented 5 years ago

@cpt-kernel Still working on it. Thanks for the reminder.

hurgh commented 4 years ago

Hi All, I am running into the same issue here. My log format is the same as posted above (Squid 3.5.27), and when running with the suggested format posted on 13th Jan 2018 by @allinurl I get the error: Token '' doesn't match specifier '%h'

I note that "SQUID" is one of the supported log formats, but when running with just --log-format=SQUID I get a similar error:

Token '/<URL>' doesn't match specifier '%x'

where <URL> is just the first part of the URL: for http://www.example.com the error would be '/www'

Thanks

h4tt0r1 commented 4 years ago

Hi there, any update? I'm using GoAccess for Squid and I have the same issue. Thanks in advance.

1585736718.968 1 xxx.xxx.xx.xx TCP_DENIED/407 4066 CONNECT clients1.google.com:443 - HIER_NONE/- text/html
1585736722.804 0 xxx.xxx.xx.xx TCP_DENIED/407 4066 CONNECT clients1.google.com:443 - HIER_NONE/- text/html
1585736726.437 0 xxx.xxx.xx.xx TCP_DENIED/407 3894 CONNECT prod.global.ssl.fastly.net:443 - HIER_NONE/- text/html
1585736729.681 1 xxx.xxx.xx.xx TCP_DENIED/407 3930 GET http://www.msftncsi.com/ncsi.txt - HIER_NONE/- text/html
1585736736.278 9340 xxx.xxx.xx.xx TCP_TUNNEL/200 5537 CONNECT prod.global.ssl.fastly.net:443 dorlan FIRSTUP_PARENT/201.220.211.68 -
1585736738.764 1 xxx.xxx.xx.xx TCP_DENIED/407 3894 CONNECT prod.global.ssl.fastly.net:443 - HIER_NONE/- text/html
1585736748.551 5140 xxx.xxx.xx.xx TCP_DENIED/407 3828 CONNECT 193.37.254.172:554 - HIER_NONE/- text/html
1585736748.984 0 xxx.xxx.xx.xx TCP_DENIED/407 4066 CONNECT clients1.google.com:443 - HIER_NONE/- text/html
1585736752.825 1 xxx.xxx.xx.xx TCP_DENIED/407 4066 CONNECT clients1.google.com:443 - HIER_NONE/- text/html
1585736753.566 10146 xxx.xxx.xx.xx TCP_DENIED/407 3939 CONNECT 79.142.76.221:22 - HIER_NONE/- text/html

sudo goaccess /var/log/nginx/access.log --log-format='%x.%^ %~ %L %h %^/%s %b %m %U %^' --date-format=%s --time-format=%s

Token '' doesn't match specifier '%h'
Token '' doesn't match specifier '%h'
Token '' doesn't match specifier '%h'
Token '' doesn't match specifier '%h'
Token '' doesn't match specifier '%h'
Token '' doesn't match specifier '%h'
Token '' doesn't match specifier '%h'
Token '' doesn't match specifier '%h'
Token '' doesn't match specifier '%h'
Token '10146' doesn't match specifier '%h'

Format Errors - Verify your log/date/time format

allinurl commented 3 years ago

It looks like this was fixed starting in v1.3. I tested this in the latest v1.4.3 against the file attached here, and it works as expected. It did not work in v1.2, which was current around the time this was reported.


The command used was:

goaccess squid.log --log-format='%x.%^ %~%L %h %^/%s %b %m %U %^' --date-format=%s --time-format=%s --http-protocol=no --ignore-panel=BROWSERS --ignore-panel=OS --ignore-panel=REFERRING_SITES

Closing this. Feel free to reopen it if needed.
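Since the format was reported working only from v1.3 on, a small wrapper can refuse to run on older builds before trying the final command. A minimal sketch: `version_ge` and `run_squid_report` are hypothetical names, and the sed expression assumes the first line of `goaccess --version` contains the version number (for example `GoAccess - 1.4.3.`); adjust it if your build prints something else.

```shell
#!/bin/sh
# Succeeds if dotted version $1 is >= $2, comparing numeric components
# left to right; missing components count as 0 (so 1.3 == 1.3.0).
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      if ((x[i] + 0) > (y[i] + 0)) exit 0
      if ((x[i] + 0) < (y[i] + 0)) exit 1
    }
    exit 0
  }'
}

# Hypothetical wrapper around the command from this thread.
# Assumption: the first line of `goaccess --version` holds the version.
run_squid_report() {
  v=$(goaccess --version | sed -n '1s/[^0-9]*\([0-9][0-9.]*[0-9]\).*/\1/p')
  if ! version_ge "$v" 1.3; then
    echo "GoAccess '$v' detected; this format was reported to need v1.3+" >&2
    return 1
  fi
  goaccess "$1" --log-format='%x.%^ %~%L %h %^/%s %b %m %U %^' \
    --date-format=%s --time-format=%s --http-protocol=no \
    --ignore-panel=BROWSERS --ignore-panel=OS --ignore-panel=REFERRING_SITES
}
```

Usage would be `run_squid_report squid.log`. The version check is split out so it can be reused or tested without GoAccess installed.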