allinurl / goaccess

GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.
https://goaccess.io
MIT License

Strip URLS and ports to get only hostnames #2701

Open dtouzeau opened 2 months ago

dtouzeau commented 2 months ago

This is for Squid analysis. When Squid handles SSL traffic, the request is of type CONNECT host:port, which GoAccess reports correctly. However, it would be useful to remove the port suffix so that requests to the same host are combined regardless of the destination port.

In the case of plain HTTP requests, the entire request URL is displayed in the report, which isn't useful here. For Squid, a report by host only would be more useful, in order to get the right averages.

Would it be possible to add an option such as:

--strip-urls, which would limit requests to the "domain name" part only?
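To make the request concrete, here is a minimal Python sketch of the reduction being asked for, applied to a request target before it is counted. It is not part of GoAccess: `strip_to_host` is a hypothetical helper name, and the sketch assumes targets are either `host:port` (CONNECT) or full URLs.

```python
from urllib.parse import urlsplit


def strip_to_host(target: str) -> str:
    """Reduce a request target to its bare hostname (hypothetical helper)."""
    # CONNECT targets look like "host:port" and carry no scheme. Prefixing
    # "//" makes urlsplit treat the text as a network location, not a path.
    if "://" not in target:
        target = "//" + target
    # .hostname drops the port, path and query, and lowercases the name.
    return urlsplit(target).hostname or ""


if __name__ == "__main__":
    for sample in ("example.com:443", "http://example.com:8080/a/b?x=1"):
        print(sample, "->", strip_to_host(sample))
```

With this kind of normalization, a CONNECT to port 443 and an HTTP request to port 8080 on the same machine would both be counted under one host entry.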

For information, we use this specific pattern:

%^ %^ %^ %^ %^[%^]: %x.%^ %~ %T %h %^/%s %b %m %U %e %^/%^ %^ mac="%^" accessrule%^ ua="%u""

command line:

goaccess --date-format=%s --time-format=%s --no-global-config --datetime-format="%d/%b/%Y:%H:%M:%S" --log-format="%x.%^ %T %h %^/%s %b %m %U %e %^/%^ %^ mac=\"%^\" accessrule%^ ua=\"%u\"" --num-tests=100 --no-query-string --exclude-ip 127.0.0.1 --jobs=4 --ignore-panel=REFERRERS --ignore-panel=REFERRING_SITES --ignore-panel=REQUESTS_STATIC --ignore-panel=KEYPHRASES --log-file=/var/log/squid/access.log --output=/tmp/tempfile_1722790709489709195.tmp.13af9cdea8f790b342565b28aea449a8.html --debug-file=/tmp/debug.log --invalid-requests=/tmp/invalid.log

0bi-w6n-K3nobi commented 2 weeks ago

Hi @dtouzeau

Well, if I understood correctly, GoAccess already has a solution for this. The Specifiers section of the manual page, which you can read here, describes the specifier %^, which can ignore (discard) any field. For example, in your case, you could write the log format like this:

.... CONNECT %v:%^ ...

How does this work?
The word CONNECT is ignored, the specifier %v captures the hostname (virtual host), and :%^ ignores the : character and the port value.
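As a rough illustration of that split, the fragment `CONNECT %v:%^` can be approximated with a regular expression in Python. This is only an approximation under stated assumptions, not GoAccess's actual parser; the pattern and the helper name `host_from_request` are made up for the example.

```python
import re

# Approximate regex equivalent of the fragment "CONNECT %v:%^":
# the literal word CONNECT, a captured host (%v), then ":" and a port
# that is consumed but not kept (%^).
CONNECT_FRAGMENT = re.compile(r"CONNECT (?P<host>[^\s:]+):\S+")


def host_from_request(fragment: str):
    """Return the host of a CONNECT request, or None if it does not match."""
    match = CONNECT_FRAGMENT.search(fragment)
    return match.group("host") if match else None


if __name__ == "__main__":
    print(host_from_request("CONNECT example.com:443 -"))
    print(host_from_request("GET http://example.com/ -"))
```

In this sketch a plain HTTP line such as `GET http://...` does not match, because the pattern contains the literal word CONNECT. Whether the same limitation applies to a real log should be checked against actual Squid output.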

I hope that is clear. Feel free to ask if you have more questions.