darold / squidanalyzer

Squid Analyzer parses Squid proxy access logs and reports general statistics about hits, bytes, users, networks, top URLs, and top second-level domains. Statistic reports are oriented toward user and bandwidth control.
http://squidanalyzer.darold.net/

Problems parsing squidguard logs #164

Closed mhotch closed 7 years ago

mhotch commented 7 years ago

Squidguard logs blocked URLs and domains. The output of a squidguard log looks similar to this:

2017-05-26 13:35:39 [11147] Request(clients/adv/-) cdn.livefyre.com:3128 10.10.10.10/10.10.10.10 user.name CONNECT REDIRECT
2017-05-26 13:40:13 [11147] Request(clients/adv/-) http://worldnow-d.openx.net/w/1.0/jstag?nc=43459271-wnow 10.10.10.10/10.10.10.10 user.name GET REDIRECT
2017-05-26 13:40:14 [11147] Request(clients/adv/-) http://ads.financialcontent.com/www/delivery/afr.php?n=fcad6300237&&zoneid=1311&cb=fcad6300237 10.10.10.10/10.10.10.10 user.name GET REDIRECT

The current build works fine with full URL entries, but does not correctly report domain-only entries (such as the CONNECT line above). Instead, many of those entries are reported under "top denied" as "REDIRECT". I believe this is because the regex in the _parseData subroutine only matches a value that starts with a scheme prefix followed by "//", so a bare host such as cdn.livefyre.com:3128 never matches.

I have fixed this in my environment by modifying the regex on line 1951. The new regex matches both http://foo.com and foo.com, and it stops at the first ":" so a port suffix such as :3128 is not captured.

CURRENT: $url =~ m/^[^\/]+\/\/([^\/]+)/;
NEW: $url =~ m/^(?:[^\/]+\/\/|)([^\/:]+)/;
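To illustrate the difference, here is a minimal sketch that runs both patterns against two of the sample values from the log excerpt above. It uses Python's re module rather than Perl purely for illustration (the pattern syntax used here behaves the same in both). The names OLD_PATTERN, NEW_PATTERN, and extract_host are hypothetical and are not part of SquidAnalyzer.

```python
import re

# Python ports of the two Perl patterns quoted above.
# Assumption: these stand in for the regex inside _parseData; the
# surrounding SquidAnalyzer code is not reproduced here.
OLD_PATTERN = re.compile(r"^[^/]+//([^/]+)")
NEW_PATTERN = re.compile(r"^(?:[^/]+//|)([^/:]+)")


def extract_host(pattern, url):
    """Return the first capture group, or None when the pattern does not match."""
    match = pattern.match(url)
    return match.group(1) if match else None


# Sample values taken from the squidguard log lines in this issue.
samples = [
    "cdn.livefyre.com:3128",
    "http://worldnow-d.openx.net/w/1.0/jstag?nc=43459271-wnow",
]

for url in samples:
    old_host = extract_host(OLD_PATTERN, url)
    new_host = extract_host(NEW_PATTERN, url)
    print(f"{url}\n  old: {old_host}\n  new: {new_host}")
```

With the old pattern the bare host:port entry does not match at all, while the new pattern yields the host name for both forms.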

Thanks!

darold commented 7 years ago

Thanks, patch applied on commit f94b224.