darold / squidanalyzer

Squid Analyzer parses Squid proxy access logs and reports general statistics about hits, bytes, users, networks, top URLs, and top second-level domains. The statistics reports are oriented toward user and bandwidth control.
http://squidanalyzer.darold.net/

Missing fields in regex parsing #6

Closed nuxsmin closed 11 years ago

nuxsmin commented 11 years ago

Hi, I've recently installed Squid Analyzer, and the fields after the username are not being parsed correctly: it identifies the status as the login, and so on. I modified the Squid log format as mentioned, with the same result.

I've rewritten the regex to match these fields and to match more accurately.

Here is the code:

diff SquidAnalyzer.pm /usr/share/perl5/SquidAnalyzer.pm
193c193,194
<               if ( $line =~ s#^(\d+\.\d{3})\s+(\d+)\s+([^\s]+)\s+([^\s]+)\s+(\d+)\s+([^\s]+)\s+## ) {
---
>               #if ( $line =~ s#^(\d+\.\d{3})\s+(\d+)\s+([^\s]+)\s+([^\s]+)\s+(\d+)\s+([^\s]+)\s+## ) {
>               if ( $line =~ s#^(\d+\.\d{3})\s+(\d+)\s+(\S+)\s+(\S+)\s+(\d+)\s+(\S+)\s+(.*)\s+(\S+)\s+(\w+\/\S+)\s+(\S+)\s+(\S+)## ) {
195a197
>                       $client_ip = $3 || '';
199c201,204
<                       $client_ip = $3 || '';
---
>                       $url = $7 || '';
>                       $login = $8 || '';
>                       $status = $9 || '';
>                       $mime_type = $10 || '';
221,225c226,231
<                       if ( $line =~ s#^(.*)\s+([^\s]+)\s+([^\s]+\/[^\s]+)\s+([^\s]+)\s*## ) {
<                               $url = lc($1) || '';
<                               $login = lc($2) || '';
<                               $status = lc($3) || '';
<                               $mime_type = lc($4) || '';
---
>                       #if ( $line =~ s#^(.*)\s+([^\s]+)\s+([^\s]+\/[^\s]+)\s+([^\s]+)\s*## ) {
>                       #if ( $line =~ s#^(.*)\s+(\S+)\s+(\w+\/\S+)\s+(\S+)$## ) {
>                               #$url = lc($1) || '';
>                               #$login = lc($2) || '';
>                               #$status = lc($3) || '';
>                               #$mime_type = lc($4) || '';
286c292
<                       }
---
>                       #}

Now, I'm testing this patch.

Regards

darold commented 11 years ago

Hi,

Could you please add a line from your Squid access.log file to this report, along with the value of the logformat directive from squid.conf or squid3.conf, so that I can understand the exact source of the issue?

Regards,

nuxsmin commented 11 years ago

Hi,

Here are the logformat value and the line logged in access.log:

logformat squid %ts.%03tu %6tr %>a %Ss/%03>Hs %<st %rm %ru %un %Sh/%<A %mt %ea %lp

1357803919.415      1 192.168.1.1 TCP_IMS_HIT/304 378 GET http://www.ingdirect.es/home/images/destacados/pensar2.gif username NONE/- image/gif acl_edir_users 3129

The last two values in the line (%ea and %lp in the logformat) were added, and the regex was updated to match them.

Regards.
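
To see where each field lands, the patched expression from the diff can be run against the sample line above. This is a minimal standalone Perl sketch, not code from SquidAnalyzer.pm: the helper name parse_patched and the hash keys are hypothetical, and it uses m## rather than s### so that the line is left intact.

```perl
use strict;
use warnings;

# Sample access.log line quoted above (12 whitespace-separated fields).
my $line = '1357803919.415      1 192.168.1.1 TCP_IMS_HIT/304 378 GET http://www.ingdirect.es/home/images/destacados/pensar2.gif username NONE/- image/gif acl_edir_users 3129';

# Hypothetical helper: apply the single-pass expression from the diff
# and return the captures of interest as a key/value list.
sub parse_patched {
    my ($l) = @_;
    if ( $l =~ m#^(\d+\.\d{3})\s+(\d+)\s+(\S+)\s+(\S+)\s+(\d+)\s+(\S+)\s+(.*)\s+(\S+)\s+(\w+\/\S+)\s+(\S+)\s+(\S+)# ) {
        return (
            client_ip => $3,
            url       => $7,
            login     => $8,
            status    => $9,
            mime_type => $10,
            last      => $11,
        );
    }
    return;    # no match: empty list
}

my %f = parse_patched($line);
printf "%-10s %s\n", $_, $f{$_} for qw(client_ip url login status mime_type last);
```

On this particular line the greedy (.*) appears to absorb the username: both NONE/- and image/gif satisfy (\w+\/\S+), and the regex engine settles on the later one, so login comes out as NONE/-, status as image/gif, and mime_type as acl_edir_users. Matching the URL with (\S+) instead of (.*) is one way to avoid that, assuming logged URLs never contain whitespace.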

darold commented 11 years ago

OK, thanks a lot for the additional information. It is now fixed in commit dfa1b30. This is not exactly your patch, because the request URL and the fields after it are parsed later in the SquidAnalyzer.pm code (line 223).

Regards,
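
The two-stage parsing described above can be sketched as follows. This is an illustration only, not the contents of commit dfa1b30: the helper name parse_two_stage, the hash keys, and the second-stage expression are assumptions. The first-stage expression is the original one quoted in the diff, and the second stage assumes logged URLs contain no whitespace, so (\S+) stands in for (.*).

```perl
use strict;
use warnings;

# Hypothetical helper illustrating two-stage parsing of one access.log line.
sub parse_two_stage {
    my ($line) = @_;
    my %f;

    # Stage 1: strip the six leading fields (original expression from the diff).
    return unless $line =~ s#^(\d+\.\d{3})\s+(\d+)\s+([^\s]+)\s+([^\s]+)\s+(\d+)\s+([^\s]+)\s+##;
    @f{qw(time elapsed client_ip code bytes method)} = ( $1, $2, $3, $4, $5, $6 );

    # Stage 2: URL, login, hierarchy status and MIME type. Anchoring on
    # whitespace-free tokens (an assumption) keeps any extra trailing
    # fields from shifting the captures.
    return unless $line =~ s#^(\S+)\s+(\S+)\s+(\S+\/\S+)\s+(\S+)\s*##;
    @f{qw(url login status mime_type)} = ( $1, $2, $3, $4 );

    # Whatever is left (for example %ea and %lp) is kept verbatim.
    $f{extra} = $line;
    return %f;
}

my %f = parse_two_stage('1357803919.415      1 192.168.1.1 TCP_IMS_HIT/304 378 GET http://www.ingdirect.es/home/images/destacados/pensar2.gif username NONE/- image/gif acl_edir_users 3129');
print "$_ = $f{$_}\n" for qw(url login status mime_type extra);
```

With the sample line from this report, the sketch should yield username as the login, NONE/- as the status, and image/gif as the MIME type, leaving the two added fields in extra.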