eldy / AWStats

AWStats Log Analyzer project (official sources)
https://www.awstats.org

HTTP 206 response traffic is calculated twice #207

Open: aleks-lavrov opened this issue 2 years ago

aleks-lavrov commented 2 years ago

Describe the bug: Traffic from HTTP 206 (Partial Content) responses is counted twice for files with a 'download' MIME type.

To reproduce:

  1. Set up AWStats on an Apache site.
  2. Generate some traffic that partially downloads files (using the 'Range' header) with a 'download' MIME type, e.g. '.txt', '.pdf', '.mp4'; see https://github.com/eldy/AWStats/blob/develop/wwwroot/cgi-bin/lib/mime.pm#L74
  3. Run AWStats to calculate the traffic.
  4. Check the resulting traffic values.
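For step 4, a reference value can be computed directly from the access log and compared with what AWStats reports. The sketch below is a hypothetical helper, not part of AWStats. It assumes Apache common/combined log format, where the response size follows the status code and is '-' when no body was sent. It returns the total bytes together with the bytes that came from 206 responses. Partial requests for step 2 can be generated with, for example, curl's `-r`/`--range` option.

```python
import re
from typing import Iterable, Tuple

# Common/combined log format: host ident user [time] "request" status size ...
# Assumption: the size field is "-" when no body was sent.
LOG_RE = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) (\d+|-)')


def total_bytes(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (all_bytes, bytes_from_206) summed straight from the raw log."""
    total = 0
    partial = 0
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in the expected format
        status, size = m.group(1), m.group(2)
        nbytes = 0 if size == "-" else int(size)
        total += nbytes
        if status == "206":
            partial += nbytes
    return total, partial
```

Typical use is `total_bytes(open("access.log"))`. If the bug is present, the AWStats figure should exceed `all_bytes` by roughly the 206 bytes served for download-type files.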

Actual behavior: The calculated traffic is twice the actual traffic.

Expected behavior: The calculated traffic matches the actual traffic.

Screenshots: None.

Desktop: Any.

Smartphone: Any.

Root cause: There are two places in awstats.pl where the size of the same log record is added to the traffic totals:

  1. https://github.com/eldy/AWStats/blob/develop/wwwroot/cgi-bin/awstats.pl#L19241
  2. https://github.com/eldy/AWStats/blob/develop/wwwroot/cgi-bin/awstats.pl#L19486

I think the first one should be removed.
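The double counting can be illustrated with a toy model. This is not AWStats code: the function, its parameters and the extension set are hypothetical. It only mirrors the two additions described above. A 206 record for a download-type file first passes through a "download continuation" branch that adds its size, and the general accounting then adds the size again for every record.

```python
from collections import defaultdict

# Hypothetical subset of extensions mapped to the 'download' type,
# taken from the examples in this issue (not the full mime.pm list).
DOWNLOAD_EXTENSIONS = {"txt", "pdf", "mp4"}


def toy_day_bytes(records, keep_first_addition=True):
    """Toy model of per-day traffic totals.

    records: iterable of (day, url, status, size) tuples.
    keep_first_addition=True mimics the reported behavior (two additions);
    False mimics the quick patch (first addition removed).
    """
    day_bytes = defaultdict(int)
    for day, url, status, size in records:
        ext = url.rsplit(".", 1)[-1].lower() if "." in url else ""
        is_download_continuation = status == 206 and ext in DOWNLOAD_EXTENSIONS
        if keep_first_addition and is_download_continuation:
            # 1st place: the download-continuation branch adds the size.
            day_bytes[day] += size
        # 2nd place: the general accounting adds the size for every record.
        day_bytes[day] += size
    return dict(day_bytes)
```

With one 1000-byte 206 for `/doc.pdf` and one 2048-byte 200 for `/index.html` on the same day, the model gives 4048 bytes with both additions and 3048 bytes with the first one removed. Only the 206 download record is inflated.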

Additional context: AWStats also appears to handle 206 responses incorrectly. According to the comment at https://github.com/eldy/AWStats/blob/develop/wwwroot/cgi-bin/awstats.pl#L19244, a 206 response is treated as the continuation of a previously processed 200 response for the same URL. That assumption does not hold: a URL can receive a single 206 response without any initial 200 response.
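One way to check this on a real log is to look for 206 records that have no earlier 200 for the same client and URL. The diagnostic below is hypothetical and is not AWStats code. It keys on (host, url) because the commented-out `$_downloads{$urlwithnoquery}->{$field[$pos_host]}` in the patch below uses those two values. It assumes the records are already parsed and in log order. A client whose very first request carries a Range header can legitimately receive a 206 with no preceding 200.

```python
def orphan_partials(records):
    """Return (host, url) for every 206 with no earlier 200 for the same pair.

    records: iterable of (host, url, status) tuples in log order.
    """
    seen_full = set()
    orphans = []
    for host, url, status in records:
        if status == 200:
            seen_full.add((host, url))
        elif status == 206 and (host, url) not in seen_full:
            orphans.append((host, url))
    return orphans
```

A non-empty result on a real log would show that treating every 206 as a continuation of an earlier 200 does not match the actual traffic.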

aleks-lavrov commented 2 years ago

Quick patch:

--- wwwroot/cgi-bin/awstats.pl.orig     2018-01-07 21:36:46.000000000 +0700
+++ wwwroot/cgi-bin/awstats.pl  2021-07-29 13:51:00.119731508 +0700
@@ -18962,8 +18962,8 @@
                                        #$_downloads{$urlwithnoquery}->{$field[$pos_host]}[1] = $timerecord;
                                        if ($pos_size>0){
                                                #$_downloads{$urlwithnoquery}->{$field[$pos_host]}[2] = int($field[$pos_size]);
-                                               $DayBytes{$yearmonthdayrecord} += int($field[$pos_size]);
-                                               $_time_k[$hourrecord] += int($field[$pos_size]);
+                                               #$DayBytes{$yearmonthdayrecord} += int($field[$pos_size]);
+                                               #$_time_k[$hourrecord] += int($field[$pos_size]);
                                        }
                                        $countedtraffic = 6; # 206 continued download, so we track bandwidth but not pages or hits
                                        if ($Debug) { debug( " Download continuation detected: '$urlwithnoquery'", 2 ); }