Open rseffner opened 4 years ago
There are also _FILETYPES and _DOWNLOAD sections in the AWStats data file. For example, the PDF row in _FILETYPES has a value of 12.536.762, while the sum of all PDF files listed in the _DOWNLOADS section is 53.230.834.
The sum of _FILETYPES equals the sum of _DOMAIN and also the sum of the _TIME column "bandwidth". As we learned from AWStats, we have to add the _TIME column "bandwidth not viewed" to catch traffic from robots, malware, and replies with special HTTP return codes.
Another point seems to be that we also have to add the sum of the _DOWNLOAD section to get the WHOLE traffic (because it differs from _FILETYPES, which equals the _DOMAINS/_TIME bandwidth).
Why is there no TOTAL line in AWStats?
I also stumbled over this because I wondered why my Nextcloud domain shows very little traffic. That is because WebDAV traffic won't show up in _DOMAIN, but in _LOGIN.
I was not able to find any documentation about the AWStats data file. How do I read it correctly?
At the moment I'm thinking about reading the Apache log directly to calculate the traffic. I found a little Perl snippet which does this quite well and fast:
perl -nE '$sum += $1 if /\[.+\] ".+" \d+ (\d+)/; END { printf("%.3f MB\n", $sum / 1024 / 1024) }' access.log
Something like that in PHP for Froxlor should also work, I think.
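For comparison, here is a minimal sketch of the same counting idea in Python (not PHP, and not anything from the Froxlor code base). It reuses the pattern from the Perl one-liner and assumes a common/combined style access log where the response size follows the status code; the function names are made up for illustration. Lines whose size field is "-" simply do not match and are skipped.

```python
import re
from typing import Iterable

# Same pattern as the Perl one-liner: [timestamp] "request" status bytes
LINE_RE = re.compile(r'\[.+\] ".+" \d+ (\d+)')


def sum_traffic_bytes(lines: Iterable[str]) -> int:
    """Add up the response sizes of all log lines that match the pattern."""
    total = 0
    for line in lines:
        match = LINE_RE.search(line)
        if match:  # lines without a numeric size (e.g. "-") are ignored
            total += int(match.group(1))
    return total


def format_mb(num_bytes: int) -> str:
    """Format a byte count as megabytes with three decimals."""
    return f"{num_bytes / 1024 / 1024:.3f} MB"


# Possible usage (the path is an assumption):
# with open("access.log", encoding="utf-8", errors="replace") as fh:
#     print(format_mb(sum_traffic_bytes(fh)))
```

Because the file is read line by line, this should also cope with large logs without loading them into memory.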
At the moment Froxlor's traffic calculation is totally unreliable and has many issues (Systemd normally rotates the logs before Froxlor can process them, plus the two problems mentioned here).
I added a crude implementation that counts traffic directly in the log files and logged both this value and what was found in AWStats. The two differ wildly; most of the time AWStats reports only about half of the directly counted value. But I suspect that the "BEGIN_TIME" section counts all traffic. I will try to confirm this, and if so I will open a pull request to change the counting in Froxlor.
But I think a better idea would be to change the system completely. I made some major changes to my Apache logs: for example, I rotate every day and the rotated logs are suffixed with the date, which makes it very easy for everybody to find the correct logs. So my suggestion would be to do something like this in Froxlor as well, let Systemd rotate the logs at midnight, and then do the calculation of HTTP traffic at a later time (to reduce load) by just looking at yesterday's file. The traffic would then also be assigned to yesterday; currently the traffic from yesterday is written to the DB with today's date, which is confusing. Is there interest in such a big change?
Well, we decided years ago to leave the traffic calculation to projects that are made for that (webalizer, awstats). So the "main" problem here would be a wrong/incorrect transfer of the webalizer/awstats values into the froxlor database for display to admins/customers.
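A rough Python sketch of the "look at yesterday's file and book the traffic to yesterday" idea could look like this. The directory layout and the file name scheme `<domain>-access.log-YYYYMMDD` are purely hypothetical, as is the function name; the real naming depends on how log rotation is configured.

```python
from datetime import date, timedelta
from pathlib import Path
from typing import Optional, Tuple


def yesterdays_log(log_dir: str, domain: str,
                   today: Optional[date] = None) -> Tuple[date, Path]:
    """Return (day the traffic belongs to, path of that day's rotated log)."""
    today = today or date.today()
    day = today - timedelta(days=1)
    # Hypothetical naming scheme: <domain>-access.log-YYYYMMDD
    return day, Path(log_dir) / f"{domain}-access.log-{day:%Y%m%d}"
```

The returned date is the one that would be stored with the traffic value, so a run shortly after midnight books the numbers to the day on which the requests actually happened.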
Ok, then I will check if my assumption is correct about the TIME section and if yes I will send a pull request with a fix.
any news on that @tobyX ?
Sorry, I let my code run for some months, but the numbers never added up at all and were sometimes even negative, and I didn't find where the error is. And then other pressing issues came up... I will try again and find out what went wrong.
So, I've just checked on this a little deeper:
As we learned from AWStats, we have to add the _TIME column "bandwidth not viewed" to catch traffic from robots, malware, and replies with special HTTP return codes.
Wrong, the _TIME column only shows the viewed traffic; when adding up the values it's exactly the same as _DOMAIN.
From what I've read, we need to add the viewed traffic and the not viewed traffic. So I checked the data file: the not viewed parts are ROBOT_, _WORMS and ERRORS_, but when adding these up, I get more than AWStats shows for "not viewed traffic". Also, when adding up the _DOMAIN entries and dividing by 1024, I'm still getting more KB than AWStats shows for "viewed traffic"... No idea where AWStats gets these numbers from in its own data file. I might be missing something.
Any ideas?
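To make such comparisons reproducible, a small Python sketch like the following could be used to sum one column per section of the data file. It only relies on the BEGIN_<NAME> / END_<NAME> markers mentioned in this thread. The column positions are assumptions (bandwidth as the 4th field of _DOMAIN and _TIME rows, "not viewed" bandwidth as the 7th field of _TIME rows) and should be checked against the comment header of a real data file. The sample data is invented.

```python
from typing import Dict, Iterable, List


def read_sections(lines: Iterable[str]) -> Dict[str, List[List[str]]]:
    """Group the whitespace-separated rows of a data file by section name."""
    sections: Dict[str, List[List[str]]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith("BEGIN_"):
            current = line.split()[0][len("BEGIN_"):]
            sections[current] = []
        elif line.startswith("END_"):
            current = None
        elif current is not None:
            sections[current].append(line.split())
    return sections


def sum_column(rows: List[List[str]], index: int) -> int:
    """Sum one numeric column, ignoring rows that are too short or non-numeric."""
    return sum(int(r[index]) for r in rows
               if len(r) > index and r[index].isdigit())


# Invented sample; the column layout is an assumption, not taken from AWStats docs.
SAMPLE = """\
BEGIN_TIME 2
0 10 20 1000 1 2 300
1 5 8 500 0 1 200
END_TIME
BEGIN_DOMAIN 2
de 12 20 1200
us 3 8 300
END_DOMAIN
"""

sections = read_sections(SAMPLE.splitlines())
viewed = sum_column(sections["TIME"], 3)      # assumed "bandwidth" column
not_viewed = sum_column(sections["TIME"], 6)  # assumed "not viewed" column
per_domain = sum_column(sections["DOMAIN"], 3)
```

Printing `viewed`, `not_viewed` and `per_domain` next to the values from the AWStats report should show quickly which sections actually add up.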
We've integrated 'goaccess' into the next major version of froxlor, where it will also be the new default.
Summary
At the moment only outgoing webserver traffic with known HTTP status codes is counted in froxlor's traffic statistics. With changes to Apache's LogFormat and by parsing the AWStats data file we should be able to count ALL the traffic going through the webserver.
Actual behavior
AWStats splits the traffic data into viewed and not viewed traffic. AWStats's explanation of not viewed traffic is "Not viewed traffic includes traffic generated by robots, worms, or replies with special HTTP status codes."