eldy / AWStats

AWStats Log Analyzer project (official sources)
https://www.awstats.org

Awstats appears to be inconsistent and slow #251

Open jmafc opened 6 days ago

jmafc commented 6 days ago

Describe the bug
When I ran awstats for May at the beginning of June, the numbers appeared an order of magnitude higher than those for the previous month. Although traffic to my website has increased, it has not increased by that much, as evidenced by (a) the size of the log files (the number of lines increased by about 30%) and (b) the traffic reported by Google Analytics. In addition, awstats has been taking much longer to complete its analysis than it did, say, two months ago (even during weekly updates). It used to process a weekly update in 30 minutes or so, but it now takes two to three hours.

To Reproduce
In order to determine the cause of these problems, I completely removed awstats from my system and reinstalled it. Then I started re-running it with the log files from January 2024 to now. The most surprising thing is that the monthly reports show numbers that differ substantially from the previously produced reports.

Expected behavior
Perhaps I misunderstand something, but I thought that each line in the log file represents a "hit", so that in a monthly report, in the Summary section, the number of "Viewed traffic" plus "Not viewed traffic" hits ought to correspond approximately to the number of lines in the log file and, more precisely, to equal the number of "new qualified records". However, for April, for example, the "Viewed traffic" hits are about 65% of the qualified records, while the "Not viewed traffic" hits are reported as 4.3 times the number of qualified records, as if awstats had extrapolated data that is not present in the log file (or counted some lines multiple times).

Screenshots
I can provide some if necessary.

Smartphone (please complete the following information): N/A

Additional context
The system on which awstats has been running has not changed hardware-wise in the past six months, and has been on Debian stable all this time, which doesn't get much in the way of software updates. FWIW, it's running Perl 5.36.0.

jmafc commented 6 days ago

The results of processing the June log file are even more incomprehensible:

Parsed lines in file: 280774
 Found 100 dropped records,
 Found 0 comments,
 Found 0 blank records,
 Found 179545 corrupted records,
 Found 0 old records,
 Found 101129 new qualified records.

Note: for comparison, for May it only found 273 corrupted records out of 210647 parsed lines.

Yet, the Summary section shows the following numbers:

                          Pages         Hits
Viewed traffic *        1,485,447    1,616,533
Not viewed traffic *    2,423,308    2,894,034

How can there be over 4 million hits in a file that has 100k records?
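To make the mismatch concrete, here is a small arithmetic check in Python using only the figures quoted above. The variable names are mine, and the comparison assumes that the Summary section covers just this one log file, which I cannot confirm:

```python
# Counts copied from the June update output quoted above.
parsed_lines = 280_774
buckets = {
    "dropped": 100,
    "comments": 0,
    "blank": 0,
    "corrupted": 179_545,
    "old": 0,
    "new_qualified": 101_129,
}

# Hits copied from the Summary section of the June report.
viewed_hits = 1_616_533
not_viewed_hits = 2_894_034

# The record categories should account for every parsed line.
bucket_total = sum(buckets.values())

# Compare the hits reported in the Summary with the qualified records.
reported_hits = viewed_hits + not_viewed_hits
ratio = reported_hits / buckets["new_qualified"]

# Share of parsed lines that were classified as corrupted.
corrupted_share = buckets["corrupted"] / parsed_lines

print("categories add up to parsed lines:", bucket_total == parsed_lines)
print("reported hits:", reported_hits)
print("reported hits per qualified record:", round(ratio, 1))
print("corrupted share of parsed lines (%):", round(100 * corrupted_share, 1))
```

So the update output is internally consistent (the categories sum to the parsed line count), but the Summary reports roughly 45 hits for every qualified record, and about 64% of the lines were rejected as corrupted.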

jmafc commented 4 hours ago

Some further observations, based on just the first six days of July. I allowed the awstats 'update' to run independently, rather than start it manually and endure its apparent slowness.

Can someone please confirm that the number of Hits in the report should agree with the number of lines in the log file for any given period or identifiable subset, such as a single IP or a given date? If that is not the case, can you explain how the two numbers can be correlated?
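For reference, this is roughly how I would count log lines per date and per IP to compare against the report. It is a minimal sketch that assumes the Common/Combined Log Format (IP first, timestamp in square brackets); the regular expression, the function name `count_lines` and the sample lines are all made up for illustration and are not taken from awstats:

```python
import re
from collections import Counter

# Assumed layout: IP ident user [dd/Mon/yyyy:HH:MM:SS zone] "request" ...
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}):')


def count_lines(lines):
    """Count log lines per day and per client IP; also count unparsable lines."""
    per_day = Counter()
    per_ip = Counter()
    unmatched = 0
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            unmatched += 1
            continue
        ip, day = match.groups()
        per_day[day] += 1
        per_ip[ip] += 1
    return per_day, per_ip, unmatched


# Made-up sample lines; in practice, pass an open log file instead.
sample = [
    '203.0.113.5 - - [01/Jul/2024:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "ua"',
    '203.0.113.5 - - [01/Jul/2024:00:00:02 +0000] "GET /a.css HTTP/1.1" 200 90 "-" "ua"',
    '198.51.100.7 - - [02/Jul/2024:10:15:00 +0000] "GET / HTTP/1.1" 200 512 "-" "ua"',
    'not a log line',
]

per_day, per_ip, unmatched = count_lines(sample)
print(dict(per_day))
print(dict(per_ip))
print("unmatched lines:", unmatched)
```

If Hits really are one per qualified log line, the per-day and per-IP counts from a script like this should be an upper bound on what the report shows for the same day or host.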