darold / squidanalyzer

Squid Analyzer parses Squid proxy access log and reports general statistics about hits, bytes, users, networks, top URLs, and top second level domains. Statistic reports are oriented toward user and bandwidth control.
http://squidanalyzer.darold.net/

9 hours to generate yearly statistics #179

Closed: y-master closed this issue 6 years ago

y-master commented 6 years ago

Hi,

I have a dozen proxies with the same setup of squid and squidanalyzer. On one of them, the yearly statistics take between 6 and 9 hours to generate. I run squidanalyzer once a day with this command: "/usr/local/bin/squid-analyzer -p 12 -d /var/log/squid3/access.log". My log file is between 60 MB and 80 MB.

On the other proxies, which have a little less traffic, statistics generation takes ~30 min with "-p 6". My squidanalyzer.conf is pretty simple, see attached: configfile.txt

Any idea where I can look to solve this?

darold commented 6 years ago

Do you have the same resources on the slow proxy as on the others? Perhaps you have a slower CPU and less RAM, which forces Perl to swap too much memory?
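One rough way to check the swap hypothesis on a Linux VM is to compare swap usage on the slow and fast proxies while the job runs. The helper below is only a sketch: `swap_used_kb` is a hypothetical name, and it assumes `/proc/meminfo` exposes the usual `SwapTotal` and `SwapFree` fields in kB.

```shell
#!/bin/sh
# Hypothetical helper: print swap currently in use, in kB.
# Reads /proc/meminfo by default; another file can be passed for testing.
swap_used_kb() {
    awk '/^SwapTotal:/ {t = $2} /^SwapFree:/ {f = $2} END {print t - f}' "${1:-/proc/meminfo}"
}

# Only report when the file is readable (Linux); otherwise do nothing.
if [ -r /proc/meminfo ]; then
    echo "swap in use: $(swap_used_kb) kB"
fi
```

If the number stays near zero during the yearly recalculation, swapping is probably not the cause.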

y-master commented 6 years ago

These VMs are identical: 1 vCPU and 2 GB of RAM, with no resource pool or any other limitation (the CPUs are identical too: Xeon E5-2630L v3). Here is an example log of a run that took more than 8 hours to analyze 50000 lines: long_process.txt. Most of the time was spent on the yearly statistics. The CPU was at 100% the whole time, and htop shows that other processes don't take many resources.

Here is the same case on another VM: quick_process.txt

I could still add a vCPU to this VM, but I would like to understand...

Regards,

darold commented 6 years ago

It is not the analysis of the 50000 lines that takes 8 hours, it is the recalculation of the yearly statistics. I guess that if you set -p 6 you will get the same time. I definitely have to work on optimizing this part.

y-master commented 6 years ago

So 6 more months of data make the processing time grow exponentially? Wow... I will check if we can reduce the preserve time.

Thanks for the follow-up and for your awesome software!

y-master commented 6 years ago

I upgraded the VM to 2 vCPUs and changed the command line to "/usr/local/bin/squid-analyzer -p 6 -j 2 -d /var/log/squid3/access.log". It still takes ages to calculate the yearly statistics, mostly because the last job runs on only one CPU. Any idea how to optimize this?

darold commented 6 years ago

Most users disable the yearly statistics to save time by using --no-year-stat. Note that if you still want the month report you need to add --with-month-stat, otherwise only day and week statistics will be computed. These options are only available in the current development code.
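As a rough sketch, the command line from the previous comment with these two options added might look like the script below. The binary path, log path, and the -p/-j values are simply copied from that comment; the script only prints the command when the binary is not found, so it can be tried safely on a machine without squidanalyzer.

```shell
#!/bin/sh
# Paths and -p/-j values are copied from the command quoted earlier in this thread.
SQUID_ANALYZER=/usr/local/bin/squid-analyzer
ACCESS_LOG=/var/log/squid3/access.log

# --no-year-stat skips the yearly recalculation;
# --with-month-stat keeps the month report (development code only, per the note above).
CMD="$SQUID_ANALYZER -p 6 -j 2 --no-year-stat --with-month-stat -d $ACCESS_LOG"

if [ -x "$SQUID_ANALYZER" ]; then
    $CMD
else
    # Binary not found here: only show what would be run.
    echo "would run: $CMD"
fi
```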

As for how to optimize this: yes, I have an idea, but unfortunately not the time :-(