darold / squidanalyzer

Squid Analyzer parses Squid proxy access logs and reports general statistics about hits, bytes, users, networks, top URLs, and top second-level domains. Statistics reports are oriented toward user and bandwidth control.
http://squidanalyzer.darold.net/

network-aliases and user-aliases files #155

Closed: neroxyr closed this issue 6 years ago

neroxyr commented 7 years ago

I've been reading the documentation and wanted to go deeper by using these two files: network-aliases and user-aliases. I want to use the network-aliases file to group IPs by department, and the user-aliases file to map usernames to IP addresses.

Do the changes I make take effect only the next time I run squid-analyzer, or is previous data modified as well? If previous data is untouched, does that generate duplicates, or is the information still accurate?

darold commented 7 years ago

Yes, changes take effect at the next run of squid-analyzer, but previous data will not be modified. This means you will get duplicates, so you need to start from an empty data directory. The reason is that applying alias replacement while reading old data files makes squid-analyzer very slow on huge data files.

If you want, I can add a configuration directive to enable this behavior and allow all reports to be rebuilt with the alias changes.

neroxyr commented 7 years ago

That would be great, because the files can change from time to time. I've also read in a closed issue that to rebuild data you have to delete the SquidAnalyzer.current file and run squid-analyzer -j 8 /var/log/squid/access.log-* -d, after which it will parse only the data found in that folder. Is this correct, or will the rebuild be done on all of the data found in the squidanalyzer folder?

The user-aliases file states that a tab is required as the separator. Is this format therefore allowed: Name(space)Lastname(tab)ipaddr?

Thanks
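A minimal sketch of how a Name(space)Lastname(tab)ipaddr line would split, assuming (this is not confirmed in the thread) that only the first tab separates the alias from a comma-separated list of logins or IP addresses, so the space inside the alias name is kept. `parse_user_alias` and the sample name are hypothetical, and Python is used only for illustration; SquidAnalyzer itself is written in Perl.

```python
def parse_user_alias(line: str) -> tuple[str, list[str]]:
    """Hypothetical parser: split one user-aliases line on its first tab."""
    alias, sep, members = line.rstrip("\n").partition("\t")
    if not sep:
        # No tab at all: the line cannot be split into alias and members.
        raise ValueError(f"no tab separator in line: {line!r}")
    # Members are assumed to be comma-separated logins or IP addresses;
    # spaces inside the alias (before the tab) are left untouched.
    return alias.strip(), [m.strip() for m in members.split(",") if m.strip()]


if __name__ == "__main__":
    alias, members = parse_user_alias("John Smith\t172.16.2.9")
    print(alias, members)
```

Under this assumption the space only matters before the tab; a line that uses spaces instead of a tab would not be split at all.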

darold commented 7 years ago

Please update SquidAnalyzer to the latest development code: commit 77d2177 adds the UpdateAlias configuration directive. Set it to 1 to avoid generating duplicates when the alias files have changed.

If you want to rebuild the reports to apply alias changes, just run squid-analyzer --rebuild. You don't have to remove the SquidAnalyzer.current file.

neroxyr commented 7 years ago

I've never updated SquidAnalyzer before. Do I have to download the modified files and recompile the program?

darold commented 7 years ago

Proceed as follows:

wget https://github.com/darold/squidanalyzer/archive/master.zip
unzip master.zip
cd squidanalyzer-master/
perl Makefile.PL
make
sudo make install

then update your configuration file by applying the changes found in /etc/squidanalyzer/squidanalyzer.conf.sample.

neroxyr commented 7 years ago

Thanks, I will try the changes, modifying the files twice, and let you know how it goes. Rebuilding the data should not take long, should it?

darold commented 7 years ago

It depends on the amount of statistics you have, but usually it doesn't take too much time. Make a copy of the data directory before running it. What is the output of du -sh /var/www/squidanalyzer/ ?

neroxyr commented 7 years ago

19 GB

darold commented 7 years ago

Then it will take a very long time, but you can limit the part that is rebuilt with the --build_date option of squid-analyzer.

neroxyr commented 7 years ago

Great, I just did what you said. Installation went smoothly, and some changes I had made previously were reflected in the new squidanalyzer.conf file. I ran squid-analyzer -r -b 2017-01-06 to test the changes on a single day, and these are my observations:

darold commented 7 years ago

What is the content of the network-aliases file that doesn't work?

neroxyr commented 7 years ago

I have this: adm_basico 172.16.2.9,172.16.2.11,172.16.2.13,172.16.2.19,172.16.2.21,172.16.2.25,172.16.2.31,172.16.2.32,172.16.2.33,172.16.2.42,172.16.2.43,172.16.2.44,172.16.2.51,172.16.2.57,172.16.2.62,172.16.2.63,172.16.2.75,172.16.2.87,172.16.2.92

One more thing I noticed: when I sort by the Users column, it doesn't sort alphabetically.

darold commented 7 years ago

You can write it as follows:

adm_basico      ^172\.16\.2\.(9|11|13|19|21|25|31|32|33|42|43|44|51|57|62|63|75|87|92)$

There is a tab character between the alias and the regex.

Does that work better?
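A small sketch of what the suggested line does, using Python's `re` as a stand-in for the Perl regex engine SquidAnalyzer uses (the syntax involved behaves the same in both). It builds the same anchored pattern from the IP list quoted above and checks a few addresses. `build_last_octet_regex` and `parse_alias_line` are hypothetical helpers, not part of SquidAnalyzer, and splitting the line on its tab is an assumption about the file format.

```python
import re

# Last octets from the adm_basico list quoted earlier in this thread.
ADM_BASICO_OCTETS = [9, 11, 13, 19, 21, 25, 31, 32, 33, 42, 43, 44,
                     51, 57, 62, 63, 75, 87, 92]


def build_last_octet_regex(prefix: str, octets: list[int]) -> str:
    """Hypothetical helper: one anchored regex matching prefix.<octet>."""
    alternation = "|".join(str(o) for o in octets)
    # re.escape turns the dots of the prefix into \. ; ^ and $ anchor both ends.
    return "^" + re.escape(prefix) + r"\.(" + alternation + ")$"


def parse_alias_line(line: str) -> tuple[str, re.Pattern]:
    """Split an alias line on its tab and compile the regex part."""
    alias, sep, pattern = line.rstrip("\n").partition("\t")
    if not sep:
        raise ValueError("alias and regex must be separated by a tab")
    return alias, re.compile(pattern.strip())


if __name__ == "__main__":
    line = "adm_basico\t" + build_last_octet_regex("172.16.2", ADM_BASICO_OCTETS)
    alias, rx = parse_alias_line(line)
    for ip in ["172.16.2.9", "172.16.2.92", "172.16.2.110"]:
        print(ip, "->", alias if rx.search(ip) else "no alias")
    # Without the trailing $, 172.16.2.110 would wrongly match through "11".
    print(bool(re.search(r"^172\.16\.2\.(9|11|13)", "172.16.2.110")))
```

The last line prints True, which is why the trailing $ matters: without it, an address such as 172.16.2.110 is caught by the 11 alternative even though it is not in the list.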

neroxyr commented 7 years ago

adm_basico      ^172\.16\.2\.(9|11|13|19|21|25|31|32|33|42|43|44|51|57|62|63|75|87|92|94|95|97|98|99|100|105|108|109|110|112|113|117|131|132|145|149|159|162|171|172|173|177|183|198|203|217|222|231|232|236|240|241)$

I wrote that (tab included), but still no group is shown.

I ran squid-analyzer -r -b 2017 so that it rebuilds all of 2017, and then ran the normal command to parse today's log: /usr/local/bin/squid-analyzer > /dev/null 2>&1. I noticed two things when checking the HTML pages:

  1. The networks are split individually: they are grouped by network, but then by single IPs. Before that, I had put back the sample network-aliases file so that things would stay normal.
  2. Users now appear duplicated: I have two records for the same username. I checked the user-aliases file in case the same name was written twice, but it appears only once.

Update: I ran it with your latest changes and got this: sa_1. Users still have duplicates, and the changes are not being refreshed.

Update 2: The user appears twice because there is one URL with that user's IP address and another one with user_SPC_name.
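The duplicate described in Update 2 can be illustrated with a small sketch. It assumes, as the user_SPC_name key suggests, that spaces in a login are stored as _SPC_, and that an alias only folds together the keys explicitly listed under it. Neither point is confirmed here as SquidAnalyzer's actual logic, and `merge_by_alias`, the sample IP and the sample login are hypothetical.

```python
from collections import Counter

SPC = "_SPC_"  # placeholder that replaced the space in user_SPC_name above


def encode_user(name: str) -> str:
    """Assumed encoding: every space becomes _SPC_."""
    return name.replace(" ", SPC)


def decode_user(key: str) -> str:
    """Reverse of encode_user."""
    return key.replace(SPC, " ")


def merge_by_alias(hits: dict[str, int], aliases: dict[str, str]) -> Counter:
    """Hypothetical helper: fold per-key hit counts into alias totals.

    `aliases` maps every member (an IP address or a decoded login) to its
    alias; keys without an alias are kept under their decoded name.
    """
    totals: Counter = Counter()
    for key, count in hits.items():
        member = decode_user(key)
        totals[aliases.get(member, member)] += count
    return totals


if __name__ == "__main__":
    hits = {"172.16.2.9": 120, "john_SPC_smith": 80}
    # Only the IP is aliased: two separate keys remain.
    print(merge_by_alias(hits, {"172.16.2.9": "John Smith"}))
    # Both the IP and the login are aliased: a single key with 200 hits.
    print(merge_by_alias(hits, {"172.16.2.9": "John Smith",
                                "john smith": "John Smith"}))
```

Under these assumptions, listing both the IP address and the login under the same alias is what would collapse the two records into one.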

darold commented 7 years ago

Is this issue solved in v6.6, or do you still have the same problem? I guess you have found a workaround by now; sorry for the delay in responding :-(

neroxyr commented 7 years ago

Hi, I'll check tomorrow to see how the reports are generated. I have one question: I want to remove last year's data. Do I just remove the 2016 folder, or is there anything else to do?

darold commented 7 years ago

Yes, just remove that directory and wait for the next squid-analyzer run to update your main index file, or, if you cannot wait, edit index.html by hand to remove the line about 2016.

neroxyr commented 7 years ago

I just rebuilt the last date to see whether the changes are applied (the alias files are left at their defaults), but it doesn't seem to group the networks; they appear as if each group were an IP. Command used: squid-analyzer -d -r -b 2017-06-20

Also, the users don't have aliases, but their names still appear.

darold commented 7 years ago

The latest commit, 75c88fc, might fix the network alias replacement. Please give the latest development code a try; I'm not sure a rebuild will fix your current reports, but at least new ones should be fixed.

neroxyr commented 7 years ago

Do I download it again as if I were doing an update?

darold commented 7 years ago

Yes, download it using wget https://github.com/darold/squidanalyzer/archive/master.zip; this is the latest code.

neroxyr commented 7 years ago

Thanks, I just tested it and the new data is finally grouped by network.

Can the changes you made also be applied using the network-aliases file, so that I can group some IPs and see how they are displayed? I also noticed that when I sort the user statistics by user, they are not sorted correctly, as you can see in the image below.

The same applies to names.

Update: I have squid-analyzer scheduled to run every day at 22:00, but it is still running the next day, and the process makes browsing slow. Is there a way to make it take less time? Looking at the debug log, it seems that when a day is added it rebuilds the week, the month and the year to update the totals. I had to stop the process because it had been running for more than 8 hours.

darold commented 7 years ago

Hi,

I will look into the user sort issue, thanks for the report. If you have huge Squid logs, you might not want to compute the yearly and monthly reports; in that case use the --no-year-stat option. It will then compute only daily and weekly reports.

Regards,

neroxyr commented 7 years ago

Thanks. I have now created aliases in both the network and user alias files, but when running /usr/local/bin/squid-analyzer -d -r --no-year-stat -b 2017-07 this happens:

SquidAnalyzer version 6.6
Building HTML output into /var/www/squidanalyzer
Generating statistics for day 2017-07-03
        User statistics in /var/www/squidanalyzer/2017/07/03...
        Mime type statistics in /var/www/squidanalyzer/2017/07/03...
        Network statistics in /var/www/squidanalyzer/2017/07/03...
        Top URL statistics in /var/www/squidanalyzer/2017/07/03...
        Top denied URL statistics in /var/www/squidanalyzer/2017/07/03...
        Top domain statistics in /var/www/squidanalyzer/2017/07/03...
        Cache statistics in /var/www/squidanalyzer/2017/07/03...
Generating statistics for month 2017-07
        Cache statistics in /var/www/squidanalyzer/2017/07...
Generating statistics for week 28 on year 2017
        User statistics in /var/www/squidanalyzer/2017/week28...
        Mime type statistics in /var/www/squidanalyzer/2017/week28...
        Network statistics in /var/www/squidanalyzer/2017/week28...
        Top URL statistics in /var/www/squidanalyzer/2017/week28...
        Top denied URL statistics in /var/www/squidanalyzer/2017/week28...
        Top domain statistics in /var/www/squidanalyzer/2017/week28...
        Cache statistics in /var/www/squidanalyzer/2017/week28...
Generating statistics for year 2017
        Cache statistics in /var/www/squidanalyzer/2017...
FATAL: can't opendir /var/www/squidanalyzer/2017/week28: No such file or directory

And it seems to duplicate the network aliases.

The file contains only four groups, as shown:

adm_basico      ^172\.16\.2\.(11|19|21|30|31|32|33|57|62|63|75|87|92|96|100|105|110|112|117|131|132|145|149|152|159|162|171|172|177|183|198|217|231|236|240)$,^172\.16\.102\.(9|25|42|43|44|94|95|97|98|99|102|108|109|173|203|222|232|241)$
adm_conta       ^172\.16\.2\.(34|53|79|81|101|106|119|146|200|204|205|234|243)$
adm_perco       ^172\.16\.2\.(50|55|58|125|140|199|208|244)$,172.16.18.154,172,16,23,159,^172\.16\.102\.(10|24|35|40|41|52|67|76|93)$
adm_percodrop   ^172\.16\.2\.(137|209)$,172.16.102.252
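A file like this can be sanity-checked with a short script. This is a hypothetical checker, not a SquidAnalyzer tool, and it assumes each line is an alias, a tab, then comma-separated items, where each item is either a literal IPv4 address or a regex (the layout the lines above follow; SquidAnalyzer's own parsing rules may differ). Because it splits on commas, it cannot handle regexes that themselves contain commas, such as {1,3}. Under these assumptions it would flag the 172,16,23,159 item in adm_perco, which splits into four bare numbers rather than one address.

```python
import ipaddress
import re


def parse_items(line: str) -> tuple[str, list[str]]:
    """Split 'alias<TAB>item,item,...' (assumed layout)."""
    alias, sep, rest = line.rstrip("\n").partition("\t")
    if not sep:
        raise ValueError(f"missing tab after alias in {line!r}")
    return alias.strip(), [i.strip() for i in rest.split(",") if i.strip()]


def is_ipv4(item: str) -> bool:
    """True if the item is a complete dotted IPv4 address."""
    try:
        ipaddress.IPv4Address(item)
        return True
    except ValueError:
        return False


def lint_network_aliases(lines: list[str]) -> list[str]:
    """Hypothetical checker: return warnings for suspicious items."""
    warnings = []
    for line in lines:
        if not line.strip():
            continue
        try:
            alias, items = parse_items(line)
        except ValueError as exc:
            warnings.append(str(exc))
            continue
        for item in items:
            if is_ipv4(item):
                continue
            try:
                re.compile(item)
            except re.error as exc:
                warnings.append(f"{alias}: invalid regex {item!r} ({exc})")
                continue
            # A bare fragment such as '172' would match far more than intended.
            if not (item.startswith("^") and item.endswith("$")):
                warnings.append(f"{alias}: {item!r} is neither an IPv4 "
                                "address nor an anchored regex")
    return warnings


if __name__ == "__main__":
    sample = [
        "adm_perco\t" + r"^172\.16\.2\.(50|55|58)$,172.16.18.154,172,16,23,159",
        "adm_percodrop\t" + r"^172\.16\.2\.(137|209)$,172.16.102.252",
    ]
    for warning in lint_network_aliases(sample):
        print(warning)
```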