darold / squidanalyzer

Squid Analyzer parses Squid proxy access logs and reports general statistics about hits, bytes, users, networks, top URLs, and top second-level domains. The statistics reports are oriented toward user and bandwidth control.
http://squidanalyzer.darold.net/
week 53 :) #72

Closed by michaelgauthier 9 years ago

michaelgauthier commented 9 years ago

Hi and happy new year,

There is a week 53 for 2014.

Thank you.

[Screenshot: 2015-01-06_17h26_09]

darold commented 9 years ago

Hi,

Happy new year too. Yes, this would normally be week 1, but of year 2015. SquidAnalyzer splits data by year/month/day, so within 2014 this week is numbered 53. We cannot call it week 01 because it would overwrite the existing week 01 of 2014.
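To illustrate why the last days of December 2014 can end up in a "week 53", here is a minimal sketch comparing a week number counted inside the calendar year with the ISO 8601 week number. The function names are hypothetical, and the year-bounded formula is only an assumption chosen to reproduce the numbering described above, not SquidAnalyzer's actual code.

```python
from datetime import date


def year_bounded_week(d):
    """Week counted inside the calendar year of `d` (Monday-based, starting at 1).

    Assumption: this only mimics a per-year numbering scheme; it is not
    taken from the SquidAnalyzer source.
    """
    return d.year, int(d.strftime("%W")) + 1


def iso_week(d):
    """ISO 8601 week, which may belong to the next (or previous) year."""
    iso_year, week_number, _weekday = d.isocalendar()
    return iso_year, week_number


day = date(2014, 12, 29)
print(year_bounded_week(day))  # (2014, 53)
print(iso_week(day))           # (2015, 1)
```

With a per-year scheme, Monday 29 December 2014 stays in 2014 as week 53, while 1 January 2015 starts week 1 of 2015, so one calendar week is split across two year directories.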

What is the output of the 2015-01 calendar?

Regards,

michaelgauthier commented 9 years ago

Hi,

I have two servers, Ubuntu 14.04 and 12.04. The 12.04 one is in production and the 14.04 one is in the test phase. On 12.04 I have week 01, but not on 14.04. The difference is how I generate the reports: on 12.04 I generate every day from a day file, on 14.04 I generate every Monday from a week file. On 14.04 it seems that only the index.html is missing.

I can send you the week file if you want to try it.

Also, I still have the problem I reported last year: when I regenerate the reports with the --rebuild option, the log entries are doubled, as you can see in the screenshots below.

Thank you

[Screenshots: 2015-01-07_09h52_37, 2015-01-07_10h24_51, 2015-01-07_10h26_47]

michaelgauthier commented 9 years ago

Hi,

I found a new bug this morning when generating the weekly reports: week 01 generates a year, and week 02 generates week 01.

Thank you

[Screenshots: 2015-01-12_08h58_28, 2015-01-12_08h58_54]

darold commented 9 years ago

Yes, I solved these issues this weekend but had not found time to test and publish the fixes. Commit 16dd00f fixes that issue. Please download and install the latest code, then run squid-analyzer with the --rebuild option to repair the previous reports. Then let me know.

Best regards,

michaelgauthier commented 9 years ago

Hi,

I installed the fix and it works. Did you find time to reproduce the double-entry bug?

Thank you

darold commented 9 years ago

Are you passing a log file on the command line when you use --rebuild? For example:

sudo squid-analyzer --rebuild access.log

I think this is the reason for the double entries: with --rebuild, SquidAnalyzer does not read the history file. I will fix that.

darold commented 9 years ago

The latest commit, 9b3bd16, removes any log file from the parser list when --rebuild is used. This fixes the double entries.

michaelgauthier commented 9 years ago

Hi,

Yes, I'm passing a log file name on the command line. I do that because sometimes a day file seems to be incomplete, but I can't regenerate it without --rebuild because the script says there is no new log.

I will try the latest commit and get back to you as soon as possible.

Thank you

michaelgauthier commented 9 years ago

Hi,

I installed the latest commit and deleted the 2014 and 2015 folders and the index.html, but it stops when I try to process a file from 2014:

root@xxxxxx:/var/www/squidanalyzer# /usr/local/bin/squid-analyzer -j 8 /mnt/logs/squid/week/access2014-20-WEEK.log -d --rebuild

No new log registered...

Is there a file I can delete to make it work?

Thank you

darold commented 9 years ago

Yes, when SquidAnalyzer finds a SquidAnalyzer.current file in the data directory, it will not parse log entries older than the date stored in this file. If you want to parse an old access log file again, remove this file.

But you are not using the right command: do not use --rebuild when you are parsing a log. Proceed as follows:

/usr/local/bin/squid-analyzer -j 8 /mnt/logs/squid/week/access2014-20-WEEK.log -d

Then if you have older data that you want to rebuild:

/usr/local/bin/squid-analyzer -j 8 -d --rebuild

Regards

michaelgauthier commented 9 years ago

Thank you for your reply,

I found a problem, but I think it's on my side: the logs from October can't be parsed after those from September. According to the debug output, it seems the log timestamps are earlier than September's. I have to delete SquidAnalyzer.current after each file.

darold commented 9 years ago

In that case it is better to give all the files to squid-analyzer at once; it does not care about the date order. For example:

/usr/local/bin/squid-analyzer -j 8 /mnt/logs/squid/week/access2014-* -d

or if you are starting from week 20:

/usr/local/bin/squid-analyzer -j 8 /mnt/logs/squid/week/access2014-[2345]* -d

If you don't run it like that, you will need to remove SquidAnalyzer.current after each command.
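As a quick check of what the `access2014-[2345]*` pattern selects, here is a small sketch using Python's `fnmatch`, whose bracket and star syntax follows shell globbing. Only `access2014-20-WEEK.log` comes from this thread; the other file names are hypothetical and assume the same naming scheme.

```python
from fnmatch import fnmatch

PATTERN = "access2014-[2345]*"

# Hypothetical file names, modelled on access2014-20-WEEK.log from the thread.
names = [
    "access2014-19-WEEK.log",
    "access2014-20-WEEK.log",
    "access2014-41-WEEK.log",
    "access2014-53-WEEK.log",
    "access2014-2-WEEK.log",  # assumption: a week number without zero padding
]

# [2345] only constrains the first character after the dash.
selected = [name for name in names if fnmatch(name, PATTERN)]
print(selected)
```

The pattern keeps weeks 20 to 53 and drops week 19, but it would also pick up single-digit weeks 2 to 5 if the week numbers in the file names were not zero-padded.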

michaelgauthier commented 9 years ago

> In that case it is better to give all the files to squid-analyzer it will not take care of the date order. For example: /usr/local/bin/squid-analyzer -j 8 /mnt/logs/squid/week/access2014-* -d

Yes, I'm using this command to regenerate everything after a new release. But when it gets to October it says: No new log registered...

darold commented 9 years ago

Does the file have the native Squid format?

michaelgauthier commented 9 years ago

I didn't check the format, but I think it is fine because it works if I delete SquidAnalyzer.current. What I don't understand is this: if the file contains entries with timestamps earlier than October, why do no entries show up in other weeks when I generate only week 41? I will look at the log file to find out why it is not working.

michaelgauthier commented 9 years ago

I tried to rebuild and got this error:

root@SERVERXY:/var/www/squidanalyzer/2015# /usr/local/bin/squid-analyzer -j 8 -d --rebuild

ERROR: you must give a valid path to the Squid log file.

PS:

The command works on my production server (commit 16dd00f) but not on the test server (commit 9b3bd16).

Thank you

darold commented 9 years ago

Ok, latest commit 3c54d8d solves this issue. You can also set LogFile in your configuration file as a workaround.

michaelgauthier commented 9 years ago

Commit 3c54d8d works. I'm looking into why my files from October are not working, and then we can close the ticket.

Thank you.

PS: which timestamp from the log file is compared with SquidAnalyzer.current?

darold commented 9 years ago

Hi,

Sorry for the delay, I missed your last question.

For example here is the content of a SquidAnalyzer.current file:

 1369522798.607  7395980

The first number is the timestamp of the last line parsed, and the second number is the offset of the last parsing position in the file. It is used to avoid rereading the file if it has already been parsed.
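The two fields described above can be read with a few lines of Python. This is only a sketch: the function names are hypothetical, and the strict "newer than" comparison is an assumption about how the stored timestamp is used, not a copy of SquidAnalyzer's logic.

```python
from datetime import datetime, timezone


def parse_current(text):
    """Split the 'timestamp offset' pair stored in a SquidAnalyzer.current file."""
    timestamp_str, offset_str = text.split()
    return float(timestamp_str), int(offset_str)


def is_newer(log_line, last_timestamp):
    """True when a native-format log line is more recent than the stored timestamp.

    Assumption: a strict comparison on the first field of the line.
    """
    return float(log_line.split()[0]) > last_timestamp


last_ts, offset = parse_current(" 1369522798.607  7395980")
last_seen = datetime.fromtimestamp(int(last_ts), tz=timezone.utc)
print(last_seen.isoformat(), offset)  # 2013-05-25T22:59:58+00:00 7395980
```

Under that assumption, any line whose first field is below 1369522798.607 would be skipped, which matches the "No new log registered..." behaviour seen when older files are processed after newer ones.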

Regards,

mohamedhami1d commented 9 years ago

Hi there

Not sure if this is the best place to post this. I have 6 months worth of logs that I wish to analyse, and I came across this application. I have installed it, but I am not sure of the best way to get it to look at the 3 months worth of squid logs.

In addition, my squid uses Kerberos authentication. Most traffic is authenticated, but some is not and so appears under the IP address of the client.

Log lines for unauthenticated traffic look like this (x.x.x.x being the IP):

Dec 30 06:29:47 squid-proxy-01 squid3: x.x.x.x - - [30/Dec/2014:06:29:47 +0000] "GET http://security.ubuntu.com/ubuntu/dists/precise-security/Release HTTP/1.1" 304 432 "-" "Debian APT-HTTP/1.3 (0.8.16~exp1 2ubuntu10.22)" TCP_MISS:FIRST_UP_PARENT

For authenticated traffic:

Dec 30 07:50:16 squid-proxy-01 squid3: x.x.x.x - user@DOMAIN.COM [30/Dec/2014:07:50:16 +0000] "HEAD http://wwnstusxxjcla/ HTTP/1.1" 404 369 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36" TCP_MISS:FIRST_UP_PARENT

When I run the script against a log file I have extracted, it seems that it cannot read information from the historical log files.

darold commented 9 years ago

Hi,

You will not be able to get any report from your log files because they are not in the native Squid log format. The HTTP-like log format is not supported by SquidAnalyzer. But I see that a lot of people are using it, so I will add this feature. I will try to publish a new release next week with support for this format.

Regards

mohamedhami1d commented 9 years ago

Hi Darold

Many thanks for your swift reply..

So, to be clear: because our squid logs are configured to include HTTP information such as user agent strings, SquidAnalyzer currently does not support them?

I shall look forward to the new release :)

darold commented 9 years ago

Yes that's the source of the problem. Squid native log format looks like:

 1369477582.975  39845 172.21.117.19 TCP_MISS/200 496 GET http://5-ect.channel.facebook.com/pull? - DIRECT/69.141.278.2 text/plain

This is what SquidAnalyzer is expecting to find in your log.
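As an illustration of the difference, the sketch below splits a native-format line into the ten fields of Squid's native access.log layout and rejects lines that do not start with an epoch timestamp, such as the syslog-prefixed HTTP-like lines shown earlier. The function and field names are hypothetical, and this is not how SquidAnalyzer itself parses logs.

```python
import re

# A native line starts with "<epoch>.<millis> <elapsed> ".
NATIVE_START = re.compile(r"^\s*\d+\.\d+\s+\d+\s")

# Hypothetical field names for the native access.log columns.
FIELDS = (
    "timestamp", "elapsed_ms", "client", "code_status", "bytes",
    "method", "url", "user", "hierarchy_peer", "mime_type",
)


def parse_native(line):
    """Return a dict for a native-format line, or None for any other format."""
    if not NATIVE_START.match(line):
        return None
    parts = line.split()
    if len(parts) < len(FIELDS):
        return None
    record = dict(zip(FIELDS, parts))
    record["timestamp"] = float(record["timestamp"])
    record["elapsed_ms"] = int(record["elapsed_ms"])
    record["bytes"] = int(record["bytes"])
    return record


native_line = (
    " 1369477582.975  39845 172.21.117.19 TCP_MISS/200 496 GET "
    "http://5-ect.channel.facebook.com/pull? - DIRECT/69.141.278.2 text/plain"
)
print(parse_native(native_line)["client"])  # 172.21.117.19
```

A line beginning with "Dec 30 06:29:47 squid-proxy-01 squid3:" returns None here, which is a quick way to check which format a log file is in before running the analyzer.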

mohamedhami1d commented 9 years ago

Ah makes sense..

Thanks again and will try again once new release supports our format

michaelgauthier commented 9 years ago

Hi,

I've been looking into why my files from October don't generate properly, and I found nothing. I have one last question before we close the ticket: can I run several SquidAnalyzer instances on one server? Previously I asked you whether I could remove internal connections from the report, but now I would like to report on them separately, either with two instances of SquidAnalyzer on the same server or, maybe in the future, with a link to choose between the internal and external report.

Thank you.

darold commented 9 years ago

Hi Michael,

If you can send a download link for your log files to my private email, I will be happy to check what is going wrong.

About the second question: yes, you can have two instances of SquidAnalyzer on the same server. You just have to give each one its own configuration file (the -c option of squid-analyzer) and use different paths in their respective squidanalyzer.conf files.

regards,

mohamedhami1d commented 9 years ago

Hi Darold

Did you get a chance to include support for the squid common log formats?

I have months worth of logs that I urgently need to analyse, and your tool would be perfect for that.

Regards

darold commented 9 years ago

Hi Mohamed,

Yes, I think you will be pleased to read that the latest development code includes support for the common and combined (HTTP-like) log formats. I still want to add support for an additional field (MIME type) that is not present in those formats. We are close to the next release now.

If you want, you can try it and let me know if you find any issue with the latest code.

Best regards,

mohamedhami1d commented 9 years ago

Hi Darold

I can confirm that I can now generate reports. Many thanks for your work.

I have another question.

If I have many logs to analyse, what is the best way to use your tool to go through, let's say, 6 months worth of logs? Can your tool extract and analyse them, or should I simply put all my logs in one large log file?

darold commented 9 years ago

Hi,

You can just run SquidAnalyzer a single time. Let's say all your old files are in the /var/log/squid3/ directory; then you just have to execute squid-analyzer as follows:

squid-analyzer -j N /var/log/squid3/access.*

It does not matter if some files are gzip-compressed or if they are not listed in chronological order: squid-analyzer will parse all the files as if they were a single file.

The -j N option can be used if you have some unused cores on your server. Otherwise SquidAnalyzer will use a single core, and in that case the reports can take some time to complete.

Regards,

mohamedhami1d commented 9 years ago

Hi there

Thank you for your reply

Currently all my logs are in bz2 format and sorted into a folder per year, with a folder per month inside each year. I guess I will need to move all my log files into a single directory.
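Since squid-analyzer takes a list of log file paths, one alternative to moving the files is to collect them from the year/month folders and pass them all in one invocation. The sketch below only builds the argument list; the helper names and the `access*` file name pattern are assumptions, and the thread itself only confirms the single-directory approach.

```python
from pathlib import Path


def collect_logs(root, pattern="access*"):
    """Return every file under `root` (any depth) whose name matches `pattern`."""
    return sorted(str(p) for p in Path(root).rglob(pattern) if p.is_file())


def build_command(root, jobs=8):
    """Build a squid-analyzer argument list covering all collected log files."""
    return ["squid-analyzer", "-j", str(jobs), *collect_logs(root)]


# The resulting list could be handed to subprocess.run(); it is not executed here.
```

For example, `build_command("/path/to/archive", jobs=4)` would yield `squid-analyzer -j 4` followed by every matching file found under that (hypothetical) directory.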

Regards

mohamedhami1d commented 9 years ago

Hi Darold

Just a heads up I ended up putting all my logs in one folder and ran the tool against it.

Worked a charm.

Would be good to get mime-type support too :)

darold commented 9 years ago

That's the case now with the latest commit. You need to add the %mt placeholder to the common or combined log format. Example:

logformat common %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %>Hs %<st %Ss:%Sh %mt
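To show what a line written with this logformat looks like once it is split into fields, here is a minimal regular-expression sketch. The group names and the sample line are hypothetical (the sample is adapted from the unauthenticated request quoted earlier, without the syslog prefix and with a MIME type appended), and this is not SquidAnalyzer's parser.

```python
import re

# One group per placeholder of:
# %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %>Hs %<st %Ss:%Sh %mt
COMMON_MT = re.compile(
    r'(?P<client>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) HTTP/(?P<version>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+) '
    r'(?P<squid_status>[^:\s]+):(?P<hierarchy>\S+) (?P<mime>\S+)$'
)


def parse_common_mt(line):
    """Return the named fields of a 'common + %mt' line, or None if it does not match."""
    match = COMMON_MT.match(line.strip())
    return match.groupdict() if match else None


# Hypothetical sample line; 192.0.2.10 is a documentation address.
sample = (
    '192.0.2.10 - - [30/Dec/2014:06:29:47 +0000] '
    '"GET http://security.ubuntu.com/ubuntu/dists/precise-security/Release HTTP/1.1" '
    '304 432 TCP_MISS:FIRST_UP_PARENT text/plain'
)
fields = parse_common_mt(sample)
print(fields["mime"])  # text/plain
```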

Regards,