gforcada / haproxy_log_analysis

HAProxy log analyzer
https://pypi.org/project/haproxy_log_analysis
GNU General Public License v3.0
88 stars, 35 forks

Fails on logs of any significant size #20

Closed: nikess closed this issue 4 years ago

nikess commented 7 years ago

It fails with a memory error. My sample input is 250 MB.

kapad commented 6 years ago

Yep, terrible tool performance-wise. I do need a tool that can do this kind of filtering, though. Any other options, @nikess?

gforcada commented 6 years ago

When I developed and used it, performance was OK for sporadic analysis: maybe a bit slow, but fine. Now I'm also bitten by the excessive memory usage myself. Unfortunately, I don't have much time to work on it.

I guess redesigning the whole project to use iterators would help. Anyone up for it?
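As a rough illustration of the iterator idea, here is a minimal Python sketch assuming a plain-text log with one entry per line. The helper names `iter_lines`, `matching` and `tally` are hypothetical and are not the project's actual API. Each stage is a generator, so only the current line is in memory, unlike reading the whole file with `f.read()` or `f.readlines()`.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, Iterator


def iter_lines(path: str) -> Iterator[str]:
    """Yield a log file one line at a time (hypothetical helper).

    Iterating over the file object reads it lazily, so only the
    current line is held in memory, not the whole file.
    """
    with open(path, encoding="utf-8", errors="replace") as log_file:
        for line in log_file:
            yield line.rstrip("\n")


def matching(lines: Iterable[str], needle: str) -> Iterator[str]:
    """Lazily keep only the lines that contain `needle` (e.g. a URL path)."""
    return (line for line in lines if needle in line)


def tally(lines: Iterable[str], key: Callable[[str], Hashable]) -> Counter:
    """Count lines per key; memory grows with distinct keys, not with lines."""
    return Counter(key(line) for line in lines)
```

Usage, with a hypothetical field index that you would need to check against your own log layout: `tally(matching(iter_lines("haproxy.log"), "/api/"), key=lambda line: line.split()[5])`. Each stage pulls one line at a time from the previous one, so no stage ever holds the full 250 MB file.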

kapad commented 6 years ago

Yes. It's the memory usage that made using it so difficult for me too.

I think your suggestion of redesigning the tool to use iterators instead of loading the entire file into memory is correct. It would definitely solve the issue, but it would probably also be an almost complete rewrite.

I've used halog (https://www.systutorials.com/docs/linux/man/1-halog/) and goaccess (https://goaccess.io/) to analyze my logs.

halog is the log processor packaged with HAProxy, but I found that it has some issues when filtering the output by URL: the results didn't match what I got with grep/sed/awk.

goaccess is a really amazing tool: it produces a beautiful dashboard with a lot of useful analytics from HAProxy logs. With goaccess, though, you need to configure the log-format string.

I used:

goaccess haproxy.log --log-format='%^:%^:%^: %h:%^[%d:%t.%^] %^ %^/%v %^/%^/%^/%^/%L %s %b %^"%r"' --date-format=%d/%b/%Y --time-format=%T -a > report.html

My logs are in HAProxy's default HTTP log format.


-- Rohan Kapadia
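To make the goaccess mapping concrete, here is a rough Python sketch that pulls approximately the same fields (client IP, date, time, server, total time, status, bytes, request) out of one line in HAProxy's default HTTP log format. The regular expression and the `parse_line` name are hypothetical, written for illustration rather than taken from haproxy_log_analysis or goaccess. It assumes the timers, status and byte fields are laid out as in HAProxy's default HTTP format; custom `log-format` settings will not match.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern for HAProxy's default HTTP log format, roughly
# mirroring the goaccess specifiers above (%h, %d, %t, %v, %L, %s, %b, %r).
# It skips any syslog prefix because re.search finds the first match.
HAPROXY_HTTP = re.compile(
    r'(?P<client_ip>\S+):(?P<client_port>\d+) '            # %ci:%cp
    r'\[(?P<date>[^:]+):(?P<time>\d{2}:\d{2}:\d{2})\.\d+\] '  # [date:time.ms]
    r'(?P<frontend>\S+) (?P<backend>[^/ ]+)/(?P<server>\S+) '
    r'(?:-?\d+/){4}(?P<total_time>\+?-?\d+) '                # five timers
    r'(?P<status>-?\d+) (?P<bytes>\+?\d+) '
    r'.*?"(?P<request>[^"]*)"'                               # quoted request
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields of one log line, or None if it doesn't match."""
    found = HAPROXY_HTTP.search(line)
    return found.groupdict() if found else None
```

All values come back as strings, so convert `status`, `bytes` and `total_time` to integers as needed; lines in another layout (for example TCP-mode logs) simply return `None`.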

gforcada commented 4 years ago

I'm happy to report that I finally had time over Christmas to rewrite the tool; version 4.0.0 should be much less memory-hungry :)