allinurl / goaccess

GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.
https://goaccess.io
MIT License

memory usage limit #2665

Closed tyctor closed 5 months ago

tyctor commented 5 months ago

Hi

GoAccess looks good, and I'd like to try it for our radio site. Unfortunately, our VPS has limited memory (only 4 GB), and I have a 6 GB nginx access log with approximately 20 million lines. I can't run this with the default storage because it doesn't fit into RAM, and the goaccess process gets killed by the kernel:

goaccess -s
In-Memory with On-Disk Persistent Storage.
goaccess --output /tmp/report.html radiopunctum.cznginx.access.log --persist --restore --config-file /home/punctum/.goaccessrc --db-path /home/punctum/tmp/
Killed

Is there a way to limit memory usage or, maybe better, to use disk storage instead of in-memory storage?

I'm looking forward to your answer.

allinurl commented 5 months ago

If you're looking to reduce RAM usage, here are a few things you could try. First off, do you know if your requests are timestamped? If so, you might want to consider passing -q (--no-query-string). Also, are you currently using the legacy GeoIP? If yes, give -g (--std-geoip) a shot. Another option is to stream the log over SSH from the remote machine and parse it locally on your own laptop/desktop machine (assuming it has more RAM), e.g.,

ssh -n root@server 'tail -F -n +0 /var/log/apache2/access.log' | goaccess - --log-format=COMBINED -o report.html --real-time-html

Note: SSH requires -n so GoAccess can read from stdin. Also, make sure to use SSH keys for authentication as it won't work if a passphrase is required.
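
Here "timestamped" most likely refers to requests whose query string carries a changing value, such as a cache-busting parameter. The sketch below is only an illustration of why ignoring the query string can help: it is not GoAccess's implementation, `strip_query` is a hypothetical helper, and the sample requests are made up (same endpoint, different `_nc` values).

```python
from urllib.parse import urlsplit


def strip_query(request_path: str) -> str:
    """Return the request path without its query string or fragment."""
    return urlsplit(request_path).path


# Hypothetical requests: one endpoint, a different cache-busting value each time.
requests = [
    "/api/radio/programme/playingnow/?_nc=1652321942&device=desktop",
    "/api/radio/programme/playingnow/?_nc=1652321943&device=desktop",
    "/api/radio/programme/playingnow/?_nc=1652321944&device=mobile",
]

# Each distinct request string needs its own entry in an in-memory table.
unique_with_query = set(requests)
unique_without_query = {strip_query(r) for r in requests}

print(len(unique_with_query), "distinct keys with query strings")
print(len(unique_without_query), "distinct key(s) without query strings")
```

With millions of such requests, every new query-string value would add another distinct entry, whereas the stripped path collapses them into one.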

tyctor commented 5 months ago

Thanks for your reply. Parsing locally is OK and works, but I'd like to have this running completely on the server, if that's possible, of course.

if your requests are timestamped

I'm not sure what you mean by timestamped. My log records look like this:

31.30.175.177 - - [12/May/2022:02:19:02 +0000] "GET /api/radio/programme/playingnow/?_nc=1652321942&device=desktop HTTP/2.0" 200 485 "https://radiopunctum.cz/archive/20220426_chaosfera" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36"

So this is probably not a timestamp, only the date/time of the request.

I forgot to put --geoip-database ../geoip/GeoLite2-City_20240412/GeoLite2-City.mmdb in my example, so I am using mmdb instead of the legacy GeoIP.

Mainly, what I was asking about is this part of the man page:

STORAGE
There are three storage options that can be used with GoAccess. Choosing one will depend on your environment and needs.

Default Hash Tables
In-memory storage provides better performance at the cost of limiting the dataset size to the amount of available physical memory. GoAccess uses in-memory hash tables. It has very good memory usage and pretty good performance. This storage has support for on-disk persistence.

I can't find what those three options are or how to tell GoAccess which storage to use.

allinurl commented 5 months ago

I'm not certain how that text managed to stick around from an older version that stored data on disk, but I've gone ahead and updated the documentation to match the current versions. Did using -q make a difference? In your situation, using -g won't be beneficial since you're utilizing the newer geodb.

tyctor commented 5 months ago

Thanks for the reply. I've decided to do some log filtering before I send the log to GoAccess; I think this will be enough to solve the memory usage.
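
The thread does not say which filter was used, so the following is only one possible sketch of pre-filtering a log before handing it to GoAccess. It assumes combined-format lines like the sample above, and it assumes that requests under `/api/` are the ones to drop; `DROP_PREFIXES`, `keep_line`, `filter_lines`, and the script name `filter_log.py` are all hypothetical.

```python
import re
import sys
from typing import Iterable, Iterator

# Matches the quoted request field of a combined-format line: "METHOD PATH PROTOCOL"
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<proto>[^"]*)"')

# Assumption: path prefixes treated as noise for the report. Adjust as needed.
DROP_PREFIXES = ("/api/",)


def keep_line(line: str) -> bool:
    """Return False for lines whose request path starts with a dropped prefix."""
    match = REQUEST_RE.search(line)
    if match is None:
        return True  # leave lines we cannot parse for GoAccess to handle
    return not match.group("path").startswith(DROP_PREFIXES)


def filter_lines(lines: Iterable[str]) -> Iterator[str]:
    """Lazily yield only the lines to keep, so the whole log is never held in memory."""
    return (line for line in lines if keep_line(line))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Stream the file named on the command line to stdout, one line at a time.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        sys.stdout.writelines(filter_lines(log_file))
```

Saved under the hypothetical name filter_log.py, it could be piped into GoAccess reading from stdin, in the same way as the SSH example earlier in the thread:

python3 filter_log.py access.log | goaccess - --log-format=COMBINED -o report.html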

allinurl commented 5 months ago

Great! Stay tuned for updates! There might be a possibility of an on-disk option in the future.