allinurl / goaccess

GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.
https://goaccess.io
MIT License

Treat files as vhosts #2384

Open ollybee opened 2 years ago

ollybee commented 2 years ago

Seeing stats per vhost is useful, especially to identify one site on a server that is being attacked or hogging resources.

--files-as-vhosts when combined with multiple input files would effectively give this feature without modifying the log format.

This would be especially useful on servers using control panels like cPanel or Plesk, where you could glob the input files like goaccess /var/log/apache2/domlogs// or goaccess /var/www/vhosts//logs/accesslog

allinurl commented 2 years ago

Hello,

Just so I understand a bit better what the output report should look like: when --files-as-vhosts is used, should goaccess split the output report into multiple report files according to the vhost %v?

ollybee commented 2 years ago

No, sorry, I don't think I explained it well.

The idea is that if you gave goaccess multiple input files whose logs did not contain a vhost field, it would treat each file as if it contained a vhost field matching the filename.

cat testsite.com.log
127.0.0.1 - - [26/Sep/2022:13:41:47 +0000] "GET / HTTP/1.1" 200 11173
goaccess --files-as-vhosts testsite.com.log

would treat the file as if it had a vhost field like this: testsite.com:80 127.0.0.1 - - [26/Sep/2022:13:41:47 +0000] "GET / HTTP/1.1" 200 11173
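The idea can be sketched as a small preprocessing step. This is only an illustration of the requested behaviour, not how goaccess implements anything: the function name `prefix_vhost`, the `.log` suffix stripping, and the default port 80 are assumptions taken from the example above.

```python
import os
from typing import Iterable, Iterator


def prefix_vhost(path: str, lines: Iterable[str],
                 suffix: str = ".log", port: int = 80) -> Iterator[str]:
    """Yield each log line with a 'vhost:port ' field prepended.

    The vhost is the file's base name with `suffix` removed
    (hypothetical convention, matching testsite.com.log -> testsite.com).
    """
    name = os.path.basename(path)
    if suffix and name.endswith(suffix):
        name = name[: -len(suffix)]
    for line in lines:
        # Drop the trailing newline so callers decide how to join lines.
        yield f"{name}:{port} {line.rstrip(chr(10))}"


if __name__ == "__main__":
    sample = '127.0.0.1 - - [26/Sep/2022:13:41:47 +0000] "GET / HTTP/1.1" 200 11173'
    for out in prefix_vhost("testsite.com.log", [sample]):
        print(out)
```

Output rewritten this way could then be parsed with a log format that starts with a vhost field.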

This would be most useful when feeding goaccess multiple log files on servers with many vhosts and a separate log file per vhost.

allinurl commented 2 years ago

Thanks for clarifying this. Let me work on it and I'll post back as soon as I have this feature out.

allinurl commented 2 years ago

I was able to push out a commit that implements this. For flexibility, it uses a POSIX regex to extract the vhost from the filename. Here's how it works, for example:

Assuming /path/awesome.com.log, using --fname-as-vhost='.*' will extract awesome.com.log as vhost.
Assuming /path/awesome.com.log, using --fname-as-vhost='.' will extract a as vhost.
Assuming /path/awesome.com.log, using --fname-as-vhost='[a-z]*\.[a-z]*' will extract awesome.com as vhost.
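To try patterns before passing them to goaccess, the three cases can be approximated with a short sketch. Assumptions: the helper `fname_as_vhost` is hypothetical, the pattern is applied to the file's base name (as the examples suggest), and Python's `re` stands in for POSIX regex. The two engines differ in general, but they agree on these three patterns.

```python
import os
import re
from typing import Optional


def fname_as_vhost(path: str, pattern: str) -> Optional[str]:
    """Return the first match of `pattern` in the file's base name, or None."""
    match = re.search(pattern, os.path.basename(path))
    return match.group(0) if match else None


if __name__ == "__main__":
    path = "/path/awesome.com.log"
    for pattern in (r".*", r".", r"[a-z]*\.[a-z]*"):
        print(f"{pattern!r:22} -> {fname_as_vhost(path, pattern)}")
```

A pattern that behaves as expected here is still worth checking against goaccess itself, since its regex dialect is not Python's.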

You can use multiple log files and it will work the same way (it assumes the same pattern across the filenames).

Feel free to build from development and let me know how it goes.

ollybee commented 2 years ago

I've tested that and it works for me. Thank you. Accepting a regex to extract the name is an elegant solution I had not considered.

It would be good to extend it and add --dname-as-vhost for use cases like the ones below:

/var/www/awesome.com/access.log
/var/www/amazing.com/access.log

A regex for the directory name would be complicated, so for dname it might be better to just accept a depth instead: 3 in the example above, or 4 in the example below, which is the default layout for servers with the Plesk control panel. I can't think of a use case where a dname option would need to accept a regex.

/var/www/vhost/awesome.com/logs/access.log
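The depth-based suggestion can be sketched as follows. Note that --dname-as-vhost is only a proposal in this thread; the function name `dname_as_vhost` and the convention that depth 1 is the first directory under `/` are assumptions chosen to match the two example paths.

```python
from pathlib import PurePosixPath


def dname_as_vhost(path: str, depth: int) -> str:
    """Return the directory component at `depth` of an absolute POSIX path.

    Depth 1 is the first directory below '/'. The final component
    (the log file itself) is never returned.
    """
    p = PurePosixPath(path)
    if not p.is_absolute():
        raise ValueError("path must be absolute")
    parts = p.parts  # parts[0] is '/', so parts[depth] is the depth-th directory
    if not 1 <= depth <= len(parts) - 2:
        raise ValueError(f"depth {depth} is out of range for {path}")
    return parts[depth]


if __name__ == "__main__":
    print(dname_as_vhost("/var/www/awesome.com/access.log", 3))
    print(dname_as_vhost("/var/www/vhost/awesome.com/logs/access.log", 4))
```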

allinurl commented 2 years ago

Glad that worked. I'll be pushing the next release soon, including this change. Stay tuned.

As for the db, unfortunately it's not possible to persist the data per vhost without a major overhaul. This is because by the time the persistence routine is called, it simply dumps the already processed dataset. Nonetheless, #117 may help achieve this, so I'll keep that in mind. I'm working on it as we speak, btw.