allinurl / goaccess

GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.
https://goaccess.io
MIT License

Add option to exclude request from report #695

Open wavexx opened 7 years ago

wavexx commented 7 years ago

As with --exclude-ip, it's often convenient to exclude some known path roots from the statistics. For instance, I'd like to exclude requests for /favicon.ico or known protected areas (say, /admin/).

allinurl commented 7 years ago

I could add an --exclude-request option but I'm not sure if it's necessary since you could actually do some pre-processing such as:

cat access.log | grep -v -f exclude_list.txt | goaccess -

or

cat access.log | grep -Ev '/favicon.ico|/admin/' | goaccess -

Does that address the use case you mentioned above?

wavexx commented 7 years ago

On Tue, Mar 21 2017, Gerardo O. wrote:

I could add an --exclude-request option but I'm not sure if it's necessary since you could actually do some pre-processing such as:

cat access.log | grep -v -f exclude_list.txt | goaccess -

or

cat access.log | grep -Ev '/favicon.ico|/admin/' | goaccess -

Does that address the use case you mentioned above?

I already do that, but it's a hack without parsing the actual log file. You need to anchor the paths to the request field to be somewhat more precise, such as '(GET|POST|HEAD) \/path..', but it would be much more polished to do it after parsing.
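
For illustration, here is a minimal sketch of what anchoring to the request field could look like outside of goaccess. It assumes the combined log format, where the request line ("GET /path HTTP/1.1") is the first double-quoted field on each line. The helper name exclude_prefix and the PREFIXES variable are hypothetical, not part of goaccess. It compares literal prefixes with awk's index(), so no regex is involved and a path that only appears in the user agent is not matched.

```shell
# exclude_prefix PREFIX...: drop log lines whose request path starts with
# any of the given prefixes. Hypothetical helper, not a goaccess feature.
# Assumption: combined log format, request line is the first quoted field.
exclude_prefix() {
    PREFIXES="$*" awk '
        BEGIN { n = split(ENVIRON["PREFIXES"], p, " ") }
        {
            # Split the line on double quotes: q[2] is the request line.
            split($0, q, "\"")
            # Split the request line on spaces: r[2] is the requested path.
            split(q[2], r, " ")
            # Literal prefix comparison; skip the line on the first match.
            for (i = 1; i <= n; i++)
                if (index(r[2], p[i]) == 1) next
            print
        }
    '
}
```

It would be used in place of grep in the pipeline above, for example: exclude_prefix /admin/ /favicon.ico < access.log | goaccess -

Note that a prefix such as /admin (without the trailing slash) also drops /administrator, and that this is still pre-processing: it does not help with data goaccess has already parsed.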

allinurl commented 7 years ago

Got it. I can certainly add this option. Please keep this open so I can look into it. Thanks!

kaworu commented 7 years ago

I am interested in this too.

lvbeck commented 6 years ago

I agree: requests for static files (images, JavaScript, CSS and so on) should be excluded from the report, at least in the "REQUESTED FILES (URLS)" panel; otherwise it's really difficult to analyse the results. It would be great to have an "--exclude-request-list" option that takes an exclude_list file.

Jonuz commented 6 years ago

+1, it would be really handy if static files could be excluded without having to play manually with grep.

wavexx commented 6 years ago

Any news on this? The "--static-file" flag is a bit pointless for me, as I mostly run goaccess on entirely statically generated websites. But even on dynamically generated websites, I still want to factor in the size of the static files anyway.

The files I'd like to ignore are identified by paths, not just extensions.

Just to give an idea, I'd like to exclude requests for the prefix "/phpmyadmin". These come from bots that generate hundreds of useless requests, which show up in and pollute the "failed requests" section, making it useless for traffic analysis.

pozitron57 commented 6 years ago

@allinurl, thanks,

cat access.log | grep -v -f exclude_list.txt | goaccess -

works fine to generate a static report, but doesn't work well with --real-time-html. I agree that it would be great to have --exclude-request-list.

allinurl commented 6 years ago

@pozitron57 have you tried using --line-buffered with grep? e.g.,

tail -f -n +0 access.log | grep --line-buffered -v -f exclude_list.txt | goaccess -

pozitron57 commented 6 years ago

@allinurl thanks, that works!

groovenectar commented 5 years ago

I already do that, but it's a hack without parsing the actual log file. You need to anchor the paths to the request field to be somewhat more precise, such as '(GET|POST|HEAD) \/path..', but it would be much more polished to do it after parsing.

Here's a way that should catch anything from GET to DELETE for a precise path:

cat access.log | grep -Ev ' "[A-Z]{3,6} \/path' | goaccess -

wavexx commented 5 years ago

Except when "YO /path" is in the user agent. Please don't suggest grep again, or anything regex related. My request is to have proper prefix matching on the path, within goaccess.

groovenectar commented 5 years ago

What is a YO HTTP method? Edit: I see what you're saying about that now, like if someone has a malformed UA.. still easy to remedy if we really need to dig that deep, which is what it seems we're expecting the author to do in this thread..

The author would be having to figure out an approach to implement this entirely new functionality for you.... Possibly with regex......

I'm really glad that regex was suggested as some kind of solution, so I wanted to contribute what I did with it today... I came across it only in this thread. Sorry to offend

johntyree commented 4 years ago

Besides being buggy, none of the regex solutions here work after goaccess has parsed the logs already. If I have a year's worth of data in a big on-disk database and I realize that it would be better if I filtered out some path, I can't unless I've also saved the original request logs.

michieloosterling commented 4 years ago

I too would love an option to exclude certain paths from the results. For instance, being able to exclude stuff like /admin would be great.

CryDeTaan commented 4 years ago

@pozitron57 have you tried using --line-buffered with grep? e.g.,

tail -f -n +0 access.log | grep --line-buffered -v -f exclude_list.txt | goaccess -

How would this be used in a service, or with --daemon? I may be missing something.

allinurl commented 2 years ago

Just an FYI. This request will be addressed by #117. Working on it as we speak — stay tuned!