allinurl / goaccess

GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.
https://goaccess.io
MIT License

Huge Spike in unknown user agents #2158

Closed Lennart01 closed 3 years ago

Lennart01 commented 3 years ago

For the past couple of days I have been seeing a huge spike in unknown user agents. I ran some tests, and Android and Windows devices still seem to be logged correctly. I can't verify Apple devices, but I assume they aren't being logged correctly anymore. I don't know how I reached a spike of 17% unknown user agents.

Lennart01 commented 3 years ago

I found the bad apple, I think.

[IP] - - [31/Jul/2021:22:24:14 +0200] "-" 408 319 "-" "-"
[IP] - - [31/Jul/2021:22:24:10 +0200] "-" 408 319 "-" "-"

Lennart01 commented 3 years ago

I have identified the issue. GoAccess logs 408s as Unknown user agents. It would be nice if that could be avoided.
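One way to check how much of the traffic these entries account for is to count them directly in the access log. The sketch below assumes the combined log format of the lines quoted above and no escaped double quotes inside the request or user agent; `count_empty_408` is a hypothetical helper name, not part of GoAccess.

```shell
# Count requests that have status 408 and an empty ("-") user agent.
# Assumption: Apache/Nginx combined log format. Splitting on double
# quotes keeps the status in the third quote-delimited field even when
# the request itself is just "-".
count_empty_408() {
  awk -F'"' '
    {
      total++
      split($3, f, " ")          # f[1] = status, f[2] = response size
      if (f[1] == "408" && $6 == "-") hits++
    }
    END { printf "%d of %d requests are 408 with an empty user agent\n", hits, total }
  '
}
```

It reads the log on standard input, for example `count_empty_408 < access.log`.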

allinurl commented 3 years ago

Are you looking to ignore 408s completely? It shouldn't be counting those as unique visitors unless you are using --4xx-to-unique-count. Also, you can get a better sense of unknown agents using --unknowns-log=<filename>.

Lennart01 commented 3 years ago

That would be nice, because until now I was constantly waiting for one to pop up and then grabbing the Apache log. It appears that it is counting them towards visitors, but I will add the second option you provided and check what it returns. Let me also quickly check my config file to make sure I don't have that option enabled by accident.

Lennart01 commented 3 years ago

Log of unknowns:

[OS]   -
[BR]   -
[OS]   -
[BR]   -
[OS]   -
[BR]   -
[OS]   -
[BR]   -
[OS]   Mozilla/5.0 (compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)
[OS]   -
[BR]   -
[OS]   Mozilla/5.0 (compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)
[OS]   -
[BR]   -
[OS]   -
[BR]   -
[OS]   -
[BR]   -
[OS]   -
[BR]   -
[OS]   Mozilla/5.0 (compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)
[OS]   Mozilla/5.0 (compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)
[OS]   -
[BR]   -

UptimeRobot is a site health monitor I currently use, so nothing surprising here. I have checked the config and can confirm that --4xx-to-unique-count is disabled.
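A log like the one above is easier to read once repeated entries are tallied. The following is a minimal sketch that assumes each line has the shape shown here, a `[OS]` or `[BR]` tag followed by the raw user agent; `summarize_unknowns` is a hypothetical helper name.

```shell
# Tally the distinct user agents in a file written by --unknowns-log.
# Assumption: every line is "[OS]" or "[BR]", whitespace, then the agent.
summarize_unknowns() {
  awk '
    {
      sub(/^\[[A-Z]+\][ \t]+/, "")   # strip the tag, keep the agent string
      count[$0]++
    }
    END { for (ua in count) printf "%d\t%s\n", count[ua], ua }
  ' | sort -k1,1nr -k2               # most frequent agent first
}
```

It reads the file on standard input, for example `summarize_unknowns < unknowns.log`, and prints one line per agent with its count in front.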

allinurl commented 3 years ago

A workaround would be to apply a filter prior to parsing the data. e.g.,

awk '$9!~/408/' access.log | goaccess -

#117 will address this in a much nicer way; I'm working on it as we speak.

If --4xx-to-unique-count is not enabled, then it shouldn't be counting them as unique.
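One caveat with filtering on whitespace-separated fields: in the 408 lines quoted earlier in this thread the request is a bare "-" rather than three words, so the status lands in field 7 instead of field 9. A possible variant, sketched here under the assumption of the combined log format with no escaped double quotes, reads the status from the text right after the quoted request. `drop_408` is a hypothetical helper name.

```shell
# Drop 408 responses before handing the log to GoAccess.
# Assumption: combined log format. The status is taken from the text
# that follows the quoted request, so "-" requests are handled too.
drop_408() {
  awk -F'"' '
    {
      split($3, f, " ")      # f[1] = status code
      if (f[1] != "408") print
    }
  '
}
```

It can replace the awk call in the pipeline above, for example `drop_408 < access.log | goaccess -`.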

Lennart01 commented 3 years ago

Nice. What kind of bothered me was that they show up in the statistics with a user count. But as long as they aren't counted towards the total, that shouldn't be too big of an issue.

allinurl commented 3 years ago

Yep, it shouldn't. However, please note that it will still count the hits/requests, just not the visitors.

Closing this as I expect #117 will help with this in more detail. Feel free to reopen it if needed.