eldy / AWStats

AWStats Log Analyzer project (official sources)
https://www.awstats.org

Add Hetzner Monitoring to robots #261

Open dannybeckett opened 1 month ago

dannybeckett commented 1 month ago

Hi guys

Please could you add this to robots? It is our server provider's monitoring service and it generates a lot of hits.

User-Agent: Hetzner_System_Monitoring

Thanks!

chuckhoupt commented 1 month ago

Generally, I think only global robots should be included in AWStats' robots.pm. Every entry slows AWStats down, and web-host-specific entries would only be relevant to the few users on that host. A generic catch-all pattern for "monitor" might be worth adding, though.

For custom/local robots, the documented way to handle them is to list them in the SkipUserAgents directive, or to modify robots.pm locally: https://www.awstats.org/docs/awstats_config.html#SkipUserAgents

It would be nice if there was a way to avoid directly modifying robots.pm. Possibly this could be done via a plugin or new directive.

dannybeckett commented 1 month ago

@chuckhoupt The SkipUserAgents option sounds perfect, thank you! I now have these options set:

SkipUserAgents="Hetzner_System_Monitoring"
OnlyUserAgents=""
LogFormat=1

I noticed the docs say that it only affects logs processed in the future.

I tried to regenerate the stats using this tool in Plesk:

plesk sbin statistics --generate-all-webstat
plesk sbin statistics --calculate-all

However it is still showing up in Unknown Browsers:

[Screenshot: AWStats report, captured 2024-07-23 15:57:58]

After looking at the Apache access log, it seems that AWStats is replacing spaces with underscores in the user agent:

213.133.113.83 - - [23/Jul/2024:13:49:18 +0100] "GET / HTTP/1.1" 301 162 "-" "Hetzner System Monitoring"

I would try with spaces instead of underscores, but it seems that SkipUserAgents uses the space character as a separator.
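
For reference, a minimal Python sketch of pulling the user agent out of a line like the one above, assuming the standard Apache combined log format. The regular expression and the `user_agent` helper are illustrative only and are not part of AWStats or Plesk; user agents containing escaped quotes are not handled.

```python
import re

# Combined Log Format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def user_agent(line):
    """Return the user-agent field of one log line, or None if it does not parse."""
    match = LOG_LINE.match(line.strip())
    return match.group("agent") if match else None


# The log line quoted in this thread.
SAMPLE = (
    '213.133.113.83 - - [23/Jul/2024:13:49:18 +0100] '
    '"GET / HTTP/1.1" 301 162 "-" "Hetzner System Monitoring"'
)

if __name__ == "__main__":
    print(user_agent(SAMPLE))
```

Run against the sample line, this returns the user agent with spaces, which is what the access log contains before AWStats displays it with underscores.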

I just tried this RegEx but it still shows up in AWStats unfortunately:

SkipUserAgents="REGEX[Hetzner\sSystem\sMonitoring]"

I also tried:

SkipUserAgents="REGEX[.*Monitoring.*]"

And:

SkipUserAgents="REGEX[Monitoring]"

After each change I manually regenerated the stats again with the Plesk commands above, but it didn't help.

On the plus side, it does appear to have prevented it from showing up in future stats. The report shows no visit from this user agent for about an hour, even though I know it accesses our site every 5 minutes, so it looks like it should be hidden next month :)

[Screenshot: AWStats report, captured 2024-07-23 17:04:49]

I have left the option set as:

SkipUserAgents="REGEX[Hetzner\sSystem\sMonitoring]"
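
A minimal sketch for checking these patterns outside AWStats, written in Python rather than the Perl that AWStats itself uses. It compares each pattern against both forms of the user agent seen in this thread: the one with spaces from the access log and the one with underscores from the report. Which form AWStats actually matches against is not confirmed here, and the case-insensitive comparison is likewise an assumption. The `either_separator` pattern and all helper names are hypothetical.

```python
import re

# The two forms of the user agent seen in this thread.
RAW_UA = "Hetzner System Monitoring"        # as written in the Apache access log
DISPLAYED_UA = "Hetzner_System_Monitoring"  # as shown in the AWStats report

# The three patterns tried above, plus one hypothetical variant that accepts
# either whitespace or an underscore between the words.
PATTERNS = {
    "tried_whitespace": r"Hetzner\sSystem\sMonitoring",
    "tried_wildcard": r".*Monitoring.*",
    "tried_plain": r"Monitoring",
    "either_separator": r"Hetzner[\s_]System[\s_]Monitoring",
}


def matches(pattern, agent):
    """Case-insensitive search (assumed, not confirmed, to mirror the directive)."""
    return re.search(pattern, agent, re.IGNORECASE) is not None


def report():
    """Map each pattern name to (matches raw form, matches displayed form)."""
    return {
        name: (matches(pattern, RAW_UA), matches(pattern, DISPLAYED_UA))
        for name, pattern in PATTERNS.items()
    }


if __name__ == "__main__":
    for name, (raw_hit, shown_hit) in report().items():
        print(f"{name:18} raw={raw_hit} displayed={shown_hit}")
```

In this sketch the `\s` pattern matches only the form with spaces, while the other three match both forms. If AWStats does compare against the underscore form, that would explain a miss for the `\s` pattern but not for the two `Monitoring` patterns, which suggests the entries still shown come from logs processed before the directive was set.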