Open dannybeckett opened 1 month ago
Generally, I think only global robots should be included in AWStats' robots.pm. Every entry slows AWStats down, and web-host-specific entries would only be relevant to the few people using that web host. A generic catch-all pattern for monitor might be worth adding, though.
For custom/local robots, the documented way to handle them is to list them in a SkipUserAgents directive, or to locally modify robots.pm: https://www.awstats.org/docs/awstats_config.html#SkipUserAgents
It would be nice if there were a way to avoid directly modifying robots.pm. Possibly this could be done via a plugin or a new directive.
@chuckhoupt The SkipUserAgents option sounds perfect, thank you! I now have these options set:
SkipUserAgents="Hetzner_System_Monitoring"
OnlyUserAgents=""
LogFormat=1
I noticed the docs say that it only affects logs processed in the future, so I tried to regenerate the stats using these commands in Plesk:
plesk sbin statistics --generate-all-webstat
plesk sbin statistics --calculate-all
However, it is still showing up under Unknown Browsers.
After looking at the Apache access log, it seems that AWStats is replacing spaces with underscores in the user agent:
213.133.113.83 - - [23/Jul/2024:13:49:18 +0100] "GET / HTTP/1.1" 301 162 "-" "Hetzner System Monitoring"
I would try with spaces instead of underscores, but it seems that SkipUserAgents uses the space character as a separator.
I just tried this regex, but unfortunately it still shows up in AWStats:
SkipUserAgents="REGEX[Hetzner\sSystem\sMonitoring]"
I also tried:
SkipUserAgents="REGEX[.*Monitoring.*]"
And:
SkipUserAgents="REGEX[Monitoring]"
I also tried manually regenerating the stats again with the Plesk commands above, but it didn't help.
On the plus side, it does appear to have prevented it from showing up in future stats. This user agent has not been recorded visiting the site for about an hour, even though I know it accesses our site every 5 minutes, so it looks like it should be hidden next month :)
I have left the option set as:
SkipUserAgents="REGEX[Hetzner\sSystem\sMonitoring]"
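For anyone comparing the patterns tried above, here is a minimal Python sketch (not AWStats' own Perl code) that pulls the user agent out of the log line quoted earlier and checks each candidate pattern against both the raw string and an underscore-separated form. The helper names `extract_user_agent` and `matching_forms` are made up for illustration, the underscore replacement only mimics what was observed in the report, and the last pattern is a hypothetical variant that has not been tested against AWStats itself. It says nothing about which form AWStats actually compares SkipUserAgents against.

```python
import re

# Log line copied from the access log quoted earlier in this thread.
LOG_LINE = (
    '213.133.113.83 - - [23/Jul/2024:13:49:18 +0100] '
    '"GET / HTTP/1.1" 301 162 "-" "Hetzner System Monitoring"'
)


def extract_user_agent(line):
    """Return the last double-quoted field of a combined-format log line."""
    quoted = re.findall(r'"([^"]*)"', line)
    return quoted[-1] if quoted else None


def matching_forms(pattern, user_agent):
    """Report whether pattern matches the raw and the underscored user agent."""
    # Assumption: spaces become underscores, as seen in the AWStats report.
    underscored = user_agent.replace(" ", "_")
    return {
        "raw": re.search(pattern, user_agent) is not None,
        "underscored": re.search(pattern, underscored) is not None,
    }


CANDIDATES = [
    r"Hetzner\sSystem\sMonitoring",        # matches spaces only
    r"Hetzner_System_Monitoring",          # matches underscores only
    r"Monitoring",                         # matches either form
    r"Hetzner[\s_]System[\s_]Monitoring",  # hypothetical: covers both forms
]

if __name__ == "__main__":
    ua = extract_user_agent(LOG_LINE)
    for pattern in CANDIDATES:
        print(pattern, matching_forms(pattern, ua))
```

A pattern that matches both forms sidesteps the question of whether the comparison happens before or after the spaces are replaced.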
Hi guys
Please could you add this to robots? It is our server provider's monitoring service and it generates a lot of hits.
User-Agent: Hetzner_System_Monitoring
Thanks!