Closed alanorth closed 4 years ago
Hi @alanorth ,
We added ^User-Agent today, together with other changes, in this commit:
https://github.com/atmire/COUNTER-Robots/commit/56cca8f13337e7acb89fdd0270f023ed9919b6f2
Thanks for your input!
Great! Yes, you're right, it's better to anchor it as ^User-Agent. Cheers!
I have thousands of hits in my server logs from clients that have the literal string "User-Agent" in their HTTP User-Agent protocol header (sometimes even twice!). For example:
User-Agent: Drupal (+http://drupal.org/)
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31
User-Agent:Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0) IKU/7.0.5.9226;IKUCID/IKU;
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; 360SE)
User-Agent:User-Agent:Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB7.5; .NET4.0C)IKU/6.7.6.12189;IKUCID/IKU;IKU/6.7.6.12189;IKUCID/IKU;
User-Agent:Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0) IKU/7.0.5.9226;IKUCID/IKU;
I don't know about other protocols, but in HTTP no reputable client does this. As far as I can tell, these are bots pretending to be real user agents and doing it poorly. In my case, the IPs associated with these are all on Amazon AWS and the requests are clearly not from a human user.
Should we add User-Agent: to the list of robots? Let me know what you think.
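For anyone who wants to check the effect of the pattern on their own logs, here is a minimal sketch in Python. It assumes the patterns are applied as case-insensitive regular expression searches against the User-Agent value; the `is_robot` helper, the `samples` list, and the "ExampleClient" string are hypothetical and only for illustration. It also shows why anchoring with ^ is the safer choice: the unanchored form would flag any agent that merely mentions "User-Agent" somewhere in the string.

```python
import re

# Assumption: patterns are matched case-insensitively with a regex search.
ANCHORED = re.compile(r"^User-Agent", re.IGNORECASE)
UNANCHORED = re.compile(r"User-Agent", re.IGNORECASE)


def is_robot(user_agent, pattern=ANCHORED):
    """Return True if the User-Agent value matches the given pattern."""
    return pattern.search(user_agent) is not None


samples = [
    # Taken from the log examples above: the header name leaked into the value.
    "User-Agent: Drupal (+http://drupal.org/)",
    "User-Agent:User-Agent:Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1)",
    # An ordinary browser string, which should not match.
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) "
    "Chrome/26.0.1410.64 Safari/537.31",
    # Hypothetical agent that only mentions the phrase mid-string.
    "ExampleClient/1.0 (custom User-Agent build)",
]

for ua in samples:
    print(is_robot(ua), is_robot(ua, UNANCHORED), ua)
```

With the anchored pattern only the first two samples are flagged; the unanchored one would also flag the last sample.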