Spoofing the user agent

JayBizzle / Crawler-Detect

🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent

https://crawlerdetect.io

MIT License


Spoofing the user agent #258

Closed cr101 closed 6 years ago

cr101 commented 6 years ago

First of all, thank you for your software.

Many bots spoof their user agent. Some do it for legitimate reasons (e.g. they only want to crawl mobile content), while others simply don't want to be identified as bots. Even worse, some bots impersonate well-behaved crawlers by sending the user agents of Google, Microsoft and other crawlers that are generally considered polite.

How reliable is detecting bots/crawlers/spiders via the user agent?

JayBizzle commented 6 years ago

As you have correctly pointed out, the user agent can be easily spoofed. There is nothing we can do about that.

We do check a few headers besides the user agent to detect some bots. For example, GoogleBot sometimes spoofs the user agent but identifies itself via the From header.
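To illustrate the idea of looking at more than one header, here is a minimal Python sketch. It is not CrawlerDetect's implementation (the library is PHP and ships a much larger pattern list); the names `BOT_PATTERN`, `HEADERS_TO_CHECK` and `looks_like_bot`, the short pattern, and the choice of headers are all assumptions made for this example.

```python
import re

# Illustrative pattern only; a real pattern list would be far longer.
BOT_PATTERN = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)

# Headers to inspect. This selection is an assumption for the example.
HEADERS_TO_CHECK = ("User-Agent", "From")


def looks_like_bot(headers):
    """Return True if any inspected header matches the bot pattern."""
    # HTTP header names are case-insensitive, so normalise the keys first.
    normalised = {name.lower(): value for name, value in headers.items()}
    for name in HEADERS_TO_CHECK:
        value = normalised.get(name.lower(), "")
        if value and BOT_PATTERN.search(value):
            return True
    return False
```

A request with a browser-like `User-Agent` but a `From` value such as `googlebot(at)googlebot.com` would be flagged by this check. Keep in mind that `From` is set by the client just like `User-Agent`, so it can be spoofed or omitted in the same way.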

IP address checking would be the next step. We have explored the idea in the past, but decided it would be too hard to maintain.
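For readers who want to try IP-based verification in their own application, one possible form is a forward-confirmed reverse DNS check, which avoids keeping a static list of IP ranges. The Python sketch below is not part of CrawlerDetect; the function name `is_verified_crawler`, the injectable resolver parameters, and the hostname suffixes in `TRUSTED_SUFFIXES` are assumptions for illustration, and the suffixes should be checked against the crawler operator's own documentation.

```python
import socket

# Assumed hostname suffixes, for illustration only.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_crawler(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check for an IPv4 address."""
    # The resolvers are injectable so the logic can be tried without a network.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        # Step 1: the IP must resolve to a hostname under a trusted domain.
        hostname = reverse_lookup(ip)
        if not hostname.lower().rstrip(".").endswith(TRUSTED_SUFFIXES):
            return False
        # Step 2: that hostname must resolve back to the same IP.
        return ip in forward_lookup(hostname)
    except OSError:
        # Lookup failures (socket.herror, socket.gaierror) count as unverified.
        return False
```

Called as `is_verified_crawler(remote_ip)`, the function performs two DNS lookups per call, so results would normally be cached. The default forward lookup uses `socket.gethostbyname_ex`, which only returns IPv4 addresses.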