JayBizzle / Crawler-Detect

🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent
https://crawlerdetect.io
MIT License

New Google Bot and Facebook bot Google I/O crawler - Build/MMB29P #532

Open gregzawadzki opened 1 month ago

gregzawadzki commented 1 month ago

According to Google, there will be a new crawler that identifies itself with the following user agent:

Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +https://www.google.com/bot.html)

However, Facebook has now started sending a user agent with the same Build/MMB29P token, and it does not include anything that identifies it as a bot.

From our logs (see, for example, the last entry):

69.171.249.10 - - [28/Jun/2024:09:56:47 +0200] "GET /10-xxxx.html HTTP/2.0" 200 170017 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
69.171.249.10 - - [05/Jul/2024:16:44:55 +0200] "GET /xxxxxx HTTP/2.0" 200 90496 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
69.171.249.10 - - [06/Jul/2024:11:31:11 +0200] "GET /robots.txt HTTP/1.1" 200 3358 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
69.171.249.10 - - [07/Jul/2024:01:47:10 +0200] "GET /xxx/1289-xxx HTTP/1.1" 200 89077 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 6P Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.83 Mobile Safari/537.36"
69.171.249.10 - - [07/Jul/2024:12:37:15 +0200] "GET / HTTP/2.0" 200 245820 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
69.171.249.10 - - [07/Jul/2024:12:37:59 +0200] "GET / HTTP/1.1" 200 139925 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 6P Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.83 Mobile Safari/537.36"
69.171.249.10 - - [07/Jul/2024:12:37:59 +0200] "GET / HTTP/2.0" 200 139921 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
69.171.249.10 - - [07/Jul/2024:12:38:12 +0200] "GET /nowe-produkty HTTP/2.0" 200 146181 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
69.171.249.10 - - [07/Jul/2024:12:42:37 +0200] "GET /21-xxx.html HTTP/2.0" 200 167864 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
69.171.249.10 - - [07/Jul/2024:12:46:05 +0200] "GET /22-xxx.html HTTP/1.1" 200 171716 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 6P Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.83 Mobile Safari/537.36"

Source: https://developers.google.com/search/blog/2019/10/updating-user-agent-of-googlebot?hl=en
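
To make the pattern in the log excerpt above easier to check against a full access log, here is a rough Python sketch (not part of CrawlerDetect, which is PHP). It assumes the combined log format shown above, and the names `LOG_LINE`, `DECLARED_BOT` and `find_undeclared_crawler_ips` are made up for illustration. It reports IPs that sent both a self-declared bot user agent and a `Build/MMB29P` user agent without any bot token.

```python
import re
from collections import defaultdict

# Assumed combined log format:
# IP, identity, user, [time], "request", status, size, "referer", "user agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"$'
)

# Illustrative (hypothetical) list of tokens that mark a self-declared bot.
DECLARED_BOT = re.compile(r"facebookexternalhit|Googlebot|bingbot", re.IGNORECASE)

# Android build token shared by the user agents quoted in this issue.
ANDROID_BUILD = "Build/MMB29P"


def find_undeclared_crawler_ips(lines):
    """Map each IP that sent a declared bot UA to the undeclared
    Build/MMB29P user agents seen from that same IP."""
    declared_ips = set()
    undeclared = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed format
        ip, ua = match.group("ip"), match.group("ua")
        if DECLARED_BOT.search(ua):
            declared_ips.add(ip)
        elif ANDROID_BUILD in ua:
            undeclared[ip].add(ua)
    return {ip: sorted(uas) for ip, uas in undeclared.items() if ip in declared_ips}


# Two entries copied from the log excerpt in this issue.
SAMPLE = [
    '69.171.249.10 - - [07/Jul/2024:12:42:37 +0200] "GET /21-xxx.html HTTP/2.0" 200 167864 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"',
    '69.171.249.10 - - [07/Jul/2024:12:46:05 +0200] "GET /22-xxx.html HTTP/1.1" 200 171716 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 6P Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.83 Mobile Safari/537.36"',
]

result = find_undeclared_crawler_ips(SAMPLE)
for ip, user_agents in result.items():
    print(ip, user_agents)
```

Matching on the user agent alone would also flag real Nexus devices, so the sketch only reports an IP when it has already announced itself as a bot elsewhere in the same log. Whether that is an acceptable heuristic depends on the site; it is only meant to help confirm the behaviour described here.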

gregzawadzki commented 1 month ago

Microsoft Bing also uses it: https://www.bing.com/webmasters/help/which-crawlers-does-bing-use-8c184ec0