JayBizzle / Crawler-Detect

🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent
https://crawlerdetect.io
MIT License
1.98k stars, 256 forks

Some Google bots are not identified #505

Closed: anemone-clown closed this issue 5 months ago

anemone-clown commented 1 year ago

Hi, it seems that some Google bots coming from Google Cloud are not identified as bots. Example:

IP = 104.199.13.48 | Referer = | Lang = | Host = 48.13.199.104.bc.googleusercontent.com | Nav = Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.85 Safari/537.36 | Translate = 0 | Bot = 0

Because I get a lot of bots, I first check whether $_SERVER['HTTP_ACCEPT_LANGUAGE'] is empty, and then I call gethostbyaddr($_SERVER['REMOTE_ADDR']). If "google" appears in the resulting host name, I treat the request as a Google Cloud bot.
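For readers who want to try the two-step check described above, here is a minimal sketch. It is not part of CrawlerDetect: the function name `isProbablyGoogleCloudBot` and the injectable `$resolver` parameter are hypothetical, added only so the reverse DNS lookup can be swapped out when testing. It assumes the same rule as the comment, namely that a host name containing "google" is treated as a Google Cloud bot.

```php
<?php

/**
 * Hypothetical helper (not a CrawlerDetect API): applies the two checks
 * described in the comment above to a $_SERVER-style array.
 *
 * $resolver is an assumption added for testability; by default it is
 * PHP's built-in gethostbyaddr().
 */
function isProbablyGoogleCloudBot(array $server, ?callable $resolver = null): bool
{
    // Step 1: cheap check first. A non-empty Accept-Language header is
    // treated as a sign of a regular browser, so no DNS lookup is done.
    if (!empty($server['HTTP_ACCEPT_LANGUAGE'])) {
        return false;
    }

    $ip = $server['REMOTE_ADDR'] ?? '';
    if ($ip === '') {
        return false;
    }

    // Step 2: reverse DNS lookup of the client IP.
    $resolver = $resolver ?? 'gethostbyaddr';
    $host = $resolver($ip);

    // gethostbyaddr() returns the unmodified IP on failure and false on
    // malformed input; neither counts as a resolved host name.
    if (!is_string($host) || $host === $ip) {
        return false;
    }

    // Same rule as in the comment: "google" anywhere in the host name.
    return stripos($host, 'google') !== false;
}
```

In a real request this would be called as `isProbablyGoogleCloudBot($_SERVER)`. Note that the reverse lookup can be slow, which is why the header check runs first.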

Is it possible to detect this?

Jef (sorry for my poor English...)

clementmas commented 11 months ago

I guess you could add a rule matching googleusercontent. I haven't seen this bot yet.

But recently I'm seeing requests from GoogleOther:

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GoogleOther) Chrome/117.0.5938.132 Safari/537.36
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.5938.132 Mobile Safari/537.36 (compatible; GoogleOther)

BartVB commented 5 months ago

Same here:

Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.6261.94 Mobile Safari/537.36 (compatible; GoogleOther)
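
As a rough illustration of what a rule for these user agents could look like, here is a standalone sketch. The function name `matchesGoogleOther` is hypothetical and this is not how CrawlerDetect registers its patterns; it only shows that a simple token match covers the GoogleOther strings reported in this thread.

```php
<?php

/**
 * Hypothetical helper (not a CrawlerDetect API): returns true when the
 * user agent contains the "GoogleOther" token as a whole word.
 */
function matchesGoogleOther(string $userAgent): bool
{
    // \b keeps the match to the exact token, so it is found both in
    // "(compatible; GoogleOther)" and in "compatible; GoogleOther) Chrome/...".
    return preg_match('/\bGoogleOther\b/', $userAgent) === 1;
}
```

The plain Chrome/40 user agent from the original report does not contain this token, so a user-agent rule like this one would not catch that request; it would still need a host-based check.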

sylvaindeloux commented 5 months ago

Is this package still maintained? Do you know of any similar alternatives?

JayBizzle commented 5 months ago

Yes, still maintained

PRs welcome 🙏🏻

BartVB commented 5 months ago

Amazing, thanks a lot, @JayBizzle!