JayBizzle / Crawler-Detect

🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent
https://crawlerdetect.io
MIT License

MicroMessenger bot #358

Closed. MikeVL closed this issue 4 years ago

MikeVL commented 4 years ago

UserAgent

Mozilla/5.0 (Linux; Android 7.0; FRD-AL00 Build/HUAWEIFRD-AL00; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/53.0.2785.49 Mobile MQQBrowser/6.2 TBS/043602 Safari/537.36 MicroMessenger/6.5.16.1120 NetType/WIFI Language/zh_CN

JayBizzle commented 4 years ago

Hi @MikeVL

If you can resolve my comment above, I'll merge this ASAP.

👍

dmyers commented 4 years ago

Are you sure this is an actual bot? I have registered users with application sessions attached to clicks with this user agent, and that session data suggests it is a real browser of some kind. WeChat has a built-in browser inside the app, so this traffic may not come from crawlers at all. https://stackoverflow.com/questions/25174582/is-it-possible-to-target-the-user-agent-string-for-wechats-built-in-browser-on
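
For illustration, here is a minimal sketch of how an application could recognise this traffic by the `MicroMessenger/<version>` token seen in the user agent reported above. It is written in Python for brevity and is not part of CrawlerDetect; the names `WECHAT_TOKEN` and `wechat_version` are made up for this example, and it assumes the token always takes the form shown in that one user agent.

```python
import re

# Assumption: WeChat's in-app browser identifies itself with a
# "MicroMessenger/<dotted version>" token, as in the user agent in this issue.
WECHAT_TOKEN = re.compile(r"MicroMessenger/([\d.]+)", re.IGNORECASE)


def wechat_version(user_agent):
    """Return the MicroMessenger version string, or None if the token is absent."""
    match = WECHAT_TOKEN.search(user_agent)
    return match.group(1) if match else None


REPORTED_UA = (
    "Mozilla/5.0 (Linux; Android 7.0; FRD-AL00 Build/HUAWEIFRD-AL00; wv) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/53.0.2785.49 "
    "Mobile MQQBrowser/6.2 TBS/043602 Safari/537.36 "
    "MicroMessenger/6.5.16.1120 NetType/WIFI Language/zh_CN"
)

print(wechat_version(REPORTED_UA))
```

A token match only shows that the client claims to be WeChat; it says nothing about whether a human is behind the request.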

JayBizzle commented 4 years ago

@dmyers looks like you may be correct. Would you like to create a PR to rectify this?

Thanks 👍

tractorcow commented 4 years ago

I also ran into this issue, and can confirm that CrawlerDetect is treating the WeChat mobile browser as a crawler.

I wonder if the exclusions and crawlers lists could be injectable, so we could adjust them on a case-by-case basis? As it stands, I would have to subclass CrawlerDetect to add a manual exclusion.

tractorcow commented 4 years ago

I actually don't think removing MicroMessenger from the crawler list would be wise, even though it catches regular traffic. A lot of badly behaved bots mimic that user agent, so I would probably agree with the status quo for the default crawler list.

However, in cases where WeChat browsers should be selectively supported, it makes sense to allow a per-application exclusion list.
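
Roughly what I have in mind, as a minimal sketch in Python rather than PHP. This is not CrawlerDetect's API and not necessarily how the feature will end up working: `SketchDetector`, `crawler_patterns`, and `allow_patterns` are hypothetical names, and the two-entry crawler list is a stand-in for the real one.

```python
import re


class SketchDetector:
    """Hypothetical detector: a default crawler list plus a per-application allow list."""

    def __init__(self, crawler_patterns, allow_patterns=()):
        # Join each list into a single case-insensitive alternation.
        self._crawlers = re.compile("|".join(crawler_patterns), re.IGNORECASE)
        self._allowed = (
            re.compile("|".join(allow_patterns), re.IGNORECASE)
            if allow_patterns
            else None
        )

    def is_crawler(self, user_agent):
        # An application-level allow match wins over the default crawler list.
        if self._allowed is not None and self._allowed.search(user_agent):
            return False
        return bool(self._crawlers.search(user_agent))


# Stand-in for the default list; the real list is far longer.
DEFAULT_CRAWLERS = ["MicroMessenger", "Googlebot"]

WECHAT_UA = "Mozilla/5.0 (Linux; Android 7.0) MicroMessenger/6.5.16.1120 NetType/WIFI"

default_detector = SketchDetector(DEFAULT_CRAWLERS)
wechat_friendly = SketchDetector(DEFAULT_CRAWLERS, allow_patterns=["MicroMessenger"])

print(default_detector.is_crawler(WECHAT_UA))
print(wechat_friendly.is_crawler(WECHAT_UA))
```

The default stays strict, and only applications that opt in accept WeChat traffic. The trade-off is that bots mimicking the WeChat user agent are let through for those applications too.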

I'll PR a feature shortly.

tractorcow commented 4 years ago

PR at https://github.com/JayBizzle/Crawler-Detect/pull/384