JayBizzle / Crawler-Detect

🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent
https://crawlerdetect.io
MIT License

Potential bots #353

Open JayBizzle opened 4 years ago

JayBizzle commented 4 years ago
Abhirup-99 commented 4 years ago

Is this merged?

JayBizzle commented 4 years ago

> Is this merged?

The user agents marked with ✅ have been added; the others still need adding 👍🏻

newHagen commented 4 years ago

This is the UserAgent of the Google-Weblight bot:

clementmas commented 1 year ago

There's also:

Mozilla/5.0 AppleWebKit/537.36 Chrome/114.0.5735.179 Safari/537.36 Google-Ads-Conversions

Should these 2 existing rules be replaced:

  • Google-Ads-Creatives-Assistant
  • Google-Ads-Overview

with a simple "Google-Ads" detection?

JayBizzle commented 1 year ago

> There's also:
>
> Mozilla/5.0 AppleWebKit/537.36 Chrome/114.0.5735.179 Safari/537.36 Google-Ads-Conversions
>
> Should these 2 existing rules be replaced:
>
>   • Google-Ads-Creatives-Assistant
>   • Google-Ads-Overview
>
> with a simple "Google-Ads" detection?

Yeah, go for it 👍
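For anyone following along, here is a minimal sketch of why the single broader rule covers all three cases. It is written in Python purely for illustration and is not the package's PHP code. The names `OLD_RULES`, `NEW_RULE`, and `matches_any` are hypothetical, and case-insensitive matching is an assumption.

```python
import re

# Hypothetical stand-ins for the two existing rules and the proposed one.
OLD_RULES = ["Google-Ads-Creatives-Assistant", "Google-Ads-Overview"]
NEW_RULE = "Google-Ads"


def matches_any(patterns, user_agent):
    """Return True if any pattern is found in the user agent (case-insensitive)."""
    return any(re.search(p, user_agent, re.IGNORECASE) for p in patterns)


# User agent quoted in the comment above.
ua = (
    "Mozilla/5.0 AppleWebKit/537.36 Chrome/114.0.5735.179 "
    "Safari/537.36 Google-Ads-Conversions"
)

print(matches_any(OLD_RULES, ua))   # False: neither specific rule matches
print(matches_any([NEW_RULE], ua))  # True: the shorter rule matches

# The shorter rule also still matches the two names it would replace.
print(all(matches_any([NEW_RULE], name) for name in OLD_RULES))  # True
```

The trade-off is that the shorter pattern will also match any future user agent containing "Google-Ads", which seems to be the intent here.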

SoranDK commented 1 year ago

Probably no way to detect these, but these 2 visit my entirely Danish site every day... The first twice a day from the US and the second once a day from China. These are the complete user-agent headers, and all of it seems to be removed via the excludes.

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36

JayBizzle commented 1 year ago

> Probably no way to detect these, but these 2 visit my entirely Danish site every day... The first twice a day from the US and the second once a day from China. These are the complete user-agent headers, and all of it seems to be removed via the excludes.
>
> Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36
>
> Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36

Yep, bots like this are pretty annoying. They send an ordinary browser user agent, so there's nothing this package can do about them 🤔
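To illustrate the point, here is a rough sketch of the "strip the excludes, then look for crawler tokens" idea described above. It is written in Python and heavily simplified: the `EXCLUSIONS` and `CRAWLERS` lists are tiny hypothetical stand-ins, not the package's real lists, and the helper names `leftover` and `is_crawler` are made up for this example.

```python
import re

# Hypothetical, heavily simplified exclusion patterns (ordinary browser tokens).
EXCLUSIONS = [
    r"Mozilla/\d+\.\d+",
    r"AppleWebKit/[\d.]+",
    r"Chrome/[\d.]+",
    r"Safari/[\d.]+",
    r"\(KHTML, like Gecko\)",
    r"\(X11; Linux x86_64\)",
    r"\(Windows NT [\d.]+; Win64; x64\)",
]
# Hypothetical crawler patterns, matched case-insensitively.
CRAWLERS = [r"bot", r"crawl", r"spider", r"Google-Ads"]

EXCLUSION_RE = re.compile("|".join(EXCLUSIONS))
CRAWLER_RE = re.compile("|".join(CRAWLERS), re.IGNORECASE)


def leftover(user_agent):
    """Remove known browser tokens and return whatever text remains."""
    return EXCLUSION_RE.sub("", user_agent).strip()


def is_crawler(user_agent):
    """Flag a user agent only if a crawler pattern survives the stripping."""
    return CRAWLER_RE.search(leftover(user_agent)) is not None


linux_ua = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"
)
windows_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36"
)
ads_ua = (
    "Mozilla/5.0 AppleWebKit/537.36 Chrome/114.0.5735.179 "
    "Safari/537.36 Google-Ads-Conversions"
)

print(repr(leftover(linux_ua)))  # '' : nothing left to match against
print(is_crawler(linux_ua))      # False
print(is_crawler(windows_ua))    # False
print(is_crawler(ads_ua))        # True: "Google-Ads-Conversions" remains
```

Once every token in the header is an ordinary browser token, nothing distinctive is left, so telling these visitors apart would need signals outside the user agent.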

SoranDK commented 1 year ago

I found this list if anyone's interested in going through it ;-P https://user-agents.net/bots

Sadly, I don't have enough experience with regex to do it myself... as my original post showed (I hadn't noticed that the bot I mentioned would already get caught by the "bot" in the regex).

tsawitzki commented 5 months ago

What about the new generation of AI / machine learning crawlers/bots like GPTBot? I don't see them listed here, yet the demo detection site recognizes GPTBot.

JayBizzle commented 5 months ago

> What about the new generation of AI / machine learning crawlers/bots like GPTBot? I don't see them listed here, yet the demo detection site recognizes GPTBot.

Can you give me some example user agents?

MIJmker commented 5 months ago

I found this repo after running into some issues caused by crawlers, but saw that the following are not in the list. Since you asked for examples, here are some of the ones that crashed our sites :/