atmire / COUNTER-Robots

Official list of user agents that are regarded as robots/spiders by COUNTER
MIT License

Four new Bots added #41

Closed by mrabro 3 years ago

mrabro commented 3 years ago

These bots are link preview generators.

alanorth commented 3 years ago

@mrabro The TwitterBot and TelegramBot user agents would already be matched by the bot pattern on line one of COUNTER_Robots_list.json. The WhatsApp and Trello user agents may be useful though (I'll let other contributors here discuss).
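To illustrate the point about the existing `bot` pattern, here is a minimal sketch. It assumes patterns are applied as case-insensitive regular expressions searched anywhere in the user agent string. The `PATTERNS` list holds only the one pattern mentioned above, not the real file contents, and the sample user agent strings and the `is_robot` helper are hypothetical.

```python
import re

# Assumption: only the "bot" pattern named in this thread; the real
# COUNTER_Robots_list.json contains many more entries.
PATTERNS = ["bot"]


def is_robot(user_agent, patterns=PATTERNS):
    """Return True if any pattern matches the user agent (case-insensitive)."""
    return any(re.search(p, user_agent, re.IGNORECASE) for p in patterns)


# Illustrative user agent strings, not taken from real traffic logs.
SAMPLES = [
    "Twitterbot/1.0",
    "TelegramBot (like TwitterBot)",
    "WhatsApp/2.19.81 A",
    "Trello",
]

for ua in SAMPLES:
    print(f"{ua!r}: matched={is_robot(ua)}")
```

Under these assumptions the Twitter and Telegram strings match because they contain "bot", while the WhatsApp and Trello strings do not, which is why only those two would need new patterns.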

davidatmire commented 3 years ago

I confirm that this pull request is currently under discussion on the COUNTER robots workgroup mailing list.

davidatmire commented 3 years ago

@mrabro

Do you know if Trello or WhatsApp are actually acting as crawlers/bots?

We would think that these user agents are (only?) used to fetch thumbnails/previews for pages that users link in Trello or WhatsApp. If so, we would think that there is a one-to-one link between the behavior of these user agents and a human user. We couldn't find any information about either WhatsApp or Trello actually crawling the way most of the bots/crawlers that we aim to identify and isolate do.

Did you have a specific use case in mind for Trello and WhatsApp when you created this pull request?

With thanks

mrabro commented 3 years ago

Yes, these user agents fetch thumbnails or previews for pages. I was working on private links that are supposed to work only once: when a user visits the link with the provided token in the URL, the link expires as soon as the URL is opened. My client was trying to share those private links via WhatsApp, and because the WhatsApp bot visits the page to generate a preview, the links expire and no longer work for the user they are meant for.

It is just like Facebook/Instagram, whose user agent is already present in the list of patterns (facebookexternalhit).
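The use case above could be handled along these lines. This is only a sketch under stated assumptions: the `OneTimeLinks` class, the `is_preview_agent` helper, the pattern list, and the sample user agents are all hypothetical and are not part of COUNTER-Robots or of any application described here.

```python
import re

# Assumption: patterns discussed in this thread, matched case-insensitively.
PREVIEW_AGENT_PATTERNS = ["bot", "whatsapp", "trello", "facebookexternalhit"]


def is_preview_agent(user_agent):
    """Return True if the user agent looks like a bot or link preview fetcher."""
    return any(
        re.search(p, user_agent, re.IGNORECASE) for p in PREVIEW_AGENT_PATTERNS
    )


class OneTimeLinks:
    """In-memory store of single-use tokens (hypothetical, for illustration)."""

    def __init__(self):
        self._unused = set()

    def issue(self, token):
        self._unused.add(token)

    def open(self, token, user_agent):
        """Return 'expired', 'preview', or 'opened' for a visit to the link."""
        if token not in self._unused:
            return "expired"
        if is_preview_agent(user_agent):
            # Preview fetchers see the page but do not consume the token.
            return "preview"
        self._unused.discard(token)
        return "opened"


links = OneTimeLinks()
links.issue("abc")
print(links.open("abc", "WhatsApp/2.19.81 A"))               # preview fetch
print(links.open("abc", "Mozilla/5.0 (X11; Linux x86_64)"))  # intended user
print(links.open("abc", "Mozilla/5.0 (X11; Linux x86_64)"))  # second visit
```

The design choice here is to let preview fetchers through without expiring the token, so the first human visit still works. Whether a preview should be served at all for a private link is a separate decision.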

davidatmire commented 3 years ago

The addition of Trello and WhatsApp has been approved by the COUNTER robots workgroup. Processing the PR now.