calcom / cal.com

Scheduling infrastructure for absolutely everyone.
https://cal.com

[CAL-1739] ability to add captcha to booking page to prevent spam #9044

Closed: PeerRich closed this issue 3 months ago

PeerRich commented 1 year ago

A user is reporting spam bookings made by bots.

We should consider adding some sort of CAPTCHA or rate limiting to prevent spam bookings.

CAL-1739

Bhavyaatrivedi commented 1 year ago

Hi, I want to work on this issue.

aashish0909 commented 1 year ago

Hey @PeerRich, we could use Google's reCAPTCHA, hCaptcha, or Cloudflare's Turnstile (CAPTCHA-free) service. We could also use Upstash's Rate Limit SDK.

Any specific preference regarding the implementation?

PeerRich commented 1 year ago

Is there a free or open-source service?

PeerRich commented 1 year ago

If it's commercial, we should offer these services as optional integrations via the App Store.

scshiv29-dev commented 1 year ago

@PeerRich, isn't hCaptcha a free service? I integrated hCaptcha into my fork, and it works without much hassle.

PeerRich commented 1 year ago

Ah, nice!

jatinkh25 commented 1 year ago

Hi @scshiv29-dev, hCaptcha is free, but only for a limited number of calls. Their website doesn't state a call limit for the free tier, but the Pro plan is limited to 1 million calls per month.

I was preparing a document comparing different solutions for bot and spam detection. Here is a comparison of Google reCAPTCHA, hCaptcha, Cloudflare Turnstile, and Upstash's rate limiter:

| Solution | How it works | Cost |
|---|---|---|
| Google reCAPTCHA | In reCAPTCHA v2, users are asked to identify images or audio, which doesn't provide a good user experience. In v3, users don't have to do this, but that implementation has a problem, discussed below. | Free up to 1,000,000 calls/month; $1 per 1,000 calls after 1 million calls. |
| hCaptcha | Uses techniques like image recognition, audio recognition, and user behavior analysis to distinguish between humans and bots. hCaptcha is generally considered just as effective as reCAPTCHA, but more user-friendly. | Number of free calls not stated; the Pro plan includes 1 million calls/month for $99/month billed annually or $139/month billed monthly. |
| Cloudflare Turnstile | Uses a variety of techniques to distinguish between humans and bots, including IP address analysis, user behavior analysis, and machine learning. Turnstile does not ask users to identify images or audio, so (according to Cloudflare) there are no privacy concerns. | Currently in beta; free up to 1 million calls/month; beyond that, we have to get the Enterprise plan. |
| Upstash's rate limiter | Also does not ask users to identify images or audio; instead, it takes a different approach by letting us define and manage rate limits based on IP addresses, user identities, or any custom criteria we need. | The SDK is free; we only pay for the underlying database that stores the limits and analytics data. Their Redis free tier allows up to 10,000 requests per day, and if we need more, it costs $0.20 per 100,000 requests. |
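As a rough illustration of the overage arithmetic in the Cost column, here is a minimal sketch that applies the prices quoted in the table (as stated at the time of this comment; they may have changed) to a monthly volume. The function and constant names are hypothetical. I'm assuming usage-based charges are pro-rated per call, and for Upstash, that the 10,000-requests/day free allowance applies on every day of a flat 30-day month. Costs are computed in integer cents to avoid floating-point rounding.

```typescript
// Prices as quoted in the comparison table (hypothetical constants; check current pricing pages).
const RECAPTCHA_FREE_CALLS_PER_MONTH = 1_000_000;
const RECAPTCHA_CENTS_PER_1000_CALLS = 100; // $1 per 1,000 calls

const UPSTASH_FREE_REQUESTS_PER_DAY = 10_000;
const UPSTASH_CENTS_PER_100K_REQUESTS = 20; // $0.20 per 100,000 requests
const DAYS_PER_MONTH = 30; // assumption: flat 30-day month

// reCAPTCHA: free up to 1M calls/month, then charged per 1,000 calls (pro-rated in this sketch).
function recaptchaMonthlyCents(calls: number): number {
  const billable = Math.max(0, calls - RECAPTCHA_FREE_CALLS_PER_MONTH);
  return (billable * RECAPTCHA_CENTS_PER_1000_CALLS) / 1000;
}

// Upstash: assumes the daily free allowance is fully used every day and only the excess is billed.
function upstashMonthlyCents(requestsPerMonth: number): number {
  const freeAllowance = UPSTASH_FREE_REQUESTS_PER_DAY * DAYS_PER_MONTH;
  const billable = Math.max(0, requestsPerMonth - freeAllowance);
  return (billable * UPSTASH_CENTS_PER_100K_REQUESTS) / 100_000;
}
```

Under these assumptions, 1.5 million reCAPTCHA calls in a month come to 50,000 cents ($500). Three million Upstash requests come to 540 cents ($5.40).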

The Problem with Google reCAPTCHA v3

reCAPTCHA v3 returns a score for each request that represents the likelihood that the request came from a bot or from a human. You can use this score to decide how to respond, such as granting the user access to the requested resource, asking the user to solve a reCAPTCHA challenge, or blocking the user. The stricter your thresholds, the more likely you are to block real users, but the less likely you are to let bots through. The looser your thresholds, the more likely you are to let bots through, but the less likely you are to block real users. The tricky part is finding a balance between these two extremes that works best for your website. As far as I know, this is a trial-and-error task: the threshold is tuned by performing the action many times in different ways. Even then, there is a decent chance of ending up with a suboptimal threshold.
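To make the threshold trade-off concrete, here is a minimal sketch. It assumes you already have the `score` field from Google's `siteverify` response (0.0 means very likely a bot, 1.0 very likely a human). It also assumes you have collected some scores during testing, labeled by whether a human or a bot produced them. The `decide`, `evaluateThreshold`, and `pickThreshold` helpers and the sample data shape are hypothetical. The sweep simply counts mistakes per candidate threshold, which is one way to do the trial-and-error tuning described above.

```typescript
// Hypothetical shape for scores gathered while testing the booking form.
interface LabelledScore {
  score: number; // reCAPTCHA v3 score: 0.0 = very likely a bot, 1.0 = very likely a human
  isHuman: boolean; // ground truth recorded during testing
}

type Decision = "allow" | "challenge" | "block";

// Map a score to one of the three possible responses using two cut-offs.
function decide(score: number, blockBelow: number, challengeBelow: number): Decision {
  if (score < blockBelow) return "block";
  if (score < challengeBelow) return "challenge"; // e.g. fall back to a visible CAPTCHA
  return "allow";
}

// Count the two kinds of mistakes a single blocking threshold would make.
function evaluateThreshold(samples: LabelledScore[], threshold: number) {
  let humansBlocked = 0;
  let botsAllowed = 0;
  for (const s of samples) {
    const blocked = s.score < threshold;
    if (s.isHuman && blocked) humansBlocked += 1;
    if (!s.isHuman && !blocked) botsAllowed += 1;
  }
  return { humansBlocked, botsAllowed };
}

// Trial-and-error tuning: try each candidate and keep the one with the fewest total mistakes.
function pickThreshold(samples: LabelledScore[], candidates: number[]): number {
  let best = NaN;
  let fewestErrors = Infinity;
  for (const t of candidates) {
    const { humansBlocked, botsAllowed } = evaluateThreshold(samples, t);
    const errors = humansBlocked + botsAllowed;
    if (errors < fewestErrors) {
      best = t;
      fewestErrors = errors;
    }
  }
  return best;
}
```

A strict threshold such as 0.8 drives `botsAllowed` down while `humansBlocked` goes up, and a loose one such as 0.2 does the opposite. Weighting the two error counts equally, as here, is itself a judgment call.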

reCAPTCHA vs hCaptcha

The main difference between Google reCAPTCHA and hCaptcha is that hCaptcha is more privacy-friendly. However, reCAPTCHA is more mature, since it has been around longer.

Why not rate limiters?

Rate limiters are not effective at detecting spam bots because bots can easily bypass them. A bot can simply wait between requests so that it never triggers the limit. Bots can also be programmed to use different IP addresses, which prevents an IP-based rate limiter from blocking them.
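Both bypasses are easy to see in a minimal fixed-window limiter keyed by IP address. This is a hypothetical in-memory sketch for illustration only. It is not Upstash's SDK or API, and the `FixedWindowLimiter` class and its `allow` method are names I made up.

```typescript
// Hypothetical in-memory fixed-window rate limiter keyed by an identifier such as an IP address.
// Illustration only: not Upstash's SDK, and all state is lost when the process restarts.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if a request from `key` at time `nowMs` fits within the limit.
  allow(key: string, nowMs: number): boolean {
    const w = this.windows.get(key);
    // No window yet, or the previous window has expired: start a fresh one.
    if (w === undefined || nowMs - w.start >= this.windowMs) {
      this.windows.set(key, { start: nowMs, count: 1 });
      return true;
    }
    if (w.count < this.limit) {
      w.count += 1;
      return true;
    }
    return false; // over the limit until this window expires
  }
}
```

Consider a limit of 3 bookings per minute. A fourth booking from the same IP inside the window is rejected, but that IP is allowed again once the window resets, and a booking from a different IP is allowed immediately. Those are exactly the two evasions described above.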

Here are some reasons why rate limiters are not effective against spam bot detection:

My Views

reCAPTCHA and hCaptcha both offer traditional CAPTCHA services for bot detection. reCAPTCHA has the threshold problem discussed above, while hCaptcha doesn't offer as many free calls as the other solutions. A rate limiter, for its part, doesn't meet our bot-detection requirement on its own. So, after careful consideration, I believe Cloudflare Turnstile is the best option for our project. It is free, accurate, scalable, and does not raise any privacy concerns.

These are just my views. My goal is to help whoever ends up working on this. Please feel free to comment on which solution you think would be the best fit for cal.com.

Thank you
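For reference, server-side Turnstile validation is a single POST of the widget's token plus your secret key to Cloudflare's `siteverify` endpoint. By default, the widget submits the token in the `cf-turnstile-response` form field. Below is a minimal sketch of how a booking handler might check it before creating a booking. The `verifyTurnstileToken` helper and the injected `post` function are hypothetical, not existing cal.com code. The HTTP call is injected so the logic can be exercised without network access, and only a subset of the response fields is modeled.

```typescript
const TURNSTILE_SITEVERIFY_URL = "https://challenges.cloudflare.com/turnstile/v0/siteverify";

// Subset of the JSON body returned by Turnstile's siteverify endpoint.
interface SiteverifyResponse {
  success: boolean;
  "error-codes"?: string[];
  hostname?: string;
  action?: string;
}

// Hypothetical injected HTTP helper: POSTs form fields to `url` and returns the parsed JSON.
type PostForm = (url: string, form: Record<string, string>) => Promise<SiteverifyResponse>;

// Hypothetical helper: returns ok=true only if Cloudflare confirms the token.
async function verifyTurnstileToken(
  token: string | undefined, // value of the `cf-turnstile-response` field from the booking form
  secret: string, // Turnstile secret key (keep it server-side)
  post: PostForm,
  remoteIp?: string, // optional visitor IP, forwarded as `remoteip`
): Promise<{ ok: boolean; errors: string[] }> {
  // Reject early without a network call; "missing-token" is a local label, not a Cloudflare code.
  if (!token) return { ok: false, errors: ["missing-token"] };
  const form: Record<string, string> = { secret, response: token };
  if (remoteIp) form["remoteip"] = remoteIp;
  const result = await post(TURNSTILE_SITEVERIFY_URL, form);
  return { ok: result.success === true, errors: result["error-codes"] ?? [] };
}
```

In Node 18+, `post` could wrap the global `fetch`, sending `new URLSearchParams(form)` as the request body and returning the parsed JSON. The booking would only be created when `ok` is true.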

aashish0909 commented 1 year ago

Even Cloudflare Turnstile limits the number of requests to 1 million calls per month.

Availability

Turnstile is currently in open beta and is available as a free tool for all customers. For the beta, customers are limited to 1 million calls to the siteverify verification endpoint per month per site. Customers who need additional requests can upgrade to Enterprise Bot Management.

jatinkh25 commented 1 year ago


Yes, but that limit applies to the free tier; for more, we would have to buy the Enterprise plan. With hCaptcha, by contrast, the 1 million call limit comes with a paid plan.

scshiv29-dev commented 1 year ago

@jatinkh25, that's a very detailed comparison. I actually couldn't find any pricing options for hCaptcha and just went ahead and implemented it. I will push it to my fork tomorrow.

piyushgarg-dev commented 1 year ago

Hi, please check issue #9294 and PR #9295.

@PeerRich