ProtonMail / WebClients

Monorepo hosting the proton web clients
GNU General Public License v3.0

[Security and GDPR Issue] ProtonMail includes Google Recaptcha for Login, every single time. #242

Closed cookiengineer closed 3 years ago

cookiengineer commented 3 years ago

Description:

A change made over the last two weeks affects returning users who log in again. reCAPTCHA is now injected and compromises a machine's identity on every single login, especially if cookies are deleted afterwards to preserve user privacy.

Steps to reproduce the behavior:

Expected behavior:

As a project/company that was founded as an immediate response to the Snowden Leaks, which revealed that the Google PREFs cookie is literally how the NSA tracks users across the planet, I find this very absurd to see.

I understand that there's an intention to lower the rate of spammer accounts in the registration process. But recurring users who have -TWO- passwords to identify themselves with should not need to re-identify themselves as human. And especially not with an unethical service such as Google, which seems not to respect any privacy laws that are applicable in the European Union.

To be honest, this issue is for me a reason to change services; and I feel betrayed in the sense that I as a crowdfunding campaign sponsoring user think that this is a serious breach of GDPR law. I'm a European citizen (from Germany) and I never agreed to share any information with Google.

I also understand that other Recaptcha using services are necessary when ProtonMail would face lots of TOR traffic (which actually would also endanger journalists abroad btw). But this web traffic was received by ProtonMail without any Proxy in between, from my ISP's geo-ip-confirmable IP.

Currently, if ProtonMail continues to deanonymize its users by including Google's Recaptcha code, I cannot recommend ProtonMail as a service to anyone anymore.

edit: I wanted to clarify the narrative that ProtonMail tries to make. This Captcha appeared AFTER I entered the correct password for my login, and AFTER I entered the correct password for mailbox decryption. After clicking through three almost unsolvable captchas, I was led straight to the Inbox view.

This was no anti-brute-force measure. This was no anti-credential-stuffing measure. This was a false positive in classification by IPv4 address (my ISP shares its IPv4 addresses, as all customer hardware primarily uses IPv6) (read below for what I think can be done to help mitigate this problem).

chris-aeviator commented 3 years ago

I was also very confused and concerned by this very obvious design failure, given that protonmail should have the in-house resources to prevent signup spam/attacks with their own code

bartbutler commented 3 years ago

Thank you for your feedback. A few comments about this.

  1. A very small fraction of logins get the CAPTCHA challenge. We, and other services, face unrelenting brute force attacks on our login endpoints. If you are seeing a CAPTCHA on login, chances are that something about your connection is suspicious to our system. It's far from perfect, and we continue to improve it, but at most a percent or two of users are seeing CAPTCHA at any time.

  2. The CAPTCHA is run in an iframe on a separate domain to sandbox it from the Proton login flow and prevent it from compromising the webapp. Obviously Google still gets some information, but we do all we can to limit this.

  3. CAPTCHAs are very hard to build, especially considering Google has a habit of clearing the field with its own captcha-breaking code. Most companies do not have the resources to build their own. We had an alternative CAPTCHA we were going to use as a replacement a few years ago and then the company behind it went bankrupt. We are currently looking to replace ReCAPTCHA with hcaptcha, which should alleviate some of these problems.

  4. We have other strategies which we are also exploring to try to reduce the need for CAPTCHAs entirely, but these are also not trivial to build and integrate into all clients.

TL;DR It's a small fraction of users who are affected, it's necessary to protect our users from brute force login attacks, we don't like it either and are working hard on replacements.

cookiengineer commented 3 years ago
  1. If you are seeing a CAPTCHA on login, chances are that something about your connection is suspicious to our system. (...)

If an Incognito Mode web browser with a graphical user who actively moves his mouse and types in the password in a non-automated manner, key by key - not copy/pasted and not auto-inserted within a millisecond - seems suspicious to your system, then I have to say your ways of identifying or classifying suspicious behaviour are very flawed. There are far better, already solved ways to do this.

  2. The CAPTCHA is run in an iframe on a separate domain to sandbox it from the Proton login flow and prevent it from compromising the webapp. Obviously Google still gets some information, but we do all we can to limit this. (...)

My cached source files served from gstatic.com were E-Tagged permanently by Google. So nope, ain't gonna help anyhow, apparently. You can reproduce this by inspecting served cookies, localStorage, and the cached E-Tag header that is now permanent for each request done to Google's gstatic.com URL in the future.

  3. (...)

I think that a captcha is the wrong solution to the problem at hand. There are way better ways to identify or classify human UX behaviours with all the possibilities of DOM events. Also: Spammers are able to solve captchas, too, and they're human, too. So what's the benefit of this again?

If you're arguing with TOR DDoS scenario traffic: Well, the traffic is done anyways at that point in time, so it won't make a difference regarding server load anyways.

I'm just saying that this is only a valid defense against a "python script kiddie running curl" problem, but won't even come close to defending against a headless browser, which most of the professional spammers use.

A captcha won't make ANY difference at all, except worsening the experience for loyal users. You should start to measure actual A/B network pcap streams and logs in order to confirm or disprove the difference the captcha made, because I think it's a false statement to begin with.

Honestly, using Google Recaptcha is just the laziest answer to a problem that wasn't understood in the first place.

willfarrell commented 3 years ago

Hacker News Comments: https://news.ycombinator.com/item?id=27326243

bartbutler commented 3 years ago
  1. If you are seeing a CAPTCHA on login, chances are that something about your connection is suspicious to our system. (...)

If an Incognito Mode web browser with a graphical user who actively moves his mouse and types in the password in a non-automated manner, key by key - not copy/pasted and not auto-inserted within a millisecond - seems suspicious to your system, then I have to say your ways of identifying or classifying suspicious behaviour are very flawed. There are far better, already solved ways to do this.

It's based primarily on IP/subnet reputation.

  2. The CAPTCHA is run in an iframe on a separate domain to sandbox it from the Proton login flow and prevent it from compromising the webapp. Obviously Google still gets some information, but we do all we can to limit this. (...)

My cached source files served from gstatic.com were E-Tagged permanently by Google. So nope, ain't gonna help anyhow, apparently. You can reproduce this by inspecting served cookies, localStorage, and the cached E-Tag header that is now permanent for each request done to Google's gstatic.com URL in the future.

Sure, Google is going to see the requests, no doubt.

  3. (...)

I think that a captcha is the wrong solution to the problem at hand. There are way better ways to identify or classify human UX behaviours with all the possibilities of DOM events. Also: Spammers are able to solve captchas, too, and they're human, too. So what's the benefit of this again?

We get millions of fraudulent automated login attempts from residential IPs every day. CAPTCHA stops them cold.

If you're arguing with TOR DDoS scenario traffic: Well, the traffic is done anyways at that point in time, so it won't make a difference regarding server load anyways.

I'm just saying that this is only a valid defense against a "python script kiddie running curl" problem, but won't even come close to defending against a headless browser, which most of the professional spammers use.

It most certainly does work against headless browsers.

A captcha won't make ANY difference at all, except worsening the experience for loyal users. You should start to measure actual A/B network pcap streams and logs in order to confirm or disprove the difference the captcha made, because I think it's a false statement to begin with.

We have data--we would never use such a solution without the data to measure it. And thus far it is working, really really well.

Honestly, using Google Recaptcha is just the laziest answer to a problem that wasn't understood in the first place.

I assure you that we understand the problem, the possible solutions, and the various compromises associated with them quite well, and also have the data to make informed decisions about them.

theamanbhargava commented 3 years ago

Switching over to hCaptcha soon would be much appreciated. Thanks.

VictorTaelin commented 3 years ago

I'm not a ProtonMail user so I may be misunderstanding the issue here, and sorry if that's the case, but if the problem is users brute-forcing logins, have you considered demanding a ~1 second proof of work per attempt? If you don't need to know it is a human, you just need to considerably limit the spam rate, that may suffice?

cookiengineer commented 3 years ago

We get millions of fraudulent automated login attempts from residential IPs every day. CAPTCHA stops them cold.

Ever considered these might be false positives? Well, and just annoyed users?

"How" do you classify human users? And "how" do you classify successful and unsuccessful automated login attempts? Did you consider that the feature extraction process might be wrong?

I can speak for myself that in my case, the captcha was making a wrong assumption and contributed a false positive to your dataset. And after trying three times to solve it, I just gave up and said fuck it, not gonna check my emails for today.

I think lots if not most of your users will react the same way when they get blocked out by impossible-to-solve captchas.

kescherCode commented 3 years ago

We get millions of fraudulent automated login attempts from residential IPs every day. CAPTCHA stops them cold.

No. It mainly "stops" legitimate users. Whatever metric you're using to claim that those are fraudulent automated attempts is probably flawed. Especially when some users are Firefox users, because Google loves throwing CAPTCHAs at those. Source: I am a Firefox user.

If you really insist on using CAPTCHAs, at least try hCAPTCHA in the meantime. Still an awful solution, but slightly less awful.

bartbutler commented 3 years ago

I'm not a ProtonMail user so I may be misunderstanding the issue here, and sorry if that's the case, but if the problem is users brute-forcing logins, have you considered demanding a ~1 second proof of work per attempt? If you don't need to know it is a human, you just need to considerably limit the spam rate, that may suffice?

We have a rudimentary one already in the form of bcrypt, but yes, having something like this be scalable and targetable is something we are looking at.
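
For reference, the per-attempt proof-of-work idea discussed above can be sketched roughly as follows. This is a minimal hashcash-style illustration and not Proton's implementation; the function names, the use of SHA-256, and the difficulty values are all assumptions made for the example. The server issues a random challenge, the client searches for a nonce, and the server verifies with a single hash.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: search for a nonce whose hash has `difficulty` leading zero bits."""
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash checks the client's work."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty


if __name__ == "__main__":
    challenge = os.urandom(16)  # hypothetical single-use, server-issued challenge
    nonce = solve(challenge, difficulty=16)
    print(verify(challenge, nonce, difficulty=16))
```

Each extra bit of difficulty doubles the expected client work, so the cost could in principle be raised only for suspicious traffic. Tuning it to "~1 second" depends heavily on the client hardware, which is one reason such schemes are hard to make fair.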

bartbutler commented 3 years ago

We get millions of fraudulent automated login attempts from residential IPs every day. CAPTCHA stops them cold.

Ever considered these might be false positives? Well, and just annoyed users?

"How" do you classify human users? And "how" do you classify successful and unsuccessful automated login attempts? Did you consider that the feature extraction process might be wrong?

I can speak for myself that in my case, the captcha was making a wrong assumption and contributed a false positive to your dataset. And after trying three times to solve it, I just gave up and said fuck it, not gonna check my emails for today.

I think lots if not most of your users will react the same way when they get blocked out by impossible-to-solve captchas.

The login attempts we are discussing are not false positives. There is no doubt that the CAPTCHA challenge hits some legitimate users, because legitimate users share subnets with those being used for the brute-forcing. We reduce this as much as we can (as I said, the challenge rate is a percent or two) but it's not perfect.
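
To illustrate why subnet-based reputation inevitably catches some legitimate users, here is a rough sketch. It is purely hypothetical: Proton has not disclosed how its system works, and the class name, the /24 and /48 prefix lengths, and the thresholds below are all assumptions chosen for the example.

```python
import ipaddress
from collections import defaultdict


class SubnetReputation:
    """Toy failure-rate tracker keyed by subnet (hypothetical, not Proton's system)."""

    def __init__(self, threshold: float = 0.5, min_attempts: int = 20) -> None:
        self.threshold = threshold        # failure ratio that triggers a challenge
        self.min_attempts = min_attempts  # ignore subnets with too little data
        self.stats = defaultdict(lambda: [0, 0])  # subnet -> [failures, attempts]

    @staticmethod
    def subnet_of(ip: str):
        """Map an address to its enclosing /24 (IPv4) or /48 (IPv6) network."""
        addr = ipaddress.ip_address(ip)
        prefix = 24 if addr.version == 4 else 48
        return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)

    def record(self, ip: str, success: bool) -> None:
        """Record the outcome of one login attempt."""
        entry = self.stats[self.subnet_of(ip)]
        entry[1] += 1
        if not success:
            entry[0] += 1

    def should_challenge(self, ip: str) -> bool:
        """Challenge any address whose subnet has a high failure ratio."""
        failures, attempts = self.stats.get(self.subnet_of(ip), (0, 0))
        return attempts >= self.min_attempts and failures / attempts >= self.threshold
```

In this sketch, a well-behaved user at 203.0.113.200 is challenged as soon as other addresses in 203.0.113.0/24 accumulate enough failed logins, which matches the shared-subnet effect described above.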

bartbutler commented 3 years ago

We get millions of fraudulent automated login attempts from residential IPs every day. CAPTCHA stops them cold.

No. It mainly "stops" legitimate users. Whatever metric you're using to claim that those are fraudulent automated attempts is probably flawed. Especially when some users are Firefox users, because Google loves throwing CAPTCHAs at those. Source: I am a Firefox user.

If you really insist on using CAPTCHAs, at least try hCAPTCHA in the meantime. Still an awful solution, but slightly less awful.

We look at success/failure rates for both fraudulent and legitimate attempts, and we also monitor customer service reports regarding this to gauge the level of inconvenience to users. The login attempts we are talking about absolutely are fraudulent. CAPTCHA does inconvenience some legitimate users, which we try to minimize as much as we can, and will continue to both improve it and come up with less intrusive/annoying ways to combat this issue. But we wouldn't do it if it wasn't necessary. We are working on hcaptcha already as a replacement.

NeutralKaon commented 3 years ago

I hate gCaptcha. For whatever reason, Google has taken a dislike to my subnet, IP, and browser plugins that selectively disable JS execution. The net result is that I see a lot of captchas, and, if I'm not logged in to google at the time (viz: using the same variety of container tab as my gmail account…) then I will get the 'this is hard mode' google captcha. The kind where you have five challenges, and then are told 'sorry, try again'.

One of your earlier comments was that headless browsers were desirable to stop. I have links, lynx and elinks installed on all my machines -- and would love to actually use them more. We're not all bots!

diamondavocado commented 3 years ago

A privacy-focused company shouldn't be outsourcing any security- or authentication-related stuff to third parties, least of all Google, whose entire reason for existence is to harvest user data. Come on.

kescherCode commented 3 years ago

The login attempts we are talking about absolutely are fraudulent.

By what metric(s)?

Keep in mind that bruteforcers/credential stuffers will definitely just pay CAPTCHA solver services whose workers are being paid a very low hourly wage to solve CAPTCHAs. A way to slow them down a little and increase their required monetary effort, I suppose, but at the expense of a lot of users. Definitely more than just "some legitimate users". I see these CAPTCHAs everywhere I go, simply due to the fact I don't use a Chromium-based browser.

And I'd like to chime in with @NeutralKaon: Headless browsers != illegitimate users.

darkBuddha commented 3 years ago

In the time of (nearly) ubiquitous 2FA, is protecting users from brute force really necessary?

bartbutler commented 3 years ago

The login attempts we are talking about absolutely are fraudulent.

By what metric(s)?

Keep in mind that bruteforcers/credential stuffers will definitely just pay CAPTCHA solver services whose workers are being paid a very low hourly wage to solve CAPTCHAs. A way to slow them down a little and increase their required monetary effort, I suppose, but at the expense of a lot of users. Definitely more than just "some legitimate users". I see these CAPTCHAs everywhere I go, simply due to the fact I don't use a Chromium-based browser.

And I'd like to chime in with @NeutralKaon: Headless browsers != illegitimate users.

Anti-abuse is basically the one place where security through obscurity works. I can't tell you how we know, but we do.

bartbutler commented 3 years ago

In the time of (nearly) ubiquitous 2FA, is protecting users from brute force really necessary?

Yes, because 2FA is not nearly as ubiquitous as you assume.

bartbutler commented 3 years ago

I hate gCaptcha. For whatever reason, Google has taken a dislike to my subnet, IP, and browser plugins that selectively disable JS execution. The net result is that I see a lot of captchas, and, if I'm not logged in to google at the time (viz: using the same variety of container tab as my gmail account…) then I will get the 'this is hard mode' google captcha. The kind where you have five challenges, and then are told 'sorry, try again'.

One of your earlier comments was that headless browsers were desirable to stop. I have links, lynx and elinks installed on all my machines -- and would love to actually use them more. We're not all bots!

I have nothing against text-only browsers; I've spent a decent amount of time in lynx and the like myself. But nothing that doesn't run scripts is going to work with the web version of ProtonMail (the crypto requires JS), so CAPTCHA is not making any difference for the script requirement. What we were referring to was not text browsers but bots running Selenium, etc., that do run JS and try to break into accounts in an automated way, but don't have a UI at all because they are not interfacing with humans.

theamanbhargava commented 3 years ago

@bartbutler can you give us a timeline for the shift to hCaptcha, please? I think that’s the only solution that works for all parties. Thanks.

bartbutler commented 3 years ago

@bartbutler can you give us a timeline for the shift to hCaptcha, please? I think that’s the only solution that works for all parties. Thanks.

We've gotten into trouble offering timelines in the past but that said we expect to start testing it within weeks.

theamanbhargava commented 3 years ago

@cookiengineer all of us absolutely did agree to Proton Mail using reCaptcha as part of the privacy policy. I don’t like the solution one bit, but we still agreed to it. https://protonmail.com/privacy-policy

Krixa commented 3 years ago

I don't understand this "only 1% of users are affected" argument. When we look at performance issues, we care only about the 99th percentile.

hakusaro commented 3 years ago

Yes, because 2FA is not nearly as ubiquitous as you assume.

I'm not quite sure I understand why a product that pitches security wouldn't already implement something like a mandatory 2FA policy, particularly for accounts that are being frequently attacked. Osu!, a popular rhythm game, implements 2FA for dangerous account actions, and this can't be turned off.

Recaptcha is a great low-effort solution, but off-the-shelf GCRA libraries exist in many languages, and at least one of your competitors (HEY), for all their flaws, of which there are many, implements mandatory 2FA seemingly quite successfully.

It’s a bit of a hassle to set up, and it comes with the risk that you could lose that “something you have” key, which requires a lengthy, annoying reset process. But given how important the security of your email is, it’s worth the hassle.

It seems to me like the problem here is that Proton Mail is pitching itself as security focused, while also being extremely contradictory in behavior?
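
As a rough illustration of the GCRA (generic cell rate algorithm) approach mentioned above, here is a minimal per-key rate limiter. It is a sketch under stated assumptions, not any particular library's API: the class name, the choice of key (an account name or a subnet, say), and the parameters are hypothetical.

```python
class GcraLimiter:
    """Minimal generic cell rate algorithm limiter (illustrative sketch)."""

    def __init__(self, interval: float, burst: int) -> None:
        self.interval = interval                 # seconds between attempts at the sustained rate
        self.tolerance = interval * (burst - 1)  # how far ahead of schedule a burst may run
        self._tat = {}                           # key -> theoretical arrival time (TAT)

    def allow(self, key: str, now: float) -> bool:
        """Return True if an attempt for `key` at time `now` conforms to the rate."""
        tat = self._tat.get(key, now)
        if now < tat - self.tolerance:
            return False  # too early: reject without advancing the schedule
        self._tat[key] = max(tat, now) + self.interval
        return True


if __name__ == "__main__":
    limiter = GcraLimiter(interval=1.0, burst=3)
    print([limiter.allow("alice", 0.0) for _ in range(4)])
```

With `interval=1.0` and `burst=3`, three attempts at the same instant are accepted and the fourth is rejected. A limiter like this slows guessing per key but, unlike a CAPTCHA, does not by itself help when attackers spread attempts across many accounts and IPs.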

darkBuddha commented 3 years ago

In the time of (nearly) ubiquitous 2FA, is protecting users from brute force really necessary?

Yes, because 2FA is not nearly as ubiquitous as you assume.

I don't assume it to be ubiquitous in the general population, only in the "high-profile" accounts that don't want to be hacked.

I assume that serious brute-force attacks are not evenly distributed.

High-profile accounts have a higher chance of being attacked, but also a higher chance that the owner knows and cares about 2FA.

Isn't password strength and uniqueness an individual decision?

bartbutler commented 3 years ago

Yes, because 2FA is not nearly as ubiquitous as you assume.

I'm not quite sure I understand why a product that pitches security wouldn't already implement something like a mandatory 2FA policy, particularly for accounts that are being frequently attacked. Osu!, a popular rhythm game, implements 2FA for dangerous account actions, and this can't be turned off.

Recaptcha is a great low-effort solution, but off-the-shelf GCRA libraries exist in many languages, and at least one of your competitors (HEY), for all their flaws, of which there are many, implements mandatory 2FA seemingly quite successfully.

It’s a bit of a hassle to set up, and it comes with the risk that you could lose that “something you have” key, which requires a lengthy, annoying reset process. But given how important the security of your email is, it’s worth the hassle.

It seems to me like the problem here is that Proton Mail is pitching itself as security focused, while also being extremely contradictory in behavior?

Even if we did mandatory 2FA tomorrow, which would be great for average account security but has a lot of UX implications, it doesn't change the fact that we have 50M+ accounts today and many don't have 2FA. This particular discussion isn't about a security problem; the problem is that many people do not want their devices making any API calls to Google whatsoever, which we understand, and which is why we try to minimize them as much as possible.

bartbutler commented 3 years ago

In the time of (nearly) ubiquitous 2FA, is protecting users from brute force really necessary?

Yes, because 2FA is not nearly as ubiquitous as you assume.

I don't assume it to be ubiquitous in the general population, only in the "high-profile" accounts that don't want to be hacked.

I assume that serious brute-force attacks are not evenly distributed.

High-profile accounts have a higher chance of being attacked, but also a higher chance that the owner knows and cares about 2FA.

Isn't password strength and uniqueness an individual decision?

Those assumptions are incorrect. The brute-force operators we are discussing here largely do not care which accounts they manage to break into. These are not targeted attacks. They are about volume. What they use the accounts for varies (phishing campaigns, third-party account signup, spam, scams, etc.), and usually they are resold. Accounts with a "history" are more valuable than ones without, and given the verification requirements of account creation, login compromise is apparently also a viable strategy. They try as many passwords from leaked password lists as they can, on as many accounts as they can, in the hope that they get lucky. The more attempts they get, the more likely that is, and while we can nudge people to choose better passwords, at the end of the day there are a lot of accounts out there with weak passwords which we have to protect. The way to do that, statistically, is to slow down attempts, which is something CAPTCHA helps with, though we try to use it in a targeted way to minimize the impact on legitimate users.

It's also worth emphasizing that large numbers of compromised accounts are a systemic threat to the service itself and to legitimate users, even for those users who don't get hacked. If someone uses thousands of accounts to bootstrap fake Facebook profiles, and Facebook responds by banning signup with protonmail.com email addresses, that's a huge problem for legitimate Proton users. Same thing for spam, domain reputation is really important for delivery. So this is not the kind of thing that we as the service can say "not our problem, they should have chosen better passwords", because if it ever gets out of control it could be an existential problem for us and every Proton user.

ghost commented 3 years ago

Sure, Google is going to see the requests, no doubt.

API calls to Google ... which is why we try to minimize them as much as possible.

I don't think you're (fully) grasping the severity of the problem with this. I'm a paying Proton(Mail) customer and I got an account with you to get away from G👀gle. I also use the #GoogleIsEvil hashtag whenever I talk about them (and #GoogleIsSpyware). And I severely limit myself for absolutely insane shit, even by G👀gle's standards. And that's saying a lot.

Proton(Mail) promotes itself as a pro-privacy company. For me and I assume pretty much anyone who cares about privacy, that means No G👀GLE WHATSOEVER. Not minimize. ZERO.

It's like you ('guys') have no clue whatsoever who your (potential) customers are. Even considering using anything by G👀gle is a black mark. Use it and Proton gets a big black box over it with a big red cross as in "Do not touch. Evar". And yes, that means I'll cancel my subscription and actively advise anyone I know (or who asks me) to stay the hell away from Proton.

I hope you now will realize the severity.

elmarsto commented 3 years ago

In the time of (nearly) ubiquitous 2FA, is protecting users from brute force really necessary?

Yes, because 2FA is not nearly as ubiquitous as you assume.

Probably because (paying user) I'm still waiting for Yubikey support, I'm just gonna go out on a limb and say that this neatly summarizes ProtonMail's current stance:

"Users don't understand security, and do not need what they do not understand. In reality, no one is very secure, but there's not much anyone can do about it except relax."

Thanks, but I already get that from FANG

bartbutler commented 3 years ago

I hope you now will realize the severity.

We do, and I understand exactly where you are coming from. When we turned this on a few weeks ago, we faced a choice: let adversaries with a seemingly unlimited supply of fresh residential IPs compromise thousands of accounts every day, or use CAPTCHA challenges on a small subset of logins. We chose the latter, it worked, and I still think it was the right call. But nobody is under any illusions that it's even close to an ideal solution, which is why we are also working on alternatives, both integrating another CAPTCHA (hcaptcha) and developing alternatives to CAPTCHAs entirely. But Rome wasn't built in a day and neither is software. We will fix this as soon as we possibly can, but for now CAPTCHA is what's holding back the barbarians at the gates.

bartbutler commented 3 years ago

In the time of (nearly) ubiquitous 2FA, is protecting users from brute force really necessary?

Yes, because 2FA is not nearly as ubiquitous as you assume.

Probably because (paying user) I'm still waiting for Yubikey support, I'm just gonna go out on a limb and say that this neatly summarizes ProtonMail's current stance:

"Users don't understand security, and do not need what they do not understand. In reality, no one is very secure, but there's not much anyone can do about it except relax."

Thanks, but I already get that from FANG

Sorry if that's the vibe you're getting--that is not our intention. I'm just as frustrated about FIDO/U2F support as you are--we built it years ago, but the single-domain restriction in the spec torpedoed the timeline because enabling it would mean that login to both protonmail.com/protonvpn.com would no longer be possible. This roadblock should be removed with moving to a unified domain at some point. To answer the obvious follow-up question, the shared parent domain is important for our implementation of SSO because there is confidential client-side data that can't be shared with the server, a complication others do not have to deal with.

Also, as wonderful as FIDO/U2F/Webauthn support would be, my guess is that it wouldn't move the needle much on overall 2FA rates, at least not to the point that large numbers of accounts without 2FA would not exist, which brings us back to the problem of protecting them from brute-force takeover.

breathebunny commented 3 years ago

We are working on hcaptcha already as a replacement.

Hcaptcha is too easy to crack:

https://greasyfork.org/en/scripts/425854-hcaptcha-solver-automatically-solves-hcaptcha-in-browser/code

cookiengineer commented 3 years ago

@cookiengineer all of us absolutely did agree to Proton Mail using reCaptcha as part of the privacy policy. I don’t like the solution one bit, but we still agreed to it. https://protonmail.com/privacy-policy

Nope, I did not agree to this.

This very "I'll email the customers of a change and if nothing comes back it means I can do everything" policy is the exact reason why WhatsApp is currently being sued by the German state (whereas over 16 European countries have joined the legal fight on the day of its announcement).

GDPR clearly defines what an "opt-in" approach has to look like, and there were several legal cases in the past decided in favor of GDPR. Opt-In means "optionally in", not "in by default". Clearly this implementation and following privacy policy change notification was a GDPR violation.

Please, if you make legal statements - at least read what the legal text says. I'm not expecting you to be a lawyer, but at least you should've read the law you're talking about first. I mean, there's even a nice searchable website available only for that very law you're talking about.

For the sake of completion I'm gonna link the most relevant paragraphs to this matter here:

Also, as wonderful as FIDO/U2F/Webauthn support would be, my guess is that it wouldn't move the needle much on overall 2FA rates, at least not to the point that large numbers of accounts without 2FA would not exist, which brings us back to the problem of protecting them from brute-force takeover.

This is a fight you will lose over time, due to limited availability of resources as a single entity. A better alternative would be: "Hey your decrypted password was released in the Adobe hack (you can check this on haveibeenpwned.com), please change it to a more secure one to prevent hackers from taking over your account".

Or something along these lines in the registration process: "Uh oh. Your password doesn't seem strong. Please use more characters, numbers and special characters in it". (Check against rockyou.txt and other variants to be on the safe side, so that even a strong-looking password is unlikely to be cracked.)

I mean, come on. If you don't actively encourage users to use 2FA and more secure passwords, they'll never know about it. As a company you have to first move, and not try to drag your head out of the mud once it's too bad already.
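
The registration-time check suggested above could look roughly like this. It is a minimal sketch that assumes a locally stored wordlist such as rockyou.txt; the function names, the 12-character minimum, and the messages are hypothetical choices for the example, not anything Proton has described.

```python
from typing import Iterable, List, Set


def load_wordlist(lines: Iterable[str]) -> Set[str]:
    """Build a lookup set from wordlist lines, skipping blank entries."""
    return {line.strip() for line in lines if line.strip()}


def password_problems(password: str, leaked: Set[str], min_length: int = 12) -> List[str]:
    """Return human-readable reasons to reject a password (empty list if acceptable)."""
    problems = []
    if len(password) < min_length:
        problems.append(f"shorter than {min_length} characters")
    if password in leaked:
        problems.append("appears in a leaked-password list")
    return problems


if __name__ == "__main__":
    # In practice the lines would come from a file such as rockyou.txt.
    leaked = load_wordlist(["123456\n", "password\n", "correcthorsebatterystaple\n"])
    print(password_problems("password", leaked))
```

A check like this only works at the moment the service sees the plaintext password, that is, at signup or password change. It cannot retroactively flag existing accounts, which is part of why login-time throttling is still discussed in this thread.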

darkBuddha commented 3 years ago

In the time of (nearly) ubiquitous 2FA, is protecting users from brute force really necessary?

Yes, because 2FA is not nearly as ubiquitous as you assume.

I don't assume it to be ubiquitous in the general population, only in the "high-profile" accounts that don't want to be hacked. I assume that serious brute-force attacks are not evenly distributed. High-profile accounts have a higher chance of being attacked, but also a higher chance that the owner knows and cares about 2FA. Isn't password strength and uniqueness an individual decision?

Those assumptions are incorrect. The brute-force operators we are discussing here largely do not care which accounts they manage to break into. These are not targeted attacks. They are about volume. What they use the accounts for varies (phishing campaigns, third-party account signup, spam, scams, etc.), and usually they are resold. Accounts with a "history" are more valuable than ones without, and given the verification requirements of account creation, login compromise is apparently also a viable strategy. They try as many passwords from leaked password lists as they can, on as many accounts as they can, in the hope that they get lucky. The more attempts they get, the more likely that is, and while we can nudge people to choose better passwords, at the end of the day there are a lot of accounts out there with weak passwords which we have to protect. The way to do that, statistically, is to slow down attempts, which is something CAPTCHA helps with, though we try to use it in a targeted way to minimize the impact on legitimate users.

It's also worth emphasizing that large numbers of compromised accounts are a systemic threat to the service itself and to legitimate users, even for those users who don't get hacked. If someone uses thousands of accounts to bootstrap fake Facebook profiles, and Facebook responds by banning signup with protonmail.com email addresses, that's a huge problem for legitimate Proton users. Same thing for spam, domain reputation is really important for delivery. So this is not the kind of thing that we as the service can say "not our problem, they should have chosen better passwords", because if it ever gets out of control it could be an existential problem for us and every Proton user.

Makes sense, thanks for the reply.

I want to throw in my humble opinion that I prefer a 1-click Google captcha to a 10-click "hCaptcha".

almasen commented 3 years ago

@cookiengineer all of us absolutely did agree to Proton Mail using reCaptcha as part of the privacy policy. I don’t like the solution one bit, but we still agreed to it. https://protonmail.com/privacy-policy

Nope, I did not agree to this.

The privacy policy solely states that

Data related to the opening of an account

[...] In order to pursue our legitimate interest of preventing the creation of accounts by spam bots or human spammers [...] You may be asked to verify using either reCaptcha, Email, or SMS.

As of today, Privacy Policy last modified on February 15, 2021

Data collection related to reCaptcha is never mentioned other than for account creation, i.e. signing up.

Hence I do not recall having agreed to reCaptcha data collection related to when I am logging in to my account.

theamanbhargava commented 3 years ago

@almasen you’re right indeed, it’s only mentioned in the account creation section; it doesn’t say they will use it when I operate my account. Thanks for correcting me.

bartbutler commented 3 years ago

This is a fight you will lose over time, due to limited availability of resources as a single entity. A better alternative would be: "Hey your decrypted password was released in the Adobe hack (you can check this on haveibeenpwned.com), please change it to a more secure one to prevent hackers from taking over your account".

Or something along this in the registration process: "Uh oh. Your password doesn't seem strong. Please use more characters, numbers and special characters in them". (Include rockyou.txt and other variants to be on the safe side that even a strong password isn't very very likely to be cracked)

I mean, come on. If you don't actively encourage users to use 2FA and more secure passwords, they'll never know about it. As a company you have to first move, and not try to drag your head out of the mud once it's too bad already.

Those are things we are working on as well. However, it doesn't help for inactive accounts. There is simply no substitute for rate-limiting attempts. It doesn't have to be CAPTCHA--we are working on alternatives to that too. But rate-limiting itself is required.
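The registration-time check suggested in the quoted comment could be sketched roughly as below. This is only an illustration, not anything Proton is known to run: the function names, the `MIN_LENGTH` value, and the in-memory blocklist are all assumptions, and a real list such as rockyou.txt would be loaded from a file rather than a literal.

```python
from typing import FrozenSet, Iterable, Optional

MIN_LENGTH = 12  # assumed minimum length, not a documented Proton rule


def load_blocklist(lines: Iterable[str]) -> FrozenSet[str]:
    """Normalise leaked passwords (one per line) into a lookup set."""
    return frozenset(line.strip().lower() for line in lines if line.strip())


def password_problem(password: str, blocklist: FrozenSet[str]) -> Optional[str]:
    """Return a short reason the password should be rejected, or None if it passes."""
    # Case-insensitive match against the leaked-password list.
    if password.lower() in blocklist:
        return "leaked"
    if len(password) < MIN_LENGTH:
        return "too short"
    return None
```

As the reply above points out, a check like this only helps at signup or password change; it does nothing for inactive accounts that already have weak passwords, which is why it complements rate limiting rather than replacing it.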

cookiengineer commented 3 years ago

@bartbutler Technically, if your service would prefer IPv6 over IPv4, all your problems are immediately gone.

These days nobody has their own IPv4 anymore, and most ISPs use the same IPv4 for hundreds if not thousands (aka 65535-2 max) of users. The mentioned "requirement for rate-limiting" isn't necessary if the connection was made via IPv6; as the noise ratio is way lower than with IPv4.

And I'm writing this here so that you fully understand why I'm so skeptical about false positives in your measurements. I haven't had my own IPv4 as a private user for at least 10 years now. Therefore most if not all correlations about classifying traffic by IPv4 alone are wrong.

ProtonMail also seems to still not have IPv6 support (at least in regards to DNS entries), therefore I know that my "suspicious behaviour" is very likely to be caused by actual other users of the same ISP.

controlfreakstudio commented 3 years ago

I can understand a modicum of the stop gap rationale to contain certain ripple effects. What I can't comprehend is why there was no non-Google contingency already on the books for the inevitable or perceived need to deploy. Your surface area hasn't been shrinking...

Qix- commented 3 years ago

My biggest concern is the mindset of ProtonMail engineers.

From your page at https://protonmail.com/about:

Screenshot of a snippet from the ProtonMail about page that states "We're building an internet that protects privacy, starting with email." with "protects privacy" highlighted

Screenshot of a snippet from the ProtonMail about page that states "We are scientists, engineers, and developers drawn together by a shared vision of protecting civil liberties online." with "protecting civil liberties" highlighted

If ProtonMail is "building an internet that protects privacy", then why wasn't a proper solution to this problem... well, built? Why instead did the development team reach for the most anti-privacy, anti-civil-liberties company's product first?

This is the alarm raised in my head. I am also a paying customer and will be a little lost if I have to move away because of this (after Tutanota started to cave to subpoenas).

markcellus commented 3 years ago

If ProtonMail is "building an internet that protects privacy", then why wasn't a proper solution to this problem... well, built?

It's kind of surprising that building a custom solution for this wasn't at the very top of priorities in early stages of ProtonMail. Meanwhile, the ProtonMail team seems to be prioritizing things that are much less necessary (like fancy UIs in beta.protonmail.com, ProtonCalendar, etc). If priorities need to be re-aligned, it makes sense to get ProtonMail's security and privacy 💯 first before moving on to all the other things.

bartbutler commented 3 years ago

@bartbutler Technically, if your service would prefer IPv6 over IPv4, all your problems are immediately gone.

These days nobody has their own IPv4 anymore, and most ISPs use the same IPv4 for hundreds if not thousands (aka 65535-2 max) of users. The mentioned "requirement for rate-limiting" isn't necessary if the connection was made via IPv6; as the noise ratio is way lower than with IPv4.

And I'm writing this here so that you fully understand why I'm so skeptical about false positives in your measurements. I haven't had my own IPv4 as a private user for at least 10 years now. Therefore most if not all correlations about classifying traffic by IPv4 alone are wrong.

ProtonMail also seems to still not have IPv6 support (at least in regards to DNS entries), therefore I know that my "suspicious behaviour" is very likely to be caused by actual other users of the same ISP.

You're right, you're a false positive. But it's not necessarily sharing IPs even, it's sharing subnets (so yes, other users on your ISP). We used to just rate limit on IP alone, no captchas, and it worked pretty well. Then the abusers started switching IPs on every attempt, and seem to have unlimited numbers of them. Rate-limiting subnets the same way we did individual IPs was not going to work, we needed a way to let in legitimate users, hence the CAPTCHA for subnets being abused. IPv6 isn't a solution either, because rate limiting by IPs itself no longer works.
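A minimal sketch of the idea described above: count failed logins per subnet rather than per IP, and ask for a CAPTCHA only while a subnet looks abused. The class name, the /24 and /64 prefix lengths, the threshold, and the window are assumptions chosen for illustration, not Proton's actual parameters.

```python
import ipaddress
from collections import defaultdict, deque


class SubnetThrottle:
    """Sliding-window count of failed logins per subnet (hypothetical helper)."""

    def __init__(self, max_failures: int = 20, window_seconds: float = 600.0):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)  # subnet -> failure timestamps

    @staticmethod
    def subnet_key(ip: str) -> str:
        # Assumed grouping: /24 for IPv4, /64 for IPv6.
        addr = ipaddress.ip_address(ip)
        prefix = 24 if addr.version == 4 else 64
        return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))

    def record_failure(self, ip: str, now: float) -> None:
        self._failures[self.subnet_key(ip)].append(now)

    def requires_captcha(self, ip: str, now: float) -> bool:
        failures = self._failures[self.subnet_key(ip)]
        # Drop failures that have aged out of the window.
        while failures and now - failures[0] > self.window_seconds:
            failures.popleft()
        return len(failures) >= self.max_failures
```

This also shows where the false positive comes from: a legitimate user whose address falls in a flagged subnet gets challenged even though every failure came from other addresses, but unlike a hard block they can still get in by solving the challenge.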

bartbutler commented 3 years ago

I can understand a modicum of the stop gap rationale to contain certain ripple effects. What I can't comprehend is why there was no non-Google contingency already on the books for the inevitable or perceived need to deploy. Your surface area hasn't been shrinking...

There was, and they went bankrupt, and then there is hcaptcha, which is the successor to that and is about to be rolled out for testing. The timing, however, meant we needed a temporary working solution faster, hence the use of reCAPTCHA.

bartbutler commented 3 years ago

If ProtonMail is "building an internet that protects privacy", then why wasn't a proper solution to this problem... well, built? Why instead did the development team reach for the most anti-privacy, anti-civil-liberties company's product first?

We didn't. There were no viable alternatives for a while when we started using CAPTCHA for signup, then we found one but they went bankrupt, and now there's hcaptcha, which we were already integrating. But then we had an emergency, and the most viable short-term solution was to use the known-working implementation we already had for signup to buy time for hcaptcha and other solutions.

bartbutler commented 3 years ago

If ProtonMail is "building an internet that protects privacy", then why wasn't a proper solution to this problem... well, built?

This is exactly what I was thinking while catching up on this whole thread. It's kind of surprising that building a custom solution for this wasn't at the very top of priorities in early stages of ProtonMail. Meanwhile, the ProtonMail team seems to be prioritizing things that are much less necessary (like fancy UIs in beta.protonmail.com, ProtonCalendar, etc). If priorities need to be re-aligned, it makes sense to get ProtonMail's security and privacy 💯 first before moving on to all the other things.

Modern CAPTCHAs are extremely difficult to build and require extraordinary amounts of resources to stay ahead of automated CAPTCHA solvers, including those published by Google itself. Maybe eventually we'll have to build one ourselves, but this is simply not something we had the resources to do early on, and this also wasn't a problem early on (the rise of these brute-force attackers with unlimited IPs is something that appeared, for us at least, in the last few months). We also have a steady drumbeat from users who (rightly) demand product features. You can be assured that in hindsight we wish that we had had hcaptcha ready sooner so that we wouldn't have had to use reCAPTCHA in the interim, and that will influence decision-making going forward, but unfortunately there's nothing we can do about past prioritization.

smodnix commented 3 years ago

@bartbutler what is the final verdict please? This discussion cannot go on forever. It seems everything has been stated very clearly; now the decision about what to do, with all its consequences, is yours.

millette commented 3 years ago

@smodnix See https://github.com/ProtonMail/WebClient/issues/242#issuecomment-850903199

Qix- commented 3 years ago

this also wasn't a problem early on

Ah yes, because security and privacy as an afterthought is what shareholders like to see, right?

What other security-related things has ProtonMail skimped out on because they aren't a problem right now?

This isn't helping the mindset problem I mentioned before.

goldkehlchen commented 3 years ago

From my point of view it seems that many here try to downplay whatever Proton is saying because they think they are right.

They might have some expertise but certainly no knowledge of the actual facts of Proton’s daily operations.

Nobody has a problem with healthy & constructive discussion. But god damn.

Instead of running after Proton with your pitch forks maybe try to actually put yourself into their situation for a brief moment.

And vilifying them with bad faith arguments that they supposedly don’t actually care about their users’ privacy is not only gaslighting but also doesn’t solve anything either.

hakusaro commented 3 years ago

Modern CAPTCHAs are extremely difficult to build and require extraordinary amounts of resources to stay ahead of automated CAPTCHA solvers, including those published by Google itself.

I think you recognize part of the problem, and that's good, but I think you've miscalculated the cost/benefit analysis on this front.

Based on everything I've read, it seems like you have some attacker who has the following attributes:

  1. Has sufficient resources to either operate a large scale residential botnet, or has enough money to borrow the services of a large scale residential botnet.
  2. Likely has a list of compromised passwords from another service.
  3. Has some reason to get into these accounts that is assumably something more valuable than the lulz, because of number one.

If you've deployed recaptcha and that's solved your problem, allow me to congratulate you: you have an attacker who is motivated, but not that motivated. Human-based captcha farms exist, and are a cost effective and time efficient way of solving captcha related problems for attackers who are motivated. If your users are bitcoin billionaires, you will see the attack morph when either they become aware that it's no longer working, or they exhaust other more-easily-attacked targets.

If your issue is that paid accounts are being targeted, you actually do have two-factor authentication: payment methods. If you have renewal revenue and cards on file, you can easily prompt users who don't have 2FA but do have a card on file for card-related information.

If your issue is that free accounts are being targeted, your solutions are a bit limited. You do, of course, have the option of deploying some captcha. I think this thread demonstrates that at least some percentage of users are not okay with this solution. Are these paid users? Free users? Can you disable the captcha for paid users somehow? Can you implement a two-step login flow, where users enter their email and either are given a captcha, or prompted for 2FA, or are more strictly rate limited based on the plan type?
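To make the two-step flow concrete, here is a rough sketch of the decision that would run after the email is entered. Everything in it is hypothetical: the `Account` fields, the `Challenge` names, and the ordering of the rules are assumptions for illustration, not a description of Proton's login.

```python
from dataclasses import dataclass
from enum import Enum


class Challenge(Enum):
    NONE = "none"
    SECOND_FACTOR = "second_factor"
    STRICT_RATE_LIMIT = "strict_rate_limit"
    CAPTCHA = "captcha"


@dataclass(frozen=True)
class Account:
    has_2fa: bool
    is_paid: bool


def next_step(account: Account, subnet_flagged: bool) -> Challenge:
    """Pick the extra step shown after the email is submitted."""
    if account.has_2fa:
        return Challenge.SECOND_FACTOR      # 2FA users never see a captcha
    if not subnet_flagged:
        return Challenge.NONE               # no sign of abuse, no extra step
    if account.is_paid:
        return Challenge.STRICT_RATE_LIMIT  # paid plan: throttle instead of captcha
    return Challenge.CAPTCHA                # free plan on a flagged subnet
```

One trade-off with a flow like this is that the visible step differs per account, which can leak whether an address exists or what plan it is on, so it would need to be weighed against account enumeration.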

On your frustration with WebAuthn: I totally understand where you're coming from here. WebAuthn has a lot of downsides, and the biggest one is that it makes authentication over two different domains difficult. I suggest exploring either off-the-shelf central authentication services, or build some kind of single "authentication domain" that can implement OpenID Connect or SAML, in the long term.

Obviously, there are no perfect solutions here. I just think that if your product is happy to pitch security as a feature, you should consider more liberal application of creativity. Just a few examples, which may or may not be helpful:

For all you know, it could be an attacker's goal to show that you're deficient on privacy by using recaptcha. Your competitors are getting free marketing collateral by being able to say "we'll do something more clever than implement recaptcha." I'm not saying you need to go all conspiracy theory on this stuff, but as someone who recently had to mitigate a few attacks and didn't use recaptcha, I know that there are good alternatives and strategies you can take.

In the short term, you might be stuck, but long term I'd suggest building out a plan for "quick and easy non-captcha-related solutions" that can be deployed. Assume that either the current or future attacker will be sufficiently motivated that a captcha will be defeated by a captcha farm, and work from there.