HaxeFoundation / haxelib

The Haxe library manager
https://lib.haxe.org/
MIT License
173 stars 78 forks source link

Stop using MD5 for passwords #314

Open nadako opened 8 years ago

nadako commented 8 years ago

quoting @ibilon from https://github.com/HaxeFoundation/haxe.org/issues/177:

we also need to upgrade the usage of md5 for haxelib password, it's been broken for so many years, but that's not a server issue.

ibilon commented 8 years ago

To do that we need the password reset functionality, then we could empty the password column and force a password reset for everybody.

nadako commented 8 years ago

What should we use instead anyway? haxe.crypto contains Sha1, Sha224 and Sha256

ibilon commented 8 years ago

Sha256 is still safe so it should be good, and with the random salt we shouldn't see a crack any time soon.

andyli commented 8 years ago

Looks like sha256 + salt is not good enough for passwords => http://security.stackexchange.com/questions/90064/how-secure-are-sha256-salt-hashes-for-password-storage?lq=1

According to another SO question, we probably want to use PBKDF2-Haxe.

ibilon commented 8 years ago

I knew about PBKDF2 but I assumed it wasn't available in haxe, in that case yeah we should go with that one.

tobil4sk commented 2 years ago

It would be good to come back to this, MD5 really shouldn't be used anymore. This should be considered a major security concern for the Haxe ecosystem, especially as it is growing.

tobil4sk commented 2 years ago

Instead of forcing everyone to reset their passwords, we could temporarily rehash the existing passwords with a new algorithm and then update them properly when the user next logs in: https://paragonie.com/blog/2016/02/how-safely-store-password-in-2016#legacy-hashes

There is however a problem with how to handle old clients... right now, the client hashes the password with md5 before sending it off to the server, instead of the password being hashed on the server side. We might have to simply reject haxelib register and haxelib submit when they are being executed from an old client (and give an error telling the user how to update haxelib).

We could do this by adding an extra parameter to the register and processSubmit remoting functions, which will be null if the client is old and have a proper value with new clients, however, this feels messy and isn't very future proof. Alternatively, we could update the api version again. @andyli is there a preferred way of doing this?

andyli commented 2 years ago

My idea right now:

P.S. I've no experience in handling passwords in a system. Feel free to comment.

tobil4sk commented 2 years ago

I don't really see the point in having a transition period like that. There would be no extra security provided by the other hash during this period, as cracking the MD5 hash would also crack the new algorithm since the password was the same. Until the year is up and the md5 hashes are removed, there is no benefit, but the registrations would still be broken in return for no extra added security. I think it's better just to break compabitilty at the same time as rolling out the change.

So this is what I think would be good:

1) The current hashes are simply rehashed using the new algorithm (security is improved a bit immediately for all accounts, although this can only be temporary as it has some problems in itself) 2) Registrations and submissions are broken for clients connecting to version 3.0 of the api. These clients receive errors with instructions on how to update (haxelib update haxelib), however they can run all other commands as normal. 3) A new version of the api (4.0) is made (with the database still remaing at 3.0), which takes in the raw passwords instead of the md5 hashes 4) Whenever a user logs in, if the hash stored is still the rehashed md5, the password is hashed using both algorithms and if it matches, the hash is updated properly to remove the md5 step (we can keep track of this via a hashmethod field) 5) Perhaps we can reset passwords for accounts that don't log in after an amount of time, as rehashing an unsalted md5 brings its own security concerns and should not be a permanent solution.

I think it's fair to have these breaking changes, as it's easy to update haxelib and the current situation of hashing on the client is extremely insecure. Therefore, we should prevent library authors from using these old clients to register/submit when it can easily be avoided by updating their haxelib. Also, it will have no effect on users who aren't library authors.

I've started working on a binding for argon2 and some of these changes, so I can make a PR soon to give a better idea of how it might work.

andyli commented 2 years ago

There would be no extra security provided by the other hash during this period, as cracking the MD5 hash would also crack the new algorithm since the password was the same.

Once a user have their new password hash stored in the DB, the old md5 wouldn't be used anymore.

But I'm fine with disabling lib submission with the old hash anytime.

  1. A new version of the api (4.0) is made (with the database still remaing at 3.0), which takes in the raw passwords instead of the md5 hashes

Letting the server able to read raw passwords is a bad idea. Passwords will be leak if the server provider/maintainer (e.g. Digital Ocean, or myself) is a bad actor. The server should only receive hashed passwords and the hashing is performed on user's machine.

tobil4sk commented 2 years ago

Once a user have their new password hash stored in the DB, the old md5 wouldn't be used anymore.

That's possible to do with the plan I sent above, if we accept submissions from old clients. However, there is an issue with this.

Let's suppose we want to implement this, so it's possible to use the md5 to login with old clients until the new hash is stored. We'll need a way to keep track of whether the hash has been updated (either with a hashmethod field, or your method of having two password fields, though I would argue the former is a cleaner, more future proof solution). If the hash has been updated fully, then we need to reject the client's request to log in, because there is no longer any way we can validate the md5 the old client sends. If the server stored hash hasn't been updated, we would need to now check the hash sent by the client, before knowing whether it matches.

Now, theoretically, there would be two ways an old clients submission request can fail: either the account no longer uses md5, so the request is outright rejected, or the account still uses it but the password is incorrect. The problem lies in how to handle these two errors.

We can avoid this problem by disabling submissions from old clients immediately, and that also gives us the freedom to implement a proper solution and not have to carry around a legacy field in the remoting api. So I do think doing that is the best solution.

Letting the server able to read raw passwords is a bad idea.

It doesn't really make a difference if the server receives a hashed password or the plain password. If the server is compromised in this way, even if the password is hashed on the client, then the bad actor could as well see the hash, which essentially acts as the password itself, and knowing the hash is enough to give them access to the account (using a custom client). Note that the password would still be hashed before being stored anywhere long term. Also, for the client to hash the password, there would have to be public access to any user's salt, which otherwise would be private apart from in the case of a database leak.

Client-side hashing is not a standard practice, and overall it leads to much worse security than server side hashing:

andyli commented 2 years ago

I think the conclusion is that we need hashing on both client side and server side. This is also suggested in the discussions in the stackexchange pages you referenced.

kLabz commented 2 years ago

I think the conclusion is that we need hashing on both client side and server side. This is also suggested in the discussions in the stackexchange pages you referenced.

Only if you use unique salt on both client and server (and a different one on each), which doesn't sound very practical in haxelib case since it would require one more back-and-forth between server and client for exchanging salt, as initial requests comes from the client.

Failing to do that would make hashing on client side less secure than sending plain password.

tobil4sk commented 2 years ago

I think the conclusion is that we need hashing on both client side and server side.

I really don't see the benefit of hashing on both the client and server. If there is a bad actor who can modify the code running on the server to retrieve the password before it is hashed, then it doesn't matter if what the client sends is a hash or the raw password, either one will give access to the account. They can just modify the client to send the hash directly to log in. Also, if they are someone with that kind of power, they already have access to the server so they can simply just replace an author's libraries on the server without having to log into their account, and thus eventually get whatever code they want running on users' machines when they download the library.

Hashing only on the server side is common practice and there is really little to gain with client side hashing. However, there are a lot of drawbacks.

In my opinion, the best solution is just to follow the standard practice, which will lead to a simpler solution and avoid all the extra problems.

tobil4sk commented 2 years ago

In #564, I implemented the gradual migration to argon2id as I described.

If we were to hash on the client as well (which I personally still oppose), we'll probably need to go for bcrypt instead of argon2. BCrypt has a Haxe implementation so it won't be much of a hassle if we ever decide to migrate the client to a target other than neko (or indeed if someone wants to make their own haxelib client using this repo as a library). However, the C implementation of argon2 is widely used so it is less likely to have security issues than any Haxe algorithm implementations. Having a C dependency is alright for the server but it is a hassle for the client, which is another reason why hashing on the client introduces complexity.

Aurel300 commented 2 years ago

I also see no reason to do client-side hashing. Using one password per platform is what users should do anyway…

andyli commented 2 years ago

Hashing on the client-side is not about protecting the haxelib account, but to protect the users account on other sites. (This is discussed in the stackexchange pages @tobil4sk commented eariler) Of course people shouldn't reuse password, but it's common.

The haxelib client has been doing client-side password hashing since the beginning. Let's not make a change that would improve security in one aspect but remove existing security measures.

kLabz commented 2 years ago

Hashing on the client-side is not about protecting the haxelib account, but to protect the users account on other sites. (This is discussed in the stackexchange pages @tobil4sk commented eariler) Of course people shouldn't reuse password, but it's common.

The haxelib client has been doing client-side password hashing since the beginning. Let's not make a change that would improve security in one aspect but remove existing security measures.

But that makes haxelib very vulnerable (md5 hashes dump from a hacked database anywhere can be directly used on it without even having to crack the passwords)

andyli commented 2 years ago

I'm not against replacing md5 with another algo. I just insist we need client-side hashing (with an additional server-side hashing too).

l0go commented 2 years ago

I am also against client-side hashing, the server should be using TLS/SSL always, which as far as I know (not much) should make MITM attacks not be a concern.

l0go commented 2 years ago

@andyli Can we look into this more? Delaying something important for security shouldn't be happening.

andyli commented 9 months ago

@tobil4sk Thanks for your work, and sorry for blocking this long-awaited improvement. Let me reiterate one last time what my concern is and what I think would satisfy everyone here.

My main concern is that the lack of client-side hashing would give the haxelib server operator (myself, the current hosting provider, and whoever got access to the haxelib server) read access to the passwords. It means haxelib users who reuse their passwords may give out access to their other online accounts. I have yet seen anyone address this point since I mentioned it in this thread.

Here is my final proposal - keep the current client-side md5 hashing and add server-side argon2id (or any modern secure method) hashing. This at least wouldn't be a security downgrade regarding my main concern. We then got better hashing on the server-side, which is what everyone here agreed to be most important. This proposal also has the benefit of no change on the haxelib client, making it a server-only change. i.e. haxelib users no need to upgrade their haxelib clients for this enhancement.

Apprentice-Alchemist commented 9 months ago

If the server operator is evil and sees the md5 hashes, they can still crack the password. Keeping the md5 hash on the client side has zero security benefit. It might even reduce security (for example there's some issues with mixing md5 and bcrypt, though I'm not sure if it also applies to Argon2id)

Client side hashing only makes sense if it uses a proper key derivation function. But this requires shipping the argon2 ndll with the client (or DIY-ing a Haxe implementation, which I would not recommend) and we will run into the same issue as today if we need to switch hash functions again.

As for password reuse: I would really hope that all programmers use a password manager and randomly generated passwords. In my opinion this issue is solved by educating users (and implementing functionality to allow users to change their password if necessary).

Also, an evil server operator can do things like replacing major haxelibs with malware so I think compromised passwords would be the least of our concerns in such a case.

TLDR: Keeping md5 in the client has at best no benefit and at worst is a security issue. Proper client side hashing is technically possible but (in my opinion) does not have enough benefits to justify the complexity involved in deploying it.

andyli commented 9 months ago

Right, I know client-side md5 hashing is a weak protection against evil server operator, but it's still better than no protection. To be extra clear, my bottomline is "do not reduce existing security measures". Removing client-side hashing would reduce protection against evil server operator, and is also a breaking change that require a haxelib client upgrade from the user side.

Keeping md5 in the client has at best no benefit and at worst is a security issue.

I can't see how keeping md5 in the client (when strong server-side hashing is added) is worse.

Simn commented 9 months ago

While I find this evil server operator argument somewhat contrived, I also don't see how the MD5 client-side hashing could be worse. I can see how there's a very strong argument that it has no benefit, but in that case we might as well keep it in order to have less friction.

IMO the decision should hinge entirely on that aspect: If it's harmless then let's keep it, if it has the potential to cause problems then let's not keep it.

Apprentice-Alchemist commented 9 months ago

Having looked into it a little more, it looks like "password shucking" attacks would also work for argon2id+md5 (the usual example given is bcrypt+md5, but the attack doesn't depend on any specifics of bcrypt as far as I can see) So I would say keeping md5 in the mix does have the potential to cause problems.

andyli commented 9 months ago

If I understand correctly, password shucking requires an attacker to target individual users and will only work on users who re-used a password across websites. I think most of you in this thread don't care much about people who re-use passwords.

tobil4sk commented 9 months ago

Here is my final proposal - keep the current client-side md5 hashing and add server-side argon2id

Keeping the (unsalted) Md5 algorithm around (even if combined with a modern hashing algorithm) is a security liability. Rehashing with a better algorithm won't mitigate the flaws with Md5, so the hash produced by argon2id(md5(password)) suffers the same problems as the Md5 hash (e.g. collisions, etc). This is considerably less secure than if Md5 is not involved at all, and it makes the hashes susceptible to hash shucking, as mentioned:

https://www.scottbrady91.com/authentication/beware-of-password-shucking

andyli commented 9 months ago

From the source you referenced (https://www.scottbrady91.com/authentication/beware-of-password-shucking),

Don’t forget that password shucking requires a user to re-use a password across websites (unfortunately, this is highly likely) and for an attacker to target individual users. As a herd immunity, I think rehashing alone will still help.

Simn commented 9 months ago

I now read a bit about password shucking, if I understand correctly it's basically this situation expressed in Haxe code:

import haxe.crypto.BCrypt;

function main() {
    var leakedMd5Hashes = ["48f03653ae8b6d062bb8b7ae177cb391"];
    var leakedBCryptHashes = ["$2b$10$jziC4wqx5Zz6oIRXidy2ceQM8lhQP8jIhJvy/1hOV4xlEQPsNIeq."];
    var leakedSalt = "$2b$10$jziC4wqx5Zz6oIRXidy2ce";

    for (md5 in leakedMd5Hashes) {
        var bc = BCrypt.encode(md5, leakedSalt);
        for (bc2 in leakedBCryptHashes) {
            if (bc2 == bc) {
                trace("FOUND MATCH: " + md5);
                return;
            }
        }
    }
}

Is that accurate?

That quote from Scott Brady "As a herd immunity, I think rehashing alone will still help." is a bit confusing, but I think what he means is that md5 < bcrypt(md5) < bcrypt. So if the only alternative is MD5 hashing, then it's still better to run bcrypt over the MD5 hashes. But ultimately it's even better to not do that and use bcrypt only.

tobil4sk commented 9 months ago

Don’t forget that password shucking requires a user to re-use a password across websites (unfortunately, this is highly likely) and for an attacker to target individual users. As a herd immunity, I think rehashing alone will still help.

This part of the article discusses rehashing for the purpose of hash migration, it doesn't justify keeping Md5 around forever.

andyli commented 9 months ago

This part of the article discusses rehashing for the purpose of hash migration, it doesn't justify keeping Md5 around forever.

Yes, it means it's an immediate improvement without breaking change (haxelib client update). If you want to do it properly, at the cost of having a breaking change, use a stronge algo on the client side as well.

Let me state it again, my bottomline is "do not reduce existing security measures". The existing client-side md5 hashing has been providing (weak, but exist) protection against server operator.

andyli commented 9 months ago

@Simn I believe it's more like:

function main() {
    var victim = "someone"; // haxelib account username
    var leakedMd5Hashes = [
        // ... md5 hashes leaked from other breached sites
    ];

    for (md5 in leakedMd5Hashes) {
        final succeeded = tryHaxelibSubmit(victim, md5);
        if (succeeded) {
            // Attacker gained access to the victim's haxelib account.
            // Other haxelib accounts are still safe.
            return;
        }
    }
}
Simn commented 9 months ago

Ah yeah, that's basically the same thing except that we don't need any of the BCrypt leaks because the server checks this for us.

Hmm, that's actually a quite convincing argument against client-side hashing... But then again I'm changing my opinion on this with every comment at the moment.

tobil4sk commented 9 months ago

password shucking requires an attacker to target individual users and will only work on users who re-used a password across websites.

It will also work for anyone who has a weak or common password, because the hashes for these would be readily available in pre-computed tables.

IMO the decision should hinge entirely on that aspect: If it's harmless then let's keep it, if it has the potential to cause problems then let's not keep it.

The Md5 algorithm being involved at all is actively harmful for security. An attacker trying to crack the hashes has a much easier job trying to crack the rehashed passwords than if we were using purely argon2id (because of password shucking).

The existing client-side md5 hashing has been providing (weak, but exist) protection against server operator.

The server operator can already change the contents of a library (including haxelib itself) and therefore run arbitrary code on the machine of anyone who's installed it. That's enough power to get the password even with proper client side hashing, and to do a lot worse.

Is that accurate?

@Simn That is correct. Then once they work out which md5 hash is used, they can just crack the md5 hash and they have the original password, without having to crack argon2id. So it makes the task as easy as cracking md5 and they have stripped off any protection that would have been provided by argon2id.

Simn commented 9 months ago

It will also work for anyone who has a weak or common password, because the hashes for these would be readily available in pre-computed tables.

I was thinking of that as well. I know it's tempting to fall into a "we don't care about those idiots" line of thinking here, but the reality of the situation is that any such fallout is going to affect us regardless. If someone hacks Nicolas' haxelib account tomorrow because his password is hunter2 then we can't just sit there and say "should have known better".

P.S.: Please don't turn into an evil server operator, Andy. :)

andyli commented 9 months ago

It will also work for anyone who has a weak or common password, because the hashes for these would be readily available in pre-computed tables.

No hashing, client and/or server side, will save the users who use a weak or common password.

andyli commented 9 months ago

The Md5 algorithm being involved at all is actively harmful for security. An attacker trying to crack the hashes has a much easier job trying to crack the rehashed passwords than if we were using purely argon2id (because of password shucking).

The attacker has to obtain the md5 hash first. The md5 hashes in the haxelib DB will be removed once we rehash them.

The server operator can already change the contents of a library (including haxelib itself) and therefore run arbitrary code on the machine of anyone who's installed it. That's enough power to get the password even with proper client side hashing, and to do a lot worse.

What I said earlier:

It means haxelib users who reuse their passwords may give out access to their other online accounts

This is still being prevented (weakly) by the client-side md5 hash.

Apprentice-Alchemist commented 9 months ago

The real question here is not "md5 or not" it's "client-side hashing or not". I hope we can all agree that all use of md5 needs to be removed in the near future. Temporarily layering hashes is an option, but only as part of a short transition period.

Client side hashing has some drawbacks:

The argument in favor is that client side hashing provides protection against an evildoer with access to the haxelib server in the case of password reuse.

I argue that this protection is minimal and does not justify the complications. Access to the haxelib server allows distribution of malware to the devices of end users, which can allow for e.g installing keyloggers. If the server is compromised users are compromised too, no amount of client side hashing is going to solve that.

Putting the threat of malware distribution aside: if you're concerned about (evil) server operators seeing passwords the proper solution would be (for example) the Secure Remote Password protocol or WebAuthn.

The attacker has to obtain the md5 hash first. The md5 hashes in the haxelib DB will be removed once we rehash them.

There are huge amounts of leaked databases online, we should assume any attacker trying this has found relevant hashes already.

This is still being prevented (weakly) by the client-side md5 hash.

Md5 is broken, especially combined with the lack of salting (which means rainbow tables can be used). Combined with the fact that reused passwords are probably weaker passwords, I do not think a motivated attacker would have any issue cracking the hashes.

andyli commented 9 months ago

Temporarily layering hashes is an option, but only as part of a short transition period.

So let's do this now?

Client side hashing has some drawbacks: breaking change every time we need to switch hash functions (or even just bumping up argon2 parameters to keep up with hardware becoming more powerful)

Breaking changes are inevitable in the long term. It's just my proposal can provide an immediate improvement without a breaking change, benefiting everyone including haxelib users of both existing and future clients.

requires shipping extra ndlls, which complicates deployment (is there even a way for haxelibs (such as an updated haxelib client) to ship and use ndlls? Or would this require shipping a new version of Haxe or Neko?)

haxelib, both client and server side should migrate off Neko, which is officially deprecated. I do understand it would require more time and work, that's why I proposed having client-side md5 + server-side whatever as an immediate improvement.

the proper solution would be (for example) the Secure Remote Password protocol or WebAuthn.

I considered those but didn't mentioned since it's an even bigger change. For the record, I do believe elimilating password altogether is the right approach. For many many years, Debian has been doing that (packagers sign packages with public key). OPAM uses a Github repo for storing library info, you don't even need a dedicated "OPAM" account.

There are huge amounts of leaked databases online...

Yes, and many of them are leaked plain-text password, which is useful for generic dictionary attack, which is not preventable with server-side-only hashing. There is not much difference whether client-side md5 is used or not.

Md5 is broken, especially combined with the lack of salting (which means rainbow tables can be used). Combined with the fact that reused passwords are probably weaker passwords, I do not think a motivated attacker would have any issue cracking the hashes.

I know, but it's not a good reason to introduce a breaking change just to remove client-side md5 hashing. We can improve the current situation without a breaking-change. Later, after the immediate improvement, we can pick a even better solution at the cost of a breaking-change.

Apprentice-Alchemist commented 9 months ago

Okay. Let's say we temporarily go with md5+argon.

Since we do not want to keep md5 in the loop forever (right?), how do we then upgrade those passwords when we get rid of md5 on the client?

1) Switch to server side hashing only Server knows that stored password is md5+argon, and since it sees the plaintext password it can trivially authenticate the client and upgrade the stored password to argon-only. (Not much difference with going md5 -> argon directly, just an intermediate phase where passwords are md5+argon)

2) Switch to client side hashing + server side hashing Login protocol needs to be changed so that the client first asks the server for hash parameters. Password upgrade can only be done with explicit user cooperation, otherwise an evil server operator could fool a client with an already upgraded password to send the md5 hash. (Although in that case an evil server operator could attempt a phishing attack to trick the user into resetting their password (or deploying evil clients, given that they control haxelib). That could be mitigated by eventually deploying clients that under no circumstances attempt to do md5, though to properly stop this kind of attack a second breaking change is needed where new clients that don't allow md5 use a new api version and the api used by old clients is disabled.)

Again, client side hashing introduces technical complexity and slows down future algorithm upgrades for little gain. Haxelib is not a password manager, it doesn't do any E2EE, so plaintext passwords being visible to the server operator doesn't matter that much. And if the server is compromised users should consider their password and potentially their devices compromised, no matter how good the client (or server) side hashing.


That said, updating the server to rehash passwords is better than doing nothing so given the lack of consensus here it's probably a good idea to do that as soon as possible. But we do need to reach a consensus at some point and switch to a proper solution that does not involve md5.

Simn commented 9 months ago

That said, updating the server to rehash passwords is better than doing nothing so given the lack of consensus here it's probably a good idea to do that as soon as possible. But we do need to reach a consensus at some point and switch to a proper solution that does not involve md5.

I agree with this. It's a required step anyway and the removal of the client-side MD5 hashing could be done at a later point.

tobil4sk commented 9 months ago

That said, updating the server to rehash passwords is better than doing nothing so given the lack of consensus here it's probably a good idea to do that as soon as possible.

Yes, this is best to do immediately because without any server-side hashing, a database leak would immediately give direct access to all accounts.

I've created a new pull request adapted from the old one: #619. The earthly builds are failing for me though, and it doesn't seem to be exclusive to my branch... (well, apart from the current error actually x) )