Closed martinheidegger closed 2 years ago
Thanks! This is a good step forward, but I will also eventually change this so that no hashes are included in public data anyway.
Is this algorithm slow enough? We need the email hashing to be immune to rainbow attack, we can accept a very slow computation. Since email are not random they are more vulnerable than passwords.
Feedback from twitter: "in that PR the salt seems to be shared for all emails, which defeats the purpose of using a salt"
Usually indeed for passwords you'd store the salt alongside the hash so the salt is unique per password, isn't it appropriate here as well?
See https://stackoverflow.com/questions/6832445/how-can-bcrypt-have-built-in-salts
This is a weak patch that is definitely not good enough. We had a off-channel conversation and I think it would really be a lot better if the results contain a lookup number.
survey | email (not published, can also be a hash!) | published_id | published_id_without_order (random id associated to email) |
---|---|---|---|
2019 js | a@b.c | 1 (because its the first row in the data) | f7f5768b5d7961b6be78 |
2019 css | d@e.f | 2 | acf07adbb14c28f26831 |
2020 css | a@b.c | 1 | f7f5768b5d7961b6be78 |
this way the outside knows that its the same user (information that is supposed to be given) but not which email address it is.
The random number internally associated is the best approach I think. Not sure if hashing the email address itself in the db is overkill 😅 However, it would also be necessary to sort the results by the random number else it would be revealing the fill-out order, similar to how the increment number would. Sadly, I don't know Vulcan (or this project) well enough to implement this. Probably just associating a value in a user table?
The salt defeats the purpose
The salt is private, known only to you. If you expose it (like you did in the past) then its true that a rainbow table will allow to figure out the email addresses quite quickly. But unless you publish it, this will at least not contain the emails as raw data. So it does have some value. In any case: I don't like the approach so much as well. Its a quick patch.
Thanks for the details.
We should think in the context of the most catastrophic event, the database leaking and secrets leaking. We should be almost immune to this, especially if we want to keep asking sensitive information in countries where they can be collected. The criticity is too high.
Even if we stop asking those questions, there are other problematic scenario, because this data could be very interesting to malicious advertisers that could target users extremely precisely using their email + the response to the study. I've asked Stack Overflow about that as well, I wonder how they handle this internally.
Those are low probability scenarios, but I think I burned our only excuse so we won't have any if anything happens again in the future. A data collection survey is an easy and interesting prey.
I think hashing the email in the lookup table is appropriate, I am also investigating how to hash it in the auth process.
We could keep email listings in a separate collection unrelated to the user collection (+ with a different createdAt than the signup up date) for the only purpose of telling when a new survey is ready.
A quick patch is definitely a great 1st approach, thanks for this!
Not having metor up and running (sorry 😅) I am not really in a position to test this properly, however this PR should replace the current encryption method with a hash of the email addresses. It reuses the previous Encryption Key as a hash salt, though I recommend renaming that property. If no hash/key property is set the code will fail, preventing the accidental start of the server without a salt set.