benawad / dogehouse

Taking voice conversations to the moon 🚀
https://dogehouse.tv/
MIT License
9.12k stars, 1.48k forks

Hash banned usernames instead of storing the plaintext username #1249

Open sumnerevans opened 3 years ago

sumnerevans commented 3 years ago

Is your feature request related to a problem? Please describe.

As mentioned in https://www.youtube.com/watch?v=oa84RnBsieo, storing banned usernames is problematic for GDPR.

This issue is to document one idea that could solve this problem.

EDIT: this is a more permanent solution to the issue raised in #981.

Describe the solution you'd like

See the original YouTube comment:

Anthony Steiner: GDPR requires you to drop all of the user's personal data. However, you can 1-way hash their email and/or auth provider ID and still be compliant as long as it's not human readable or reversible. When a new account gets created, hash their username and/or auth provider ID to see if it matches any existing ones. You'll also want to normalize the hash (all upper or all lower, and filtering only alpha-numerics to avoid null-character exploits).
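As a rough illustration of the check described in that comment, here is a minimal Python sketch. It normalizes the identifier before hashing (rather than normalizing the digest), then compares digests at account creation. The function names, the in-memory `banned_digests` set, and the example username are all hypothetical and not taken from the DogeHouse codebase, and the hash is deliberately unsalted here to keep the example small.

```python
import hashlib


def normalize_identifier(raw: str) -> str:
    """Lowercase the input and keep only ASCII letters and digits."""
    return "".join(ch for ch in raw.casefold() if ch.isascii() and ch.isalnum())


def one_way_hash(raw: str) -> str:
    """Return the SHA-256 hex digest of the normalized identifier (unsalted)."""
    return hashlib.sha256(normalize_identifier(raw).encode("ascii")).hexdigest()


# Hypothetical ban list: it stores digests only, never the plaintext identifier.
banned_digests = {one_way_hash("Doge_Fan42")}


def is_banned(candidate: str) -> bool:
    """Check a new account's identifier against the stored digests."""
    return one_way_hash(candidate) in banned_digests
```

One trade-off of stripping non-alphanumeric characters is that distinct identifiers such as `doge_fan42` and `dogefan42` map to the same digest, so this normalization may be too aggressive for usernames that allow punctuation.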

Also, see discussion on Discord here: https://discordapp.com/channels/810571477316403233/810571477770174506/822480511891931187

sumner: I know Quora and Reddit are such great sources /s, but: https://www.quora.com/Does-GDPR-allow-sites-to-keep-the-data-of-banned-users and https://www.reddit.com/r/gdpr/comments/7wegg0/gdpr_and_banning_users/ seem to be fairly insightful. Basically I think you can store necessary information to make sure the ban is enforced properly. However, I think you could avoid the entire issue by storing hashes of the banned usernames instead of the usernames themselves. The technique of hashing things to add a layer of anonymity is used in adtech quite a bit (for example, MD5 hashed IP addresses are considered "more private", even though rainbow tables covering the entire IP address space are easily stored).

Note it is important to also salt the hash to add additional anonymity and actually be resilient to rainbow-table attacks. I mentioned the adtech thing just to show that lawyers at adtech firms have managed to get past regulators by just (badly) hashing personally identifiable information (PII), so I assume that if you do a good job at hashing, it will be fine.
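One possible way to get that resilience while keeping ban lookups to a single comparison is a keyed hash. The sketch below uses HMAC-SHA256 with one server-side secret instead of a per-record random salt; that substitution is an assumption made for this example, not something proposed above. `SECRET_KEY`, `keyed_fingerprint`, and `matches` are hypothetical names, and in practice the key would be loaded from configuration kept outside the database rather than generated at import time.

```python
import hashlib
import hmac
import secrets

# Assumption: one secret key held server-side, never stored next to the digests.
# Generated here only so the example runs on its own.
SECRET_KEY = secrets.token_bytes(32)


def keyed_fingerprint(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return the HMAC-SHA256 hex digest of the identifier under the given key."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def matches(identifier: str, stored: str, key: bytes = SECRET_KEY) -> bool:
    """Compare a fresh fingerprint against a stored one in constant time."""
    return hmac.compare_digest(keyed_fingerprint(identifier, key), stored)
```

Without the key, a precomputed table of plain SHA-256 or MD5 digests does not match the stored values. A per-record random salt would also defeat such tables, but every ban check would then have to re-hash the candidate once per stored salt.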

Describe alternatives you've considered

An alternative is to use the "Legitimate Interest" clause for this, but I think it's a lot easier to just hash the banned username.

Additional context

IANL

Nautman commented 3 years ago

Yeah, this is the best fix that I can think of, so I agree with you. The relevant closed issue is #981, btw. When I wrote my comment there I didn't see that you had published an issue with a potential fix :)

EDIT: I also think that it should be a hashed id, rather than a username. Otherwise, one would be able to change their username and be unbanned.
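A small Python sketch of that idea, assuming the auth provider returns a stable account ID: the ban list is keyed on the provider name plus that ID, so the username never enters the fingerprint. The `BanList` class, its key handling, and the provider names and IDs used with it are hypothetical.

```python
import hashlib
import hmac


class BanList:
    """Stores keyed fingerprints of (provider, provider_id) pairs only."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._fingerprints = set()

    def _fingerprint(self, provider: str, provider_id: str) -> str:
        # Assumes provider names are fixed strings that contain no ":".
        message = f"{provider}:{provider_id}".encode("utf-8")
        return hmac.new(self._key, message, hashlib.sha256).hexdigest()

    def ban(self, provider: str, provider_id: str) -> None:
        """Record a ban without keeping the plaintext ID."""
        self._fingerprints.add(self._fingerprint(provider, provider_id))

    def is_banned(self, provider: str, provider_id: str) -> bool:
        """Check an account at sign-up or login."""
        return self._fingerprint(provider, provider_id) in self._fingerprints
```

For example, after `bans.ban("github", "12345")`, the call `bans.is_banned("github", "12345")` stays true whatever username that account later picks, because the username is not an input to the fingerprint.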

TheOtterlord commented 3 years ago

I agree, using the auth id is better than username.

An alternative is to use the "Legitimate Interest" clause for this, but I think it's a lot easier to just hash the banned username.

Interesting. I completely forgot about this. I'm not sure if we would need this whether we hash or not, but I'm no legal expert.

connor4312 commented 3 years ago

Just commenting here since I happened to see the "like" on the comment. The YouTube comment is not correct. Hashing is pseudonymization, which is not anonymization and still causes the data to be considered PII.

A specific pitfall is to consider pseudonymised data to be equivalent to anonymised data.

Source: https://ec.europa.eu/justice/article-29/documentation/opinion-recommendation/files/2014/wp216_en.pdf

However, specifically for banned users, retention may be allowed under Article 89. But this is not generalizable like the commenter implied.

skworden commented 3 years ago

There are two other key parts (legitimate interest and deletion of related data) that should make this OK with some documentation.

This processing should qualify as legitimate interest because the site must process pseudonymized data to ban a user. Banning a user protects the site and other users, and there is no less intrusive way to accomplish this vital feature. Most importantly, the site cascade-deletes the user's related records, leaving only the pseudonymized identifier.

However, the site owner needs to track the legitimate interests to show compliance with GDPR. The site must include all legitimate interests in its privacy statement.

Since Ben is in the US, I'd not worry about the required GDPR documentation and would just implement the feature with the above in mind. A VC likely has all of the paperwork ready to be uploaded or has lawyers that can supply it quickly and legally.