georgemarshall / django-cryptography

Easily encrypt data in Django
https://django-cryptography.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
379 stars 75 forks source link

Supported lookups #10

Closed thismatters closed 6 years ago

thismatters commented 6 years ago

Hi, it seems as though this package only supports the isnull lookup. Is there any plan to support additional lookups? I would specifically request an iexact lookup.

georgemarshall commented 6 years ago

Given the nature of data encryption, anything other than isnull is not query-able. This is a side affect of encrypting the data in the first place, as the output will change even when the input does not.

mrname commented 5 years ago

@georgemarshall is it possible to revisit this use case? Other django crypto libs have support for exact lookups, such as the following:

https://django-pgcrypto.readthedocs.io/en/latest/#querying

This is particularly pertinent in situations such as REST APIs where filtering on a sensitive field is often a requirement. Can you explain more about why it is not possible (or not preferrable)?

Bastien-Brd commented 4 years ago

The reason it is not possible to lookup/filter by encrypted field

The reason it is not possible is because a good encryption algorithm (for example RSA OAEP) will use some random (smart) padding before encrypting a value/message (for valid reasons): this means if you encrypt the message several times using the same key, you would get different encrypted values (although they all decrypt to the same clear message, magic).

As a result, looking up a database row based on an encrypted field is impossible. For example, if you need to lookup a user by its email address and this email field is encrypted, you would want to first encrypt the clear email address and look up the database for that encrypted value: you wold not find anything because the encrypted value is non-deterministic and would be different every time.

My solution

I also store a (salted) hash of the encrypted field (email address field for example) for lookups (because unlike encryption, a hash is a one way function and deterministic) (so this field is stored in a safe unreadable way, however it cannot be "decrypted"/"unhashed"), in addition to the encrypted field as a separate column (so that if I read a list of user rows, I can decrypt the email values and read them).

This solves for both looking up a row/object by an "encrypted" field, and also being able to read encrypted fields for a list of rows/objects selected by other means.

This does not solve for the impossibility to sort and filter by encrypted field: for example it is still impossible to query something like User.objects.filter(email__icontains="@gmail.com").

georgemarshall commented 4 years ago

@Bastien-Brd thanks for the in-depth explanation about how IVs affect the encrypted output.

In regards to salted hashing, it would not work for exact look ups. Since the salt would change with each entry and you would have to rehash the value with each salt. The idea that speaks to me as being the most feasible would be to use iterative hashing, which would recursively hash the value n of times.

My only concern with a hash table lookup is that it potentially weakens the security of the encrypted value. Although this could be mostly curbed through a large number of iterations and a strong hashing algorithm.

kajala commented 4 years ago

One option to avoid potential weakening the security of the encrypted value could be to not use the (whole) value as source data for the hash. For example, sort all characters of the email to abc order before hashing. You would end up with more collisions but at least you get some small enough set to filter data row-by-row (e.g. abc@example.com and bac@example.com would hash to same value).

bkhanale commented 4 years ago

I was wondering if we could compromise on security a bit and allow implementation of deterministic encryption algorithm like AES in SIV mode maybe? This will basically produce same encrypted key for the same string. Will this allow the lookups blocked here?