Closed thismatters closed 6 years ago
Given the nature of data encryption, anything other than isnull
is not query-able. This is a side affect of encrypting the data in the first place, as the output will change even when the input does not.
@georgemarshall is it possible to revisit this use case? Other django crypto libs have support for exact
lookups, such as the following:
https://django-pgcrypto.readthedocs.io/en/latest/#querying
This is particularly pertinent in situations such as REST APIs where filtering on a sensitive field is often a requirement. Can you explain more about why it is not possible (or not preferrable)?
The reason it is not possible is because a good encryption algorithm (for example RSA OAEP) will use some random (smart) padding before encrypting a value/message (for valid reasons): this means if you encrypt the message several times using the same key, you would get different encrypted values (although they all decrypt to the same clear message, magic).
As a result, looking up a database row based on an encrypted field is impossible. For example, if you need to lookup a user by its email address and this email field is encrypted, you would want to first encrypt the clear email address and look up the database for that encrypted value: you wold not find anything because the encrypted value is non-deterministic and would be different every time.
I also store a (salted) hash of the encrypted field (email address field for example) for lookups (because unlike encryption, a hash is a one way function and deterministic) (so this field is stored in a safe unreadable way, however it cannot be "decrypted"/"unhashed"), in addition to the encrypted field as a separate column (so that if I read a list of user rows, I can decrypt the email values and read them).
This solves for both looking up a row/object by an "encrypted" field, and also being able to read encrypted fields for a list of rows/objects selected by other means.
This does not solve for the impossibility to sort and filter by encrypted field: for example it is still impossible to query something like User.objects.filter(email__icontains="@gmail.com")
.
@Bastien-Brd thanks for the in-depth explanation about how IVs affect the encrypted output.
In regards to salted hashing, it would not work for exact
look ups. Since the salt would change with each entry and you would have to rehash the value with each salt. The idea that speaks to me as being the most feasible would be to use iterative hashing, which would recursively hash the value n
of times.
My only concern with a hash table lookup is that it potentially weakens the security of the encrypted value. Although this could be mostly curbed through a large number of iterations and a strong hashing algorithm.
One option to avoid potential weakening the security of the encrypted value could be to not use the (whole) value as source data for the hash. For example, sort all characters of the email to abc order before hashing. You would end up with more collisions but at least you get some small enough set to filter data row-by-row (e.g. abc@example.com and bac@example.com would hash to same value).
I was wondering if we could compromise on security a bit and allow implementation of deterministic encryption algorithm like AES in SIV mode maybe? This will basically produce same encrypted key for the same string. Will this allow the lookups blocked here?
Hi, it seems as though this package only supports the
isnull
lookup. Is there any plan to support additional lookups? I would specifically request aniexact
lookup.