cossacklabs / acra

Database security suite. Database proxy with field-level encryption, search through encrypted data, SQL injections prevention, intrusion detection, honeypots. Supports client-side and proxy-side ("transparent") encryption. SQL, NoSQL.
https://www.cossacklabs.com/acra/
Apache License 2.0
1.35k stars 128 forks source link

Don't escape space characters in strings #525

Closed Lagovas closed 2 years ago

Lagovas commented 2 years ago

Found that valid strings with \n characters for string types we encode as hex values and then it rendered incorrectly in web apps. In this PR update our check of printable characters that skipped for encoding. Also found that postgresql sends unicode values in another way then simple latin strings (with encoding into hex). But I didn't find how to fix it correctly, so I created T2531 for futher investigations. And this PR just short fix that should fix obvious and frequent cases.

Checklist

G1gg1L3s commented 2 years ago

Found that valid strings with \n characters for string types we encode as hex values and then it rendered incorrectly in web apps.

Could you provide an example of the full workflow where the problem occurs and for what reason? I just don't quite understand what do you mean

Lagovas commented 2 years ago

Could you provide an example of the full workflow where the problem occurs and for what reason? I just don't quite understand what do you mean

first time I found it in our rubygems example: image There is hex value:

>>> unhexlify(b'2d2d2d205b5d0a')
b'--- []\n'

And that is how it looks like with correct encoding: image Empty value filtered on the app's side. There is Acra encoded in such way due to \n symbol which is not printable

G1gg1L3s commented 2 years ago

Ok, thank you for explanation! So my question is, are there any issues with other control characters? I saw \n and \t in tests. But what about for example \r? Maybe it's better to just check that slice is valid utf8, and if not, use hex encoding? At least this complies with postgres' varchar type.

Lagovas commented 2 years ago

Ok, thank you for explanation! So my question is, are there any issues with other control characters? I saw \n and \t in tests. But what about for example \r? Maybe it's better to just check that slice is valid utf8, and if not, use hex encoding? At least this complies with postgres' varchar type.

Now it is checks "SPACE" group of unicode symbols and ignores other control symbols.

And now it checks for utf8 strings, spaces and \\. Otherwise encodes into the hex.

G1gg1L3s commented 2 years ago

I am not sure if it work correctly. For example, I've configured two tables: encrypted and plain with one text field. Then I insert data with one backslash. Here are the results: image

On the left there is a connection to acra, and on the right direct connection to postgres. As you can see, the Acra encoded plain results into hex, and encrypted removes \ from decrypted results entirely (checked in wireshark).

Maybe we should revise and somehow document the encoding/decoding strategy?

Lagovas commented 2 years ago

so, I debugged how postgres sends string and binary values. And yes, if data has text type then postgres doesn't encode it and sens as is. Even if it is not unprintable control characters but valid utf8. But if its binary data, then it sends in the escaped format (hex/octal)

Postgres knows types in the database but doesn't know type on application side. For example if app operates with str types, in postgres it is stored as binary. So anyway it comes to acra-server as binary values. But after decryption we should send forwared it as string values without any encodings, otherwise encode as binary data.

Also I met a problem with encoding binary data when we don't know real type. When we received data that not related to any encryptor_config setting. Acra decrypts acrastructs/acrablocks transparently without any encryptor_config because data can be stored directly from app to database with acrawriter. Or data can be encrypted with AcraTranslator and also saved directly to database. So on Acra side we try all data to decode, match to known signatures, try to decrypt and then return encoded decrypted data or as is. So I added saving value as is to return in same representation in case if it was already decoded but not decrypted, and acra doesn't have info about type.