Closed Lagovas closed 2 years ago
Found that for string types we encode valid strings containing \n characters as hex values, and then they are rendered incorrectly in web apps.
Could you provide an example of the full workflow where the problem occurs and for what reason? I just don't quite understand what you mean.
I first found it in our rubygems example. There is a hex value:
>>> unhexlify(b'2d2d2d205b5d0a')
b'--- []\n'
And that is how it looks with correct encoding:
Empty value filtered on the app's side.
Acra encoded it this way due to the \n symbol, which is not printable.
Ok, thank you for the explanation! So my question is, are there any issues with other control characters? I saw \n and \t in tests. But what about, for example, \r? Maybe it's better to just check that the slice is valid UTF-8, and if not, use hex encoding? At least this complies with postgres' varchar type.
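A minimal sketch of the suggested check (the helper name is my own, and I'm assuming PostgreSQL's hex `bytea` form for the fallback):

```python
def encode_value(data: bytes) -> bytes:
    # Hypothetical helper: forward valid UTF-8 as-is,
    # otherwise fall back to hex encoding (PostgreSQL bytea hex form)
    try:
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        return b"\\x" + data.hex().encode("ascii")
```

With this rule, the `b'--- []\n'` value from the rubygems example would pass through unchanged instead of being hex-encoded.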
Now it checks the "SPACE" group of Unicode symbols and ignores other control symbols.
And now it checks for UTF-8 strings, spaces and \\; otherwise it encodes into hex.
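If I understand the described rule correctly, it could be sketched roughly like this (the function name and the exact allowed character set are my assumptions, not the actual implementation):

```python
import unicodedata


def is_forwardable_text(data: bytes) -> bool:
    # Hypothetical check mirroring the described rule: the value must be
    # valid UTF-8; whitespace and backslash are allowed, but any other
    # control character means the value should be hex-encoded instead.
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    for ch in text:
        if ch in " \t\n\r\\":
            continue
        # Unicode general categories starting with "C" are control/other
        if unicodedata.category(ch).startswith("C"):
            return False
    return True
```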
I am not sure if it works correctly. For example, I've configured two tables, `encrypted` and `plain`, each with one `text` field. Then I inserted data with one backslash. Here are the results:
On the left there is a connection to Acra, and on the right a direct connection to Postgres. As you can see, Acra encodes the `plain` results into hex, and `encrypted` removes `\` from decrypted results entirely (checked in Wireshark).
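I don't know where exactly the backslash is lost, but one plausible way to reproduce the symptom is an unescape pass that treats `\` as an escape prefix. This is purely an illustration of the failure mode, not Acra's actual code:

```python
def broken_unescape(data: bytes) -> bytes:
    # Illustrative only: an unescape pass that consumes '\' as an escape
    # prefix will silently drop a lone backslash instead of keeping it.
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i:i + 1] == b"\\" and i + 1 < len(data):
            out += data[i + 1:i + 2]  # keep the escaped byte
            i += 2
        elif data[i:i + 1] == b"\\":
            i += 1  # lone trailing backslash is silently discarded
        else:
            out += data[i:i + 1]
            i += 1
    return bytes(out)
```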
Maybe we should revise and somehow document the encoding/decoding strategy?
So, I debugged how postgres sends string and binary values. And yes, if data has `text` type then postgres doesn't encode it and sends it as is, even if it contains unprintable control characters, as long as it is valid UTF-8. But if it's binary data, then it is sent in the escaped format (hex/octal).
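For reference, the two escaped binary output formats PostgreSQL uses for `bytea` can be sketched like this (simplified; the helper names are mine):

```python
def bytea_hex(data: bytes) -> str:
    # PostgreSQL bytea "hex" output format: \x followed by lowercase hex digits
    return "\\x" + data.hex()


def bytea_escape(data: bytes) -> str:
    # PostgreSQL bytea legacy "escape" output format: printable ASCII
    # stays as-is, backslash is doubled, everything else becomes \NNN octal
    out = []
    for b in data:
        if b == 0x5C:  # backslash
            out.append("\\\\")
        elif 0x20 <= b <= 0x7E:
            out.append(chr(b))
        else:
            out.append("\\%03o" % b)
    return "".join(out)
```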
Postgres knows the types in the database but doesn't know the type on the application side. For example, if the app operates with `str` types, in postgres the data is stored as binary. So it comes to acra-server as binary values anyway. But after decryption we should forward it as string values without any encoding, otherwise encode it as binary data.
Also I met a problem with encoding binary data when we don't know the real type, i.e. when we receive data that is not related to any encryptor_config setting. Acra decrypts acrastructs/acrablocks transparently without any encryptor_config, because data can be stored directly from the app to the database with acrawriter, or encrypted with AcraTranslator and also saved directly to the database. So on the Acra side we try to decode all data, match it against known signatures, try to decrypt, and then return the encoded decrypted data or the value as is. So I added saving the value as is, to return it in the same representation in case it was already decoded but not decrypted and Acra doesn't have info about the type.
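A rough sketch of that fallback, with a stand-in signature check instead of real AcraStruct/AcraBlock decryption (`MAGIC` and both helpers are hypothetical):

```python
MAGIC = b"ACRA"  # hypothetical signature stand-in, not the real format


def try_decrypt(decoded: bytes) -> bytes:
    # Stand-in for real acrastruct/acrablock decryption: only values with
    # the hypothetical MAGIC prefix "decrypt"; anything else raises.
    if decoded.startswith(MAGIC):
        return decoded[len(MAGIC):]
    raise ValueError("unknown signature")


def process_value(raw: bytes) -> bytes:
    # The fallback described above: keep the received representation and
    # return it unchanged when the value matches no known signature, so
    # already-decoded-but-not-decrypted data round-trips intact.
    original = raw
    try:
        return try_decrypt(raw)
    except ValueError:
        return original
```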
Found that for string types we encode valid strings with \n characters as hex values, and then they are rendered incorrectly in web apps. In this PR I update our check of printable characters that are skipped for encoding. I also found that postgresql sends unicode values in a different way than simple latin strings (with encoding into hex). But I didn't find how to fix it correctly, so I created T2531 for further investigation. This PR is just a short fix that should cover the obvious and frequent cases.

Checklist
- [ ] CHANGELOG.md is updated (in case of notable or breaking changes)
- [ ] Benchmark results are attached (if applicable)
- [ ] are up-to-date (in case of API changes)