dcwatson / django-pgcrypto

Python and Django utilities for encrypted fields using pgcrypto.
BSD 2-Clause "Simplified" License

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbe in position 1: invalid start byte #29

Closed: slyapustin closed this issue 4 months ago

slyapustin commented 3 years ago

I have an issue upgrading from version 1.4.0 to 2.0.0: I can't access the values of fields that were stored in the DB with version 1.4.0.

Here is the model:

from django.db import models
import pgcrypto

class Profile(models.Model):
    # ...
    email = pgcrypto.EncryptedEmailField(null=True, blank=True)
    # ...

Here is how I try to access the field value:

Profile.objects.get(pk=1).email

Traceback:

Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.7/3.7.9/Frameworks/Python.framework/Versions/3.7/lib/python3.7/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<console>", line 1, in <module>
  File "/Users/sergey/ENV/proj/lib/python3.7/site-packages/django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/Users/sergey/ENV/proj/lib/python3.7/site-packages/django/db/models/query.py", line 402, in get
    num = len(clone)
  File "/Users/sergey/ENV/proj/lib/python3.7/site-packages/django/db/models/query.py", line 256, in __len__
    self._fetch_all()
  File "/Users/sergey/ENV/proj/lib/python3.7/site-packages/django/db/models/query.py", line 1242, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/Users/sergey/ENV/proj/lib/python3.7/site-packages/django/db/models/query.py", line 72, in __iter__
    for row in compiler.results_iter(results):
  File "/Users/sergey/ENV/proj/lib/python3.7/site-packages/django/db/models/sql/compiler.py", line 1084, in apply_converters
    value = converter(value, expression, connection)
  File "/Users/sergey/ENV/proj/lib/python3.7/site-packages/pgcrypto/fields.py", line 101, in from_db_value
    return self.to_python(value)
  File "/Users/sergey/ENV/proj/lib/python3.7/site-packages/pgcrypto/fields.py", line 97, in to_python
    return unpad(self.decrypt(dearmor(value, verify=self.check_armor)), self.block_size).decode(self.charset)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbe in position 1: invalid start byte
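The last frame shows the order of operations in `to_python`: dearmor the stored value, decrypt it, unpad it, then decode the result with the field's charset. The exception is raised by that final `.decode(...)` call, so it is a symptom rather than the root cause: whatever bytes the earlier steps produced were not valid UTF-8. A minimal stdlib sketch of just that last step (the byte string and the `describe_decode_failure` helper are hypothetical, made up for illustration, not the reporter's data or library code):

```python
def describe_decode_failure(raw: bytes, charset: str = "utf-8"):
    """Return None if raw decodes cleanly, else (offset, bad_byte, reason)."""
    try:
        raw.decode(charset)
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first undecodable byte.
        return exc.start, raw[exc.start], exc.reason
    return None


# Hypothetical "decrypted" bytes: 0xbe is a UTF-8 continuation byte,
# so it cannot begin a character ("invalid start byte").
print(describe_decode_failure(b"a\xbe\x12\x9c"))  # (1, 190, 'invalid start byte')
print(describe_decode_failure("user@example.com".encode("utf-8")))  # None
```

Decoding the same bytes with `latin-1` would never raise, because every byte value maps to a character; a charset mismatch can therefore also fail silently instead of loudly.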

I have no issue accessing field values saved with django-pgcrypto==2.0.0.

Here are the versions of some related packages installed in that environment:

Python 3.7.9

Django                2.2.12
django-pgcrypto       2.0.0
psycopg2-binary       2.7.5

dcwatson commented 3 years ago

I can't say for certain that this isn't a bug, but is it possible your key changed between versions? Data decrypted with an incorrect key will just yield gibberish (there's no authentication) - see https://github.com/dcwatson/django-pgcrypto/blob/master/testapp/tests.py#L153
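To illustrate the "no authentication" point: an unauthenticated cipher has no way to tell a wrong key from a right one, so decryption just returns different bytes and the failure only shows up later (for example at `.decode()`). The sketch below uses a toy XOR keystream derived from SHA-256 as a stand-in; it is NOT AES, NOT what pgcrypto uses, and not secure. `toy_keystream` and `toy_xor_cipher` are hypothetical names for this illustration only.

```python
import hashlib


def toy_keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from key (toy stand-in for a real cipher)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_xor_cipher(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt (XOR is its own inverse). There is no integrity check."""
    stream = toy_keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


plaintext = "user@example.com".encode("utf-8")
ciphertext = toy_xor_cipher(b"right-key", plaintext)

# The right key round-trips.
print(toy_xor_cipher(b"right-key", ciphertext) == plaintext)

# The wrong key does not raise anything here; it silently yields other bytes.
print(toy_xor_cipher(b"wrong-key", ciphertext) == plaintext)
```

An authenticated scheme (for example, one that checks a MAC before returning plaintext) would reject the wrong key up front instead of handing gibberish to the decoder.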

slyapustin commented 3 years ago

@dcwatson No, I don't think anything has been changed on my side other than django-pgcrypto.

The steps on my side to get the UnicodeDecodeError are:
1) pip install django-pgcrypto==2.0.0
2) Profile.objects.get(pk=1).email -> UnicodeDecodeError
3) pip install django-pgcrypto==1.4.0
4) Profile.objects.get(pk=1).email -> no exception

slyapustin commented 3 years ago

@dcwatson Let me know if you need some extra details from me.

dcwatson commented 3 years ago

Sorry, I haven't had a chance to try to reproduce this yet. Do you have any example data/keys from a previous version you can share here? The biggest holdup for me is that I no longer have Python 2 installed.

slyapustin commented 3 years ago

I don't think this is related to the Python version. I see the issue on Python 3.7 just by switching django-pgcrypto versions.

dcwatson commented 3 years ago

I just tried to recreate this and couldn't. I created a test model using 1.4 and saved some data, then updated to 2.0 and had no trouble reading it. Are you dealing with international email addresses (i.e. is the data non-ASCII), or is it possible it was originally saved with a different charset? What about your key: are you using bytes or a string? If it's a string, it's encoded to bytes using the charset. If you can either synthesize or share some data that loads correctly with 1.4 but not with 2.0, I could try to take a look.
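Both questions come down to how strings become bytes. A minimal sketch, assuming only the behaviour described in the comment above (a string key is encoded to bytes with a charset); `key_to_bytes` and `decodes_as` are hypothetical helpers, not django-pgcrypto's code:

```python
def key_to_bytes(key, charset="utf-8"):
    """Hypothetical helper: bytes keys pass through, str keys are encoded with charset."""
    if isinstance(key, bytes):
        return key
    return key.encode(charset)


def decodes_as(data: bytes, charset: str) -> bool:
    """Return True if data is valid text in the given charset."""
    try:
        data.decode(charset)
    except UnicodeDecodeError:
        return False
    return True


# An ASCII key yields the same bytes under UTF-8 and Latin-1.
print(key_to_bytes("secret", "utf-8") == key_to_bytes("secret", "latin-1"))  # True

# A non-ASCII key does not, so the effective encryption key would differ.
print(key_to_bytes("clé", "utf-8"))    # b'cl\xc3\xa9'
print(key_to_bytes("clé", "latin-1"))  # b'cl\xe9'

# Non-ASCII text stored as Latin-1 is not valid UTF-8.
stored = "josé@example.com".encode("latin-1")
print(decodes_as(stored, "utf-8"))  # False
```

So a purely ASCII string key (or a bytes key) cannot change between versions through encoding alone, while a non-ASCII key or non-ASCII data is exactly where a charset difference would surface.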