GoogleCloudPlatform / appengine-python-standard

Google App Engine services SDK for Python 3
Apache License 2.0
70 stars 47 forks source link

ndb.PickleProperty UnicodeDecodeError: 'ascii' codec can't decode byte with NDB_USE_CROSS_COMPATIBLE_PICKLE_PROTOCOL #111

Open kaansoral opened 9 months ago

kaansoral commented 9 months ago

Expected Behavior

No decode issues

Actual Behavior

Decode issues

Steps to Reproduce the Problem

  1. set NDB_USE_CROSS_COMPATIBLE_PICKLE_PROTOCOL:True in app.yaml
  2. Have a ndb.PickleProperty in python27
  3. Access the ndb.PickleProperty in python312
  4. File "/layers/google.python.pip/pip/lib/python3.12/site-packages/google/appengine/ext/ndb/model.py", line 1913, in _from_base_type UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 1: ordinal not in range(128)

Specifications

pnico commented 9 months ago

Cross-compatible pickle protocol is for a different problem from the one you are talking about. You probably need to wrap the property access in a getter function which will catch the error, then try to dig out the underlying stored value, and unpickle it 'manually' using a different protocol like 'latin1' - it depends on what you're storing in the property, which you didn't specify.

kaansoral commented 9 months ago

"NDB_USE_CROSS_COMPATIBLE_PICKLE_PROTOCOL" seems pretty self explanatory to me - it should work out of the box. I have no idea what problem it's intended to solve other than the name as it's not documented anywhere.

Still it's a good suggestion, better than waiting for a star to fall, but I could not find an easy way to access the base value at a quick glance without triggering the unpickle - maybe turning ndb.PickleProperty()'s to ndb.BlobProperty()'s and then unpickleing them with 'latin1' and re-pickleing them with Python3 default pickle is an option - yet it's quite a challenging undertaking as I'd need to manually block all access to my app and risk corrupting data

kaansoral commented 9 months ago

My Python is rusty but maybe dynamically overriding the section here in code: https://chromium.googlesource.com/external/googleappengine/python/+/3b4e135a03ddf2132df0c0c8d3f7cdc9eaa8c34b/google/appengine/ext/ndb/model.py

class PickleProperty(BlobProperty):
--
class PickleProperty(BlobProperty):
  """A Property whose value is any picklable Python object."""
  def _to_base_type(self, value):
    return pickle.dumps(value, pickle.HIGHEST_PROTOCOL)
  def _from_base_type(self, value):
    return pickle.loads(value)

with

class PickleProperty(BlobProperty):
--
class PickleProperty(BlobProperty):
  """A Property whose value is any picklable Python object."""
  def _to_base_type(self, value):
    return pickle.dumps(value, pickle.HIGHEST_PROTOCOL)
  def _from_base_type(self, value):
    try: return pickle.loads(value)
    except: return pickle.loads(value,encoding='latin1')

is an easy fix

Is it unreasonable to expect App Engine team to do this themselves when NDB_USE_CROSS_COMPATIBLE_PICKLE_PROTOCOL flag is set?

kaansoral commented 9 months ago

This ended up being the patch that was refused here https://issuetracker.google.com/issues/322622693

def _from_base_type(self, value):
    try:
        return pickle.loads(value)
    except:
        return pickle.loads(value,encoding='latin1')
ndb.model.PickleProperty._from_base_type=_from_base_type
pnico commented 9 months ago

ah, yeah we're doing something similar just on a case-by-case basis where we know we have datetime objects in the pickle property.. it seems like the "except" should actually specify UnicodeDecode error there though ^

About what NDB_USE_CROSS_COMPATIBLE_PICKLE_PROTOCOL (and friends) are for, this is all a bit confusing so I could be wrong about this, but my understanding was that it was so you could store a pickled object in a Python 3 instance and still have it be readable by a still-running Python 2 instance, because it will be pickle protocol 2 which Python 2 can still parse (it can't parse the newer, default pickle protocol used by Python 3). Which if I'm correct, this probably matters less now that Google won't let you deploy anything with Python 2 any more.

But a UnicodeDecodeError in Python 3 will still happen with datetime objects pickled in Python 2 in any protocol version without using the 'latin1' encoding, but that's not a bug with this library, it's just a "fact of life" when unpickling Python2-pickled objects in Python 3 (I think :D )