GoogleCloudPlatform / datastore-ndb-python

Client library for use with Google Cloud Datastore from within the Google App Engine Python runtime.
https://cloud.google.com/appengine/docs/standard/python/ndb/
Apache License 2.0
114 stars 48 forks source link

order of shallow validation and validator #268

Open dmfs opened 8 years ago

dmfs commented 8 years ago

In model.py, line 1092 _call_shallow_validation is called before any _validator that might exists.

Wouldn't it be better to perform the shallow validation after a custom validator? The validator might be able to "fix" any invalid values (that's what it's meant for, isn't it)?

My use case is calling populate with members that contain ids of Keys to update KeyProperties.

i.e. my property looks like this:

  foo = ndb.KeyProperty('Foo',
    validator=lambda prop, value: value if isinstance(value, ndb.Key) else ndb.Key('Foo', value))

That's particularly useful to update a KeyProperty using populate like so:

bar.populate({"foo": "someid"})

Which in turn is useful to update a KeyProperty by specifying the id in JSON like so:

bar.populate(json.loads(jsonstring))

Another use case would be to allow a DateTimeProperty to take an ISO8601 date-time string by converting it to a datetime in the validator (I believe that's not possible right now). That again is very useful when loading JSON data.

dmfs commented 8 years ago

To workaround this limitation, I've created a custom ValidationMixin that calls the validator first:

class ValidationMixin(object):
  # make sure to call _validator before we do as the very first validation step
  def _do_validate(self, value):
    if self._validator is not None:
      newvalue = self._validator(self, value)
      if newvalue is not None:
        value = newvalue
    return super(ValidationMixin, self)._do_validate(value)

# A KeyProperty that allows a validator to generate a Key.
# In addition it serializes to just the id of the key
class KeyProperty(ValidationMixin, ndb.KeyProperty):
  # return just the id of the key
  def _get_for_dict(self, entity):
    value = self._get_value(entity)
    if self._repeated:
      return [v.id() for v in value]
    elif value is not None:
      return value.id()
    return value

With a validator like this

def key_validator(kind):
  def validator(prop, value):
    if not isinstance(value, ndb.Key):
      return ndb.Key(kind, value)
    return value
  return validator

I can have a property like this:

  foo = KeyProperty('Foo', validator=key_validator('Foo'))

Which makes it easy to serialize and deserialize KeyPropertties from/to JSON.

dmfs commented 8 years ago

Hmm, that doesn't seem to work with repeated=True. I'm getting the following stack trace, when I call put() after adding a list of keys like so (on a KeyProperty as above with repeated=True):

bar.populate(json.loads('{"foo": ["486944fe896a44c689275e6f19e3084a"]}'))
bar.put()
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 3451, in _put
    return self._put_async(**ctx_options).get_result()
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 383, in get_result
    self.check_success()
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 427, in _help_tasklet_along
    value = gen.throw(exc.__class__, exc, tb)
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/context.py", line 824, in put
    key = yield self._put_batcher.add(entity, options)
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 430, in _help_tasklet_along
    value = gen.send(val)
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/context.py", line 358, in _put_tasklet
    keys = yield self._conn.async_put(options, datastore_entities)
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1852, in async_put
    pbs = [entity_to_pb(entity) for entity in entities]
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 697, in entity_to_pb
    pb = ent._to_pb()
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 3167, in _to_pb
    prop._serialize(self, pb, projection=self._projection)
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1422, in _serialize
    values = self._get_base_value_unwrapped_as_list(entity)
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1192, in _get_base_value_unwrapped_as_list
    wrapped = self._get_base_value(entity)
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1180, in _get_base_value
    return self._apply_to_values(entity, self._opt_call_to_base_type)
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1355, in _apply_to_values
    newvalue = function(value)
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1234, in _opt_call_to_base_type
    value = _BaseValue(self._call_to_base_type(value))
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1255, in _call_to_base_type
    return call(value)
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 1331, in call
    newvalue = method(self, value)
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 2013, in _validate
    raise datastore_errors.BadValueError('Expected Key, got %r' % (value,))
BadValueError: Expected Key, got [Key('Foo', '486944fe896a44c689275e6f19e3084a')]

Am I missing something? It works well for non-repeated properties.

dmfs commented 8 years ago

I just discovered that it works when I remove anther non-repeated KeyProperty from the model. It looks like the serialization is broken and the wrong KeyProperty instance is passed to the _seralize method.

dmfs commented 8 years ago

Ok, found it. KeyProperty has this really weird constructor "signature magic" (model.py, line 1963).

The point is that if the first parameter is a string, it becomes the field name of the property, not the kind! If you want to specify the kind by string you must use a keyword argument, otherwise the kind parameter must be the actual type not just the name. Correct me if I'm wrong but that's not part of the public documentation, is it?. That's really confusing, because with a ndb.Key you actually can specify the kind as a string as the first positional parameter.

As it happened I had 3 KeyProperties with the same kind, but different attribute names. However, since I specified the kind as a string, it actually became the name. So all three properties were using the same name. As a result, the repeated property value was serialized with the non-repeated KeyProperty instance, causing this crash.

The solution was to specify the kind with a keyword argument:

foo = ndb.KeyProperty(kind='Foo', validator=key_validator('Foo'))

Serializing the KeyProperties from/to JSON works well now. So please consider to change the order of how values are validated.