GoogleCloudPlatform / datastore-ndb-python

Client library for use with Google Cloud Datastore from within the Google App Engine Python runtime.
https://cloud.google.com/appengine/docs/standard/python/ndb/
Apache License 2.0

`query.count()` is not always equivalent to `len(query.fetch())` #271

Open mgilson opened 8 years ago

mgilson commented 8 years ago

The documentation states that `query.count()` should be equivalent to (but more efficient than) `len(query.fetch())`. I believe I have found a corner case where this is not true.

For my application, I have a query with the following filter structure (obtained while running unit tests with the GAE testbed):

OR(
  AND(
    FilterNode('X.y', '=', True),
    FilterNode('X.a', '=', datastore_types.Key.from_path(u'KeyType1', u'123', _app=u'testbed-test')),
    PostFilterNode(<google.appengine.ext.ndb.query.RepeatedStructuredPropertyPredicate object at 0x10bafb590>),
    FilterNode('foo', '=', True)),
  AND(
    FilterNode('X.y', '=', True),
    FilterNode('X.a', '=', datastore_types.Key.from_path(u'KeyType2', 1L, _app=u'testbed-test')),
    PostFilterNode(<google.appengine.ext.ndb.query.RepeatedStructuredPropertyPredicate object at 0x10bafb950>),
    FilterNode('foo', '=', True)))

I get different results when counting the results versus fetching them. FWIW, I originally only cared whether there was at least one matching item, so I've switched the code to use `query.get()` instead of `query.count()`. With `query.get()`, my unit test passes again; with `query.get(keys_only=True)`, however, it fails again. I've tried to reproduce this in a simpler unit test but haven't managed to do so easily. For example, the following query filter structure seems to work OK:

OR(
  AND(
    FilterNode('prop1.name', '=', 'foo'),
    FilterNode('prop3', '=', 'baz')), 
  AND(
    FilterNode('prop2.name', '=', 'foo'),
    FilterNode('prop3', '=', 'baz')))
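For anyone trying to reproduce this, the comparison I'm describing could be sketched roughly as below. `compare_result_counts` and `FakeQuery` are hypothetical names, not part of NDB; the fake only mimics the `count()` and `fetch(keys_only=...)` calls and hard-codes the mismatch I observed, on the unconfirmed assumption that `count()` behaves like a keys-only fetch.

```python
def compare_result_counts(query):
    """Return the counts that are expected to agree for any query-like object."""
    return {
        'count': query.count(),
        'fetch': len(query.fetch()),
        'fetch_keys_only': len(query.fetch(keys_only=True)),
    }


class FakeQuery(object):
    """Hypothetical stand-in for an NDB query; it is NOT the real class.

    It hard-codes the observed symptom: full fetches return the entity,
    keys-only fetches (and, by assumption, count()) return nothing.
    """

    def __init__(self, entities, keys_only_hits):
        self._entities = entities
        self._keys_only_hits = keys_only_hits

    def fetch(self, keys_only=False):
        if keys_only:
            return list(self._keys_only_hits)
        return list(self._entities)

    def count(self):
        # Assumption for illustration only: count() sees what keys-only sees.
        return len(self._keys_only_hits)


counts = compare_result_counts(FakeQuery(['entity-1'], []))
# A mismatch exists when the three numbers are not all the same.
mismatch = len(set(counts.values())) > 1
```

In a real test, the same helper would be passed the actual query object in place of `FakeQuery`.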

This leads me to believe that the problem is in the `PostFilterNode`. Thinking about it, it makes sense that a keys-only query would fail if in-memory filtering is needed, presumably because a keys-only result carries no property values for the post-filter to inspect. I would expect a sensible exception to be raised in this case, e.g. 'The query that you've requested does not support "keys_only"' or something to that effect.
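To make the suggestion concrete, here is a minimal sketch of the kind of guard I have in mind. It is a toy model in plain Python, not NDB code: `KeysOnlyNotSupportedError` and `fetch_with_post_filter` are hypothetical names, entities are modeled as dicts, and the post-filter is an ordinary callable.

```python
class KeysOnlyNotSupportedError(Exception):
    """Hypothetical exception; NDB does not define this class."""


def fetch_with_post_filter(entities, post_filter=None, keys_only=False):
    """Toy fetch that refuses keys-only requests needing in-memory filtering.

    entities: list of dicts, each with a 'key' entry plus property values.
    post_filter: optional callable taking an entity and returning a bool.
    """
    if keys_only and post_filter is not None:
        # Fail loudly instead of silently returning a wrong result set.
        raise KeysOnlyNotSupportedError(
            'The query that you have requested does not support "keys_only"')
    matches = [e for e in entities
               if post_filter is None or post_filter(e)]
    if keys_only:
        return [e['key'] for e in matches]
    return matches


entities = [
    {'key': 'k1', 'foo': True},
    {'key': 'k2', 'foo': False},
]


def foo_is_true(entity):
    return entity['foo'] is True


full_results = fetch_with_post_filter(entities, post_filter=foo_is_true)

try:
    fetch_with_post_filter(entities, post_filter=foo_is_true, keys_only=True)
    raised = False
except KeysOnlyNotSupportedError:
    raised = True
```

With a guard like this, the keys-only call fails immediately rather than disagreeing with the full fetch.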