GoogleCloudPlatform / datastore-ndb-python

Client library for use with Google Cloud Datastore from within the Google App Engine Python runtime.
https://cloud.google.com/appengine/docs/standard/python/ndb/
Apache License 2.0
114 stars, 48 forks

How to run `OR` subqueries concurrently #297

Open leomao10 opened 5 years ago

leomao10 commented 5 years ago

In our project, we have a query like this:

TagItem.query(TagItem.template == template.key, TagItem.owner_type == owner_type).filter(TagItem.owner_uid._IN(owner_uids))

Performance gets worse as owner_uids grows. After some profiling, we found that this is because a separate Datastore query is made sequentially for each owner_uid.

[Screenshot, 2019-06-26 12:56 pm]

From the ndb docs, we found that any IN operation is translated into an OR of equality filters: https://cloud.google.com/appengine/docs/standard/python/ndb/queries#neq_and_in
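To make that translation concrete, here is a minimal plain-Python sketch of the expansion the docs describe. It is not ndb's internal code: the function name `expand_in_filter` and the tuple representation of filters are assumptions made purely for illustration.

```python
def expand_in_filter(base_filters, prop, values):
    """Return one filter list per value, i.e. the branches of the OR."""
    branches = []
    for value in values:
        # Each branch keeps the shared filters and pins `prop` to one value.
        branches.append(list(base_filters) + [(prop, '=', value)])
    return branches


# Hypothetical stand-ins for the filters in the query above.
branches = expand_in_filter(
    [('template', '=', 'tmpl-key'), ('owner_type', '=', 'user')],
    'owner_uid',
    ['u1', 'u2', 'u3'])

for branch in branches:
    print(branch)
```

Each branch is a separate equality query, so the number of subqueries grows linearly with the length of owner_uids.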

I also found this comment in the current source:

Run the subqueries sequentially; there is no order to keep.

https://github.com/GoogleCloudPlatform/datastore-ndb-python/blob/master/ndb/query.py#L1957

This doesn't seem to be the most efficient way to filter with an IN operation. Is there a way to change it so that the subqueries run concurrently?

leomao10 commented 4 years ago

Sorry to ping you directly, @wsh, but it seems this issue has been ignored for a while.

Does datastore-ndb-python still not support fetching the branches of an OR clause concurrently? Or is there a way for me to work around it?
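One possible workaround, sketched under assumptions rather than confirmed by the maintainers: skip IN, build one equality query per owner_uid, start each with `fetch_async()` so the RPCs can overlap, then merge the results. `fetch_async()` and `Future.get_result()` are part of ndb's async API; the helper name `fetch_in_concurrently` and its `make_query` parameter are hypothetical.

```python
def fetch_in_concurrently(make_query, values, **fetch_kwargs):
    """Run one equality query per value and merge the results.

    make_query is a hypothetical callable that takes a single value and
    returns an ndb Query filtered on that value.
    """
    # Start every subquery before waiting on any of them, so that the
    # underlying RPCs can be in flight at the same time.
    futures = [make_query(value).fetch_async(**fetch_kwargs)
               for value in set(values)]

    merged = {}
    for future in futures:
        # get_result() blocks until this particular subquery has finished.
        for entity in future.get_result():
            # De-duplicate by key in case an entity matches several values.
            merged[entity.key] = entity
    return list(merged.values())


# Usage inside App Engine (not runnable outside the ndb runtime):
#
# items = fetch_in_concurrently(
#     lambda uid: TagItem.query(TagItem.template == template.key,
#                               TagItem.owner_type == owner_type,
#                               TagItem.owner_uid == uid),
#     owner_uids)
```

Note that the merged list has no defined order, and any `limit` passed through `fetch_kwargs` applies to each subquery rather than to the combined result.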