@bmclean Thanks for reporting this. Are you setting limit_default? It seems strange that this would be occurring. Also, it appears you have requests coming in with limit set from the query.
Can you post some sample code and a sample query / payload?
Yes, we have limit_default set to 100. Here is an example of our call:
@Transaction.query_method(
    query_fields=('accountId', 'belowVersion', 'limit',),
    path='TransactionGetOlder/{accountId}/{belowVersion}/{limit}',
    user_required=True, name='getOlderTransactions',
    limit_default=100, limit_max=100,
    produce_cursors=True)
def TransactionGetOlder(self, query):
    return query.order(-Transaction.entityVersion)
(@dhermes Updated to eliminate overflow)
Here is the query:
GET /_ah/api/libraAPI/v1/TransactionGetOlder/E8F5E3BD-808B-11E4-9CA3-80E6500B2C9A/0/100
Here is an example payload: https://gist.github.com/bmclean/d762d1022dd988469f68#file-payload
@bmclean Thanks for this. I was curious if you were using a reverse query. By using query.order(-Transaction.entityVersion) you are making it so that the "first 100" are really the "last 100".
I'm not 100% sure about the reason for the large memory use. ndb uses 2 levels of caching, in-memory and memcache (in addition to persisting in the datastore). I think (and am checking now) that asking for the beginning of a reversed query is what is causing the issue.
For large datasets, paging via limits and offsets is really not feasible. Instead, cursors are the way to enable paging.
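For illustration, a minimal sketch of cursor-based paging with ndb (the page size and model are just the ones from this thread):

from google.appengine.ext import ndb

# First page: no cursor yet.
results, cursor, more = Transaction.query().fetch_page(100)
# Each later page resumes from the cursor returned by the previous
# call, instead of re-reading everything before it.
while more:
    results, cursor, more = Transaction.query().fetch_page(
        100, start_cursor=cursor)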
OK, I tested the cache (locally) and I still don't understand why 5 MB is used.
from google.appengine.ext import ndb

class A(ndb.Model):
    a = ndb.StringProperty()

# 100+ entities already stored for the A model
ctx = ndb.get_context()
print 'Cache 1:', ctx._cache
A(a='zap').put()
print 'Cache 2:', ctx._cache
results, cursor, more = A.query().fetch_page(2)
print 'Cache 3:', ctx._cache
results, cursor, more = A.query().order(-A.a).fetch_page(2)
print 'Cache 4:', ctx._cache
So the above is printing the in-memory cache after a few operations.
The results are as follows (I edited the datastore IDs for viewing):
Cache 1: {}
Cache 2: {Key('A', 1): A(key=Key('A', 1), a=u'zap')}
Cache 3: {Key('A', 1): A(key=Key('A', 1), a=u'zap'),
          Key('A', 2): A(key=Key('A', 2), a=u'46'),
          Key('A', 3): A(key=Key('A', 3), a=u'40')}
Cache 4: {Key('A', 1): A(key=Key('A', 1), a=u'zap'),
          Key('A', 2): A(key=Key('A', 2), a=u'46'),
          Key('A', 4): A(key=Key('A', 4), a=u'zap'),
          Key('A', 5): A(key=Key('A', 5), a=u'zap'),
          Key('A', 3): A(key=Key('A', 3), a=u'40')}
It took me a while to find, but here is a reference that might help: http://stackoverflow.com/a/3566878/1068170
The fact that endpoints-proto-datastore doesn't use offset makes it a little different, though.
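To make the distinction concrete, a hedged sketch using the toy A model from above (the skip count is made up):

# Offset-based paging: the datastore still reads and discards the
# 1900 skipped entities, so the request pays for all of them.
last_100 = A.query().fetch(100, offset=1900)

# Cursor-based paging: resumes exactly where the previous page ended.
page, cursor, more = A.query().fetch_page(100)
page, cursor, more = A.query().fetch_page(100, start_cursor=cursor)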
We just tried removing the reverse query and it didn't make a difference. One other thing we did try was to turn off both the in-memory cache and the memcache in our Transaction class:
_use_cache = False
_use_memcache = False
and we still see the memory jump up with each query.
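For reference, ndb also accepts the same flags as per-call context options, so the equivalent without touching the model class would be (a sketch):

# Per-call equivalent of the class-level _use_cache / _use_memcache flags.
results, cursor, more = Transaction.query().fetch_page(
    100, use_cache=False, use_memcache=False)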
@bmclean Thanks for digging. If you do the query outside of endpoints-proto-datastore do you have the same issues? I'm still unclear if the issue is:
- endpoints-proto-datastore directly
- endpoints-proto-datastore calling ndb incorrectly
- ndb doing something wrong / unexpected

I executed the following 20 times using the remote_api_stub:
Transaction.query(ancestor=ndb.Key('Account', account_id)).fetch_page(100, start_cursor=None)
The memory of the python process started at 104.8 MB and ended at 108.1 MB.
When I make the request 20 times using the Google API explorer the memory of the python process started at 106.5 MB and ended at 199.8 MB.
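For reproducibility, the remote_api_stub session looked roughly like this (the host and the auth prompt are placeholders, not our real setup):

import getpass

from google.appengine.ext import ndb
from google.appengine.ext.remote_api import remote_api_stub

def auth_func():
    # Placeholder credentials prompt.
    return raw_input('Email: '), getpass.getpass('Password: ')

# App id is read from the server when None; the host is a placeholder.
remote_api_stub.ConfigureRemoteApi(
    None, '/_ah/remote_api', auth_func, 'your-app.appspot.com')

# Transaction and account_id are the ones from our application.
for _ in range(20):
    Transaction.query(ancestor=ndb.Key('Account', account_id)).fetch_page(
        100, start_cursor=None)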
Wow, thanks for this! I am totally fine to accept your fix.
Do you want to send a pull request?
# Works just fine
q = Transaction.query(ancestor=ndb.Key('Account', account_id))
result, next_cursor, more = q.fetch_page(100, start_cursor=None)

# Happens in endpoints-proto-datastore
# query_info.cursor is None if not set from the request
items, next_cursor, more_results = query.fetch_page(
    request_limit, start_cursor=query_info.cursor,
    projection=['field1', 'field2'])
A possible culprit may be:

query_info = request_entity._endpoints_query_info
query_info.SetQuery()

since these _EndpointsQueryInfo objects will also hold a (potentially large) entity. It's unclear to me why not having produce_cursors set would allow them to be GC'ed.
Can you add

import gc
potential = [obj for obj in gc.get_objects()
             if 'endpoints_proto_datastore' in type(obj).__module__]

into your code to see where the 5 MB bump comes from in remote_api_stub.
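For example (a hypothetical extension of that snippet), grouping the surviving objects by type would show what is accumulating across requests:

import collections
import gc

# Count live endpoints_proto_datastore objects by type name.
counts = collections.Counter(
    type(obj).__name__ for obj in gc.get_objects()
    if 'endpoints_proto_datastore' in type(obj).__module__)
for name, count in counts.most_common():
    print name, count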
Some more potentially helpful links:
https://code.google.com/p/googleappengine/issues/detail?id=9610
http://stackoverflow.com/questions/12095259/ndb-not-clearing-memory-during-a-long-request
Memory profiler: http://stackoverflow.com/a/110826/1068170
We realized that running the query from the remote_api_stub really isn't the same thing. So we added the test query to one of our webapp2 controllers:
q = Transaction.query(ancestor=ndb.Key('Account', account_id))
result, next_cursor, more = q.fetch_page(100, start_cursor=None)
and refreshed the page 20 times. This resulted in the memory climbing to 185 MB. However, if we change the query to a fetch (instead of fetch_page):
q = Transaction.query(ancestor=ndb.Key('Account', account_id))
result = q.fetch(100)
the memory doesn't climb at all.
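For context, the test controller was along these lines (a sketch; the route and handler name are made up):

import webapp2

from google.appengine.ext import ndb

class MemoryTestHandler(webapp2.RequestHandler):
    def get(self):
        account_id = self.request.get('account_id')
        q = Transaction.query(ancestor=ndb.Key('Account', account_id))
        # fetch_page grows the process by ~5 MB per refresh;
        # a plain fetch of the same 100 entities does not.
        result, next_cursor, more = q.fetch_page(100, start_cursor=None)
        self.response.write('fetched %d entities' % len(result))

app = webapp2.WSGIApplication([('/memtest', MemoryTestHandler)])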
I read through all of https://code.google.com/p/googleappengine/issues/detail?id=9610 and I'm pretty sure this is what we have encountered.
ISTM the bug is clearly in ndb. However, I am unclear how to address this within endpoints-proto-datastore. Your fix is nice, but it will remove the ability to page. I suppose for sophisticated users such as yourself, patching the library and using db instead of ndb in query_method will give you a way to eliminate the memory leak but also enable paging.
We are defaulting produce_cursors to True in model.py so the default paging behaviour is unchanged. We then set produce_cursors=False in our API method decorators to explicitly disable it.
Obviously this is a workaround for an ndb bug, not an issue in endpoints-proto-datastore. Based on this behaviour, we're going to avoid fetch_page calls with cursors in our web application as well.
How do you mean "defaulting produce_cursors to True in model.py"? Like a module global, or defined on the classes themselves?
Also, if you don't call fetch_page, how do you get a cursor back from a query?
We added an argument to the query decorator called produce_cursors which has a default value of True.
Line 1435 of model.py:
def query_method(cls,
                 query_fields=(),
                 collection_fields=None,
                 limit_default=QUERY_LIMIT_DEFAULT,
                 limit_max=QUERY_LIMIT_MAX,
                 user_required=False,
                 use_projection=False,
                 produce_cursors=True,
                 **kwargs):
We then use the decorator like this:
@Transaction.query_method(
    query_fields=('accountId', 'belowVersion', 'limit',),
    path='TransactionGetOlder/{accountId}/{belowVersion}/{limit}',
    user_required=True, name='getOlderTransactions',
    limit_default=100, limit_max=100,
    produce_cursors=False)
def TransactionGetOlder(self, query):
    return query.order(-Transaction.entityVersion)
We don't use a cursor for this call. We sort by an attribute called entityVersion, pass a belowVersion value in with the request, and then use an inequality filter to return a subset of the results, as sketched below. This is likely not as efficient as a cursor, but it works well.
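For illustration, the pattern in plain ndb (the standalone function and its names are hypothetical; in our API the filter is applied inside the decorated method):

def get_older_transactions(account_id, below_version, limit=100):
    # Page by version number instead of a cursor: return the newest
    # 'limit' transactions strictly below the version the client
    # already has.
    q = Transaction.query(ancestor=ndb.Key('Account', account_id))
    q = q.filter(Transaction.entityVersion < below_version)
    return q.order(-Transaction.entityVersion).fetch(limit)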
I see.
If you think our change is useful to the project or you want to see the changes, I can generate a pull request. Just let me know. We certainly appreciate all of the help today @dhermes!
@bmclean I think I understand your change, I just don't understand why it fixes the ndb bug, so let's hold off.
I'm closing out.
/cc @pcostell @Alfus https://code.google.com/p/googleappengine/issues/detail?id=9610 is still out in the wild.
@josemontesdeoca FYI
This is another report and gives some sample code for the memory leak in https://code.google.com/p/googleappengine/issues/detail?id=9610
In particular, the peaks of 185 MB / 199.8 MB of RAM stuck out in my memory after your comment about a peak of 300 MB.
@dhermes Thanks for the notice! I'll definitely take a closer look.
This issue is regarding the query_method decorator and large data sets (> 1000 items).
We are experiencing high memory usage when performing queries on large data sets and limiting the results. The memory seems to be allocated for the entire set, not just the results returned. For example, a query for the first 100 entities from a dataset of 2000 will cause an extra 5 MB of RAM to be used on each request, and this is never freed. However, if the last 100 entities of the 2000 are requested, no additional memory is used. If we set produce_cursors to False in the query options used in fetch_page, no nextPageToken is returned, and the behaviour is identical to requesting the last 100 entities (no additional memory used).

As a fix, we've added the following option to the query_method decorator in model.py (Nov 26, 2014, commit ed91623ae07b9ff2341d420e48ed940f40552828):
Line 1435:

def query_method(cls,
                 query_fields=(),
                 collection_fields=None,
                 limit_default=QUERY_LIMIT_DEFAULT,
                 limit_max=QUERY_LIMIT_MAX,
                 user_required=False,
                 use_projection=False,
                 produce_cursors=True,
                 **kwargs):
Line 1600:
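Roughly, the new flag is threaded through to the query options on the fetch_page call (a simplified sketch, not the exact model.py diff):

# Simplified sketch (assumed shape): the decorator argument is passed
# through as an ndb query option.
items, next_cursor, more_results = query.fetch_page(
    request_limit, start_cursor=query_info.cursor,
    produce_cursors=produce_cursors)
# With produce_cursors=False, no nextPageToken is returned and the
# extra memory is not retained.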
We then call the query_method with produce_cursors=False. Is there a better solution? Or could this fix be considered for implementation?