NSLS-II / metadatastore

DEPRECATED: Incorporated into https://github.com/NSLS-II/databroker

CursorNotFound at MDS #257

Closed hhslepicka closed 7 years ago

hhslepicka commented 7 years ago

At LIX, while going over the files of a mesh scan (~4000 files), we received a CursorNotFound exception, the result of a timeout on the PyMongo call to find. Here is the notebook with the complete error message: http://nbviewer.jupyter.org/gist/hhslepicka/92a84e78534498adf0510a9c4afc5efd

If I do the same without processing the data, everything works fine. This is clearly an error caused by a timeout on the server side of mongo. http://api.mongodb.com/python/1.6/api/pymongo/collection.html#pymongo.collection.Collection.find

My suggestion is to use find(..., timeout=False) and ensure that the cursor is closed at the end.

Attn. @arkilic @danielballan @tacaswell @lyang11973

ghost commented 7 years ago

I added this a while ago, but it apparently got lost in one of the many rewrites. pymongo claims to be fixing this issue in its latest release, but it would be safer to just add it.

hhslepicka commented 7 years ago

@arkilic it would be great if you could implement those fixes in all mongo-related apps for us while this pymongo fix is not available. Thanks! 👍

ghost commented 7 years ago

@hhslepicka will do and you get to enjoy it if we both don't get banned or deported 💨

no_cursor_timeout (optional): if False (the default), any returned cursor is closed by the server after 10 minutes of inactivity. If set to True, the returned cursor will never time out on the server. Care should be taken to ensure that cursors with no_cursor_timeout turned on are properly closed.

So your request takes more than 10 minutes (as I recall, @tacaswell and I decided that was enough time and let pymongo manage the cursor automatically). Btw, the docs you are referring to belong to pymongo 1.6; you should be getting pymongo 3+ via conda, and if not we should fix that. If we don't close cursors, it might eventually cause a memory leak on the server side. It is a bit more complicated than setting no_cursor_timeout=True: we must close the cursor once it is completely consumed, as the docs suggest.
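
A rough sketch of what that could look like, assuming pymongo 3+ (where find accepts no_cursor_timeout). The helper name iter_documents is hypothetical and is not existing metadatastore code; it only illustrates pairing no_cursor_timeout=True with a guaranteed close.

```python
def iter_documents(collection, query):
    """Yield documents matching ``query`` without a server-side cursor timeout.

    ``collection`` is assumed to behave like a pymongo 3+ ``Collection``.
    ``iter_documents`` is a hypothetical helper name used for illustration.
    """
    # no_cursor_timeout=True asks the server not to close the cursor after
    # 10 minutes of inactivity, so slow per-document processing is tolerated.
    cursor = collection.find(query, no_cursor_timeout=True)
    try:
        for doc in cursor:
            yield doc
    finally:
        # Runs when the cursor is exhausted, when iteration raises,
        # or when the generator itself is closed.
        cursor.close()
```

Because the close lives in a finally block inside the generator, callers that iterate to the end do not have to remember to close anything. A caller that stops early should still call close() on the generator (or let it be garbage collected) so the server-side cursor is released.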