Closed cd-slash closed 6 years ago
@h3nry0 it seems like there is some sort of interaction between your serverless environment and the context manager where we provide __enter__
and __exit__
methods to automatically call fetch
and save
. (See also our documentation).
Documents are locally cached as a feature, but if they are not being refreshed properly, this may be a bug.
Can you provide us with more information about the serverless environment? Is it AWS Lamdba, Cloud Functions, or similar?
@h3nry0 actually I see you're not using the context manager, it might be worth trying the context manager style of code which we suggest in our documentation and seeing if the issue persists.
Thanks - the context manager does resolve the issue (as does calling .fetch()
manually, which seems analogous). I'm deliberately trying to avoid using the context manager since entering and exiting each time adds considerable overhead.
Good to know if the caching of the document when not using the context handler or calling fetch is expected behaviour - if it is then the issue could be closed, but it does seem a little unintuitive.
As @tomblench said "Documents are locally cached as a feature".
Calling save()
updates the remote db and then updates the local dict copy of the document with the new _rev
. That is the intended behaviour and the behaviour I am able to reproduce locally both with or without a context manager.
The db[document_id]
will fetch()
if the document is not present in the local dict, but will not update the dict if there is already a local copy.
Thus the behaviour I would expect for your scenario (and what I see locally) is:
_rev
updated (now rev 2-xxxx)_rev updated
(now rev 3-xxxx)Given that you don't see that behaviour my guess is that your serverless env is re-using the container for some calls and using a new container for others. For example the behaviour you see could be caused by the client/dict combination used to do the first update
or first get
being re-used for the second get
(i.e. with the second update
happening in a different container instance and so the local update to _rev 3-xxxx
happens in a completely different dict instance). IIUC AWS Lambda, for example, is at liberty to re-use containers. I'm not sure how deep that re-use goes and what level of re-initialization of objects etc happens, but that's an environment thing that we wouldn't debug.
Doing an explicit fetch (or having the context manager perform that automatically) ensures the state of the local dict is "current" in whatever container you're in before you get the document from it. There is currently no way to disable the local caching in python-cloudant.
I agree with this assessment of the root cause - the Update
and get
actions are being performed in separate Lambda functions, and thus they will not be using the same container. It’s likely that the get
function is reusing the client that previously fetched the document, therefore it is returning the cached version instead of calling the database.
In hindsight, I’d say this is rational and expected behaviour - it’s no different than a completely different (i.e. non-Lambda) client having updated the database and should be dealt with in line with eventual consistency principles.
Where I was going wrong was assuming that the get
and update
actions were happening within the same container because they are both part of the same Lambda service and thus the locally cached document should have been updated - but I now understand this not to be the case, since each function runs in a separate container.
I have a serverless function calling a Cloudant database via the python-cloudant library, with the connection to the client being made outside the function handler so that it is cached, like so:
When I do this, I get a very low latency (<100ms) on repeat executions of any kind (create, update, get), because the "connection" is reused, which is the intention. ("connection" in inverted commas since I know cloudant does not have a concept of keeping an open pipe).
If I call
/update
and increment the document_rev
to2-XXXX
then call/get
, the function returns document with_rev
2-XXXX
- expected behaviour.If I call
/update
again,_rev
is correctly incremented to3-XXXX
, but when I call/get
again, I get document revision2-XXXX
(i.e. not the updated version). When this is happening, the update function is executing in ~90ms, while the get is executing in ~10ms (too fast to actually be calling the database).I can eliminate this behaviour by either moving the setup of the Cloudant client inside the function handler like so:
or I can explicitly call
document.fetch()
in the get handler:In either case,
/get
always returns the correct revision of the document.Getting an old version of the document is unexpected since I believed putting the client setup outside the handler would only cache the client between executions, but calls to the database would still be made each time the handler is called. It looks like they are being made when
/update
is called, presumably sincedocument.save()
forces it, but not when/get
is called without a forced call with.fetch()
.Is putting the client declaration outside the handler somehow caching the database? If so, is there any way to prevent it from doing so? Are context managers required in this scenario?