cloudant / python-cloudant

A Python library for Cloudant and CouchDB
Apache License 2.0

Reads returning old document versions when reusing client #380

Closed cd-slash closed 6 years ago

cd-slash commented 6 years ago

I have a serverless function calling a Cloudant database via the python-cloudant library, with the connection to the client being made outside the function handler so that it is cached, like so:

import json

from cloudant.client import Cloudant
from requests.adapters import HTTPAdapter

httpAdapter = HTTPAdapter(pool_connections=5, pool_maxsize=100)
client = Cloudant(cloudantUsername, cloudantPassword, url=cloudantURL, adapter=httpAdapter, connect=True)

def update(event, context):
    db = client['db_name']
    document_id = event['pathParameters']['id']
    document = db[document_id]
    document['foo'] = 'bar'
    document.save()
    return json.dumps(document)

def get(event, context):
    db = client['db_name']
    document_id = event['pathParameters']['id']
    document = db[document_id]
    return json.dumps(document)

When I do this, I get a very low latency (<100ms) on repeat executions of any kind (create, update, get), because the "connection" is reused, which is the intention. ("connection" in inverted commas since I know cloudant does not have a concept of keeping an open pipe).

If I call /update and increment the document _rev to 2-XXXX, then call /get, the function returns the document with _rev 2-XXXX - expected behaviour.

If I call /update again, _rev is correctly incremented to 3-XXXX, but when I call /get again, I get the document at revision 2-XXXX (i.e. not the updated version). When this happens, the update function executes in ~90ms, while the get executes in ~10ms (too fast to actually be calling the database).

I can eliminate this behaviour by either moving the setup of the Cloudant client inside the function handler like so:

httpAdapter = HTTPAdapter(pool_connections=5, pool_maxsize=100)

def update(event, context):
    client = Cloudant(cloudantUsername, cloudantPassword, url=cloudantURL, adapter=httpAdapter, connect=True)
    db = client['db_name']
    document_id = event['pathParameters']['id']
    document = db[document_id]
    document['foo'] = 'bar'
    document.save()
    return json.dumps(document)

def get(event, context):
    client = Cloudant(cloudantUsername, cloudantPassword, url=cloudantURL, adapter=httpAdapter, connect=True)
    db = client['db_name']
    document_id = event['pathParameters']['id']
    document = db[document_id]
    return json.dumps(document)

or I can explicitly call document.fetch() in the get handler:

def get(event, context):
    db = client['db_name']
    document_id = event['pathParameters']['id']
    document = db[document_id]
    document.fetch()
    return json.dumps(document)

In either case, /get always returns the correct revision of the document.

Getting an old version of the document is unexpected, since I believed that putting the client setup outside the handler would only cache the client between executions, and that calls to the database would still be made each time the handler ran. It looks like those calls are being made when /update is called, presumably because document.save() forces one, but not when /get is called without a forced call to .fetch().

Is putting the client declaration outside the handler somehow caching the database? If so, is there any way to prevent it from doing so? Are context managers required in this scenario?

tomblench commented 6 years ago

@h3nry0 it seems like there is some sort of interaction between your serverless environment and the context manager where we provide __enter__ and __exit__ methods to automatically call fetch and save. (See also our documentation).

Documents are locally cached as a feature, but if they are not being refreshed properly, this may be a bug.

Can you provide us with more information about the serverless environment? Is it AWS Lambda, Cloud Functions, or similar?

tomblench commented 6 years ago

@h3nry0 actually I see you're not using the context manager, it might be worth trying the context manager style of code which we suggest in our documentation and seeing if the issue persists.

cd-slash commented 6 years ago

Thanks - the context manager does resolve the issue (as does calling .fetch() manually, which seems analogous). I'm deliberately trying to avoid using the context manager, since entering and exiting on each invocation adds considerable overhead.

It would be good to know whether the caching of the document when not using the context manager or calling fetch() is expected behaviour - if it is, then the issue can be closed, but it does seem a little unintuitive.

ricellis commented 6 years ago

As @tomblench said, "Documents are locally cached as a feature". Calling save() updates the remote db and then updates the local dict copy of the document with the new _rev. That is the intended behaviour, and it is the behaviour I am able to reproduce locally both with and without a context manager. A db[document_id] lookup will fetch() the document if it is not present in the local dict, but will not update the dict if a local copy already exists.
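
The fetch-on-miss lookup described above can be sketched with toy stand-ins (hypothetical names, not python-cloudant internals), which reproduce the stale read:

```python
# Toy illustration of a fetch-on-miss document cache.
# `RemoteStore` stands in for the Cloudant server; all names are hypothetical.

class RemoteStore:
    def __init__(self):
        self.docs = {}

    def fetch(self, doc_id):
        # Return a copy of the server-side document.
        return dict(self.docs[doc_id])

    def save(self, doc_id, doc):
        # Bump the revision number and store the new version.
        rev = int(doc.get('_rev', '0-x').split('-')[0]) + 1
        doc['_rev'] = f'{rev}-xxxx'
        self.docs[doc_id] = dict(doc)

class CachingDatabase:
    """Returns the locally cached copy if present; fetches only on a miss."""
    def __init__(self, remote):
        self.remote = remote
        self.cache = {}

    def __getitem__(self, doc_id):
        if doc_id not in self.cache:              # fetch-on-miss
            self.cache[doc_id] = self.remote.fetch(doc_id)
        return self.cache[doc_id]                 # may be stale

remote = RemoteStore()
remote.save('doc1', {'foo': 'a'})                 # server now at rev 1-xxxx
db = CachingDatabase(remote)
first = db['doc1']                                # miss -> fetch (rev 1-xxxx)
remote.save('doc1', {'foo': 'b', '_rev': first['_rev']})  # server now at rev 2-xxxx
second = db['doc1']                               # hit -> cached rev 1-xxxx, not 2-xxxx
```

The second lookup never touches the server, which is why it is fast and why it can return an out-of-date revision.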

Thus the behaviour I would expect for your scenario (and what I see locally) is:

  1. Call update
    1. Remote call to check for DB
    2. Add DB to local dict
    3. Remote call to fetch document
    4. Add document to local dict (rev 1-xxxx)
    5. Document content updated
    6. Remote call to update document (makes rev 2-xxxx)
    7. Local dict _rev updated (now rev 2-xxxx)
  2. Call get
    1. Local dict DB is returned
    2. Local dict document is returned (rev 2-xxxx)
  3. Call update
    1. Local dict DB is returned
    2. Local dict document is returned (rev 2-xxxx)
    3. Document content updated
    4. Remote call to update document (makes rev 3-xxxx)
    5. Local dict _rev updated (now rev 3-xxxx)
  4. Call get
    1. Local dict DB is returned
    2. Local dict document is returned (rev 3-xxxx)

Given that you don't see that behaviour, my guess is that your serverless env is re-using the container for some calls and using a new container for others. For example, the behaviour you see could be caused by the client/dict combination used to do the first update or first get being re-used for the second get (i.e. with the second update happening in a different container instance, so the local update to _rev 3-xxxx happens in a completely different dict instance). IIUC AWS Lambda, for example, is at liberty to re-use containers. I'm not sure how deep that re-use goes, or what level of re-initialization of objects etc happens, but that's an environment thing that we wouldn't debug.
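
The container-reuse explanation can be simulated with two independent caches standing in for two Lambda containers (toy code, hypothetical names):

```python
# Shared remote state, standing in for the Cloudant database.
server = {'_rev': '1-xxxx', 'foo': 'a'}

container_a_cache = {}   # warm container that handled the first update and get
container_b_cache = {}   # different container that handles the second update

def get(cache):
    # Fetch-on-miss; otherwise return the locally cached copy.
    if 'doc' not in cache:
        cache['doc'] = dict(server)
    return cache['doc']

def update(cache):
    # Bump the revision locally, then push the new version to the server.
    doc = get(cache)
    rev = int(doc['_rev'].split('-')[0]) + 1
    doc['_rev'] = f'{rev}-xxxx'
    server.update(doc)
    return doc

update(container_a_cache)        # server at 2-xxxx, A's cache at 2-xxxx
get(container_a_cache)           # returns 2-xxxx from A's cache - correct
update(container_b_cache)        # B fetches 2-xxxx, server now at 3-xxxx
stale = get(container_a_cache)   # A's cache still holds 2-xxxx - stale read
```

Container A never sees container B's update, because the local _rev bump to 3-xxxx happened in a different cache instance.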

Doing an explicit fetch (or having the context manager perform that automatically) ensures the state of the local dict is "current" in whatever container you're in before you get the document from it. There is currently no way to disable the local caching in python-cloudant.
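
The refresh-before-read pattern can be sketched as a context manager whose __enter__ re-fetches and whose __exit__ writes back (a toy stand-in for illustration, not the python-cloudant Document class):

```python
# Toy document that refreshes itself from a remote store on __enter__
# and writes itself back on __exit__; `remote` is a plain dict standing
# in for the database.

class CachedDoc(dict):
    def __init__(self, remote, doc_id):
        super().__init__()
        self.remote, self.doc_id = remote, doc_id

    def fetch(self):
        # Replace the local copy with the current remote state.
        self.clear()
        self.update(self.remote[self.doc_id])

    def __enter__(self):
        self.fetch()          # always re-read before use
        return self

    def __exit__(self, *exc):
        self.remote[self.doc_id] = dict(self)   # write back on exit
        return False

remote = {'doc1': {'_rev': '2-xxxx', 'foo': 'bar'}}
doc = CachedDoc(remote, 'doc1')
remote['doc1'] = {'_rev': '3-xxxx', 'foo': 'baz'}  # remote changes under us
with doc:
    current_rev = doc['_rev']   # __enter__ fetched the latest copy (3-xxxx)
```

Because __enter__ always re-fetches, the read inside the with block reflects the remote state regardless of what any previous container instance cached.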

htowens commented 6 years ago

I agree with this assessment of the root cause - the update and get actions are being performed by separate Lambda functions, and thus they will not be using the same container. It's likely that the get function is reusing a client that previously fetched the document, and is therefore returning the cached version instead of calling the database.

In hindsight, I'd say this is rational and expected behaviour - it's no different from a completely different (i.e. non-Lambda) client having updated the database, and should be dealt with in line with eventual-consistency principles.

Where I was going wrong was assuming that the get and update actions were happening within the same container because they are both part of the same Lambda service and thus the locally cached document should have been updated - but I now understand this not to be the case, since each function runs in a separate container.