Azure / azure-cosmos-python

🚨🚨🚨This SDK is now maintained at https://github.com/Azure/azure-sdk-for-python 🚨🚨🚨
https://github.com/Azure/azure-sdk-for-python
MIT License
150 stars 141 forks source link

synchronized_request._Request is_media sniff bug #187

Open Druid-of-Luhn opened 4 years ago

Druid-of-Luhn commented 4 years ago

(see Conclusion for TL;DR)

Package version (pip): azure_cosmos-3.1.2-py3-none-any.whl

Observed error

When trying to upsert a document to a collection:

document = { 'id': str(datetime.datetime.utcnow()), 'key': 'example' }
collection_link = 'dbs/campaigns/colls/digitalmedia' # spoiler: note this collection link
options = { 'partitionKey': document['key'] }
client.UpsertItem(collection_link, document, options)

I get an error thrown from inside azure.cosmos.cosmos_client at line 2880:

File "...\env-3.5.4\lib\site-packages\azure\cosmos\cosmos_client.py", line 2880, in _AddPartitionKey
    partitionKeyDefinition = collection.get('partitionKey')
AttributeError: 'str' object has no attribute 'get'

This error had not been encountered in previous uses of the library.

Investigation

Upon printing the value of collection just above that line, I saw that it was a JSON string, instead of a dict, as the code must have expected.

So I followed the calls backwards: each time, the raw return value of the callee was returned:

At last, in the final synchronized_request._Request function, I see at the end:

if is_media:
    result = data
else:
    if len(data) > 0:
        try:
            result = json.loads(data)
        except:
            raise errors.JSONParseFailure(data)

return (result, headers)

So this tells me that if is_media is set, the response is not parsed from JSON, but instead returned as-is. Injecting a little print(is_media) in there told me that the flag was indeed set, so I went up to see how it was set:

def _Request(global_endpoint_manager, request, connection_policy, requests_session, path, request_options, request_body):
    """Makes one http request using the requests module.
    ...
    :return:
        tuple of (result, headers)
    :rtype:
        tuple of (dict, dict)

    """
    is_media = request_options['path'].find('media') > -1

(note that the return type is said to be (dict, dict), but we now know that to not always be the case)

So is_media is sniffed from request_options['path']. Intrigued, I print(request_options['path']) only to be greeted by a familiar sight:

/dbs/campaigns/colls/digitalmedia/

Conclusion

If the database id, collection id or document id (I assume) contains the substring media, then the result of the API GET request in synchronized_request._Request will not be parsed as JSON, and the azure-cosmos library will fail further down the line when expecting the returned value to be a dict rather than a str.

The obvious workaround at this point would be to just use a container with a different name. I can do this during the testing phase, but we already expect it to be called what it is, so just changing the name is not really viable. Is there a safer way of determining whether a request is_media?

The second workaround is for us to fork this library, remove the is_media check and use our fork instead (whether by putting it locally in the project or hosting it somewhere for pip to pick up). This is more effort than is desirable, but may be necessary to not be affected by this bug.

Druid-of-Luhn commented 4 years ago

It has been almost 2 weeks since I raised this issue. A response, even if it is just an acknowledgement, would be appreciated.