cloudant / python-cloudant

A Python library for Cloudant and CouchDB
Apache License 2.0

db.all_docs() with keys returns HTTP 415 "Unsupported media type with url" #177

Closed: bradwbonn closed this issue 8 years ago

bradwbonn commented 8 years ago

I can POST a list of keys as JSON data using the requests library, but the same request against the same DB through the Cloudant Python library fails with an unsupported media type error. Is the library not setting the Content-Type header properly?

Below is the code. old_method() uses requests, new_method() uses python-cloudant.

        def old_method():
            # Uses the requests library directly, with an explicit JSON Content-Type header
            myurl = 'https://{0}.cloudant.com/{1}/_all_docs?include_docs=true'.format(
                self.config['cloudant_account'],
                self.scandb.metadata()['db_name']
            )
            my_header = {'Content-Type': 'application/json'}
            try:
                r = requests.post(
                    myurl,
                    headers=my_header,
                    auth=(self.config['cloudant_user'], self.config['cloudant_auth']),
                    # list() keeps the body JSON-serializable on Python 3 as well
                    data=json.dumps({'keys': list(self.file_doc_batch.keys())})
                )
                result = r.json()
            except Exception as e:
                logging.fatal("Unable to execute HTTP POST: {0}".format(e))
                sys.exit("Unable to execute HTTP POST: {0}".format(e))
            return result

        def new_method():
            # Uses Cloudant python library
            result = self.scandb.all_docs(
                include_docs = True,
                keys = self.file_doc_batch.keys()
            )
            return result

old_method() returns the JSON from the _all_docs endpoint, with the expected 'rows' array containing an entry for each key.
new_method() errors with HTTP 415:

  File "./dirscan.py", line 758, in new_method
    keys = self.file_doc_batch.keys()
  File "/Library/Python/2.7/site-packages/cloudant/database.py", line 372, in all_docs
    resp.raise_for_status()
  File "/Library/Python/2.7/site-packages/requests/models.py", line 844, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 415 Client Error: Unsupported Media Type for url: https://bradwbonn.cloudant.com/scandb-1464114444/_all_docs?include_docs=true

alfinkel commented 8 years ago

This is probably because we are missing the "encoder" piece here. This bug can likely be grouped under the same umbrella as #170.

FWIW, db.all_docs(include_docs=True, keys=['foo', 'bar', 'baz']) works fine for me, so I am guessing the problem has something to do with your keys list. That's not an excuse, just an observation that it "sort of" works; there is clearly a bug that needs to be tracked down and resolved. As I said, I think it may be related to the lack of an encoder in the json.dumps() of the keys parameter.
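
The thread does not show what the library's encoder actually does. As a rough illustration of the general idea only, here is a custom `json.JSONEncoder` subclass handed to `json.dumps()` through `cls=`; the class name `KeysEncoder` and the sample `batch` dict are hypothetical. It matters mainly on Python 3, where `dict.keys()` returns a view that the default encoder rejects (the traceback above is from Python 2.7, where `keys()` returns a plain list). Later comments in this thread report that encoding was not the cause of the 415.

```python
import json
from collections.abc import Set


class KeysEncoder(json.JSONEncoder):
    """Hypothetical encoder: writes sets and dict key views as JSON arrays."""

    def default(self, o):
        # In Python 3, dict.keys() returns a KeysView, which is a subclass of
        # collections.abc.Set, so this branch covers it as well as plain sets.
        if isinstance(o, Set):
            return list(o)
        # Anything else falls through to the stock behaviour (TypeError).
        return super().default(o)


batch = {"doc-a": {"size": 1}, "doc-b": {"size": 2}}
body = json.dumps({"keys": batch.keys()}, cls=KeysEncoder)
print(body)  # {"keys": ["doc-a", "doc-b"]}
```

Without `cls=KeysEncoder`, the same `json.dumps()` call raises `TypeError` on Python 3; wrapping the keys in `list()` at the call site is the simpler alternative.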

bradwbonn commented 8 years ago

Sounds about right. Would it help if I shared my "keys" array? It has 2000 entries (the recommended batch size limit for bulk transactions against Cloudant).
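
If 2000 keys is treated as the per-request ceiling, a larger set of IDs has to be split into batches before each `all_docs(keys=...)` call. A minimal sketch using `itertools.islice`; the helper name `chunked` and the sample IDs are hypothetical, and 2000 is simply the figure quoted in the comment above, not a value checked against Cloudant documentation.

```python
from itertools import islice


def chunked(keys, size=2000):
    """Yield successive lists of at most `size` keys from any iterable."""
    it = iter(keys)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


all_ids = ["id-{0}".format(n) for n in range(4500)]
sizes = [len(b) for b in chunked(all_ids)]
print(sizes)  # [2000, 2000, 500]
```

Each yielded list would then be sent as the `keys` of its own `_all_docs` request.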

alfinkel commented 8 years ago

Yes, that would help a lot; that way whoever is tasked with resolving this will have the right list to test against.

bradwbonn commented 8 years ago

https://gist.github.com/bradwbonn/cf9384617f61cc3851c18bfd323d75cb

alfinkel commented 8 years ago

Is file_doc_batch content being pulled from a file and into a dictionary?

bradwbonn commented 8 years ago

file_doc_batch is a dict() built by iterating through a local filesystem and storing metadata about each file. The key is a custom _id field built from that metadata, and the value is the metadata itself.

bradwbonn commented 8 years ago

The idea is that the app can check the contents of the local filesystem against the database directly, by ID, using the primary index.
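
To make that setup concrete, here is a sketch of how a dict like `file_doc_batch` might be built: walk a directory tree, derive an ID for each file, and store the file's metadata as the value. The thread does not say how the custom `_id` is derived, so `make_doc_id()` (SHA-1 of the path followed by the mtime) and `build_file_doc_batch()` are hypothetical stand-ins, and the demo runs against a throwaway temporary directory.

```python
import hashlib
import os
import tempfile


def make_doc_id(path, st):
    # Hypothetical _id scheme: SHA-1 hex digest of the path followed by the
    # file's modification time in whole seconds.
    digest = hashlib.sha1(path.encode("utf-8")).hexdigest()
    return "{0}{1}".format(digest, int(st.st_mtime))


def build_file_doc_batch(root):
    """Map a derived _id to a metadata dict for every file under `root`."""
    batch = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            batch[make_doc_id(path, st)] = {
                "path": path,
                "size": st.st_size,
                "mtime": int(st.st_mtime),
            }
    return batch


# Demo against a throwaway directory holding two small files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.txt"):
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write(name)
    demo = build_file_doc_batch(tmp)
    keys = list(demo)  # the IDs to look up via _all_docs
print(len(keys))  # 2
```

Taking `list(demo)` rather than `demo.keys()` also keeps the collection JSON-serializable on Python 3.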

alfinkel commented 8 years ago

This may be a true Heisenbug situation. I just ran your full list of keys against a database that obviously does not contain any of them, and I got the expected 2000 rows like this one: {u'error': u'not_found', u'key': u'27b818b97c59390724a7e4ab58124b18e3313f711444686540'}
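
For the use case described above (checking local files against the database by ID), the response to an `_all_docs` request with `keys` can be split into IDs that exist and IDs that do not. A minimal sketch, assuming the standard CouchDB response shape: unknown keys come back as error rows like the `not_found` one quoted above, and deleted documents come back with `"deleted": true` inside `value`. The helper name and the `sample` data are illustrative, not taken from the thread.

```python
def split_all_docs_rows(response):
    """Split an _all_docs?keys=... response into (found, missing) ID lists."""
    found, missing = [], []
    for row in response.get("rows", []):
        if "error" in row:
            # Unknown key, e.g. {"error": "not_found", "key": "..."}
            missing.append(row["key"])
        elif row.get("value", {}).get("deleted"):
            # Deleted document: a revision is reported but there is no body.
            missing.append(row["key"])
        else:
            found.append(row["id"])
    return found, missing


sample = {
    "rows": [
        {"key": "a", "error": "not_found"},
        {"id": "b", "key": "b", "value": {"rev": "1-x"}, "doc": {"_id": "b"}},
        {"id": "c", "key": "c", "value": {"rev": "2-y", "deleted": True}, "doc": None},
    ]
}
print(split_all_docs_rows(sample))  # (['b'], ['a', 'c'])
```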

alfinkel commented 8 years ago

I suppose this is both a good and a bad thing. The good is that all_docs with keys does appear to work in some form. The bad is that it does not work for you @bradwbonn, and it looks like it might be difficult to reproduce. Either way, I think the encoder still needs to be added, and hopefully that resolves the problem.

bradwbonn commented 8 years ago

That might explain why at one point I could have sworn it was working, but today when I tried the "new" method it gave me the error. Something weird this way comes... Would debug logging from the library tell us anything useful?

ricellis commented 8 years ago

The suggested encoder has been added as part of #170 / #185 in commit ec369d597c5152091546111f5090dd1e6f326a67.

ricellis commented 8 years ago

The encoder doesn't appear to fix this problem; it looks like we're missing a Content-Type header when we POST the list of keys. I'm not sure why it works in some cases, e.g. https://github.com/cloudant/python-cloudant/issues/177#issuecomment-223103966.

ricellis commented 8 years ago

The difference in behaviours appears to come down to how the server side handles the POST content. CouchDB 1.6 running locally, for example, is happy to proceed without the Content-Type header and simply treats the POSTed keys as JSON. The load balancers in front of the Cloudant service, on the other hand, reject the request with a 415 if the Content-Type: application/json header is not present. The fix is to make sure we add the Content-Type header to the request.
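
As a sketch of the request that needs to go over the wire, here it is built with the standard library's `urllib.request` rather than with python-cloudant's own code (this is not the library's actual patch). The database URL `https://example.cloudant.com/mydb` is a placeholder, the helper name is hypothetical, and authentication is left out. The request is only constructed, not sent, so its method, URL, header and body can be inspected.

```python
import json
import urllib.request


def build_all_docs_request(db_url, keys, include_docs=True):
    """Build (but do not send) a POST to <db_url>/_all_docs with a JSON body."""
    url = "{0}/_all_docs".format(db_url.rstrip("/"))
    if include_docs:
        url += "?include_docs=true"
    body = json.dumps({"keys": list(keys)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # The header Cloudant's load balancers require; without it they
        # answer 415 Unsupported Media Type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_all_docs_request("https://example.cloudant.com/mydb", ["k1", "k2"])
print(req.get_method(), req.full_url)
print(req.get_header("Content-type"))  # application/json
```

`urllib.request.Request` normalizes header names with `str.capitalize()`, which is why the lookup uses `"Content-type"`; the header actually sent is equivalent, since HTTP header names are case-insensitive.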