cloudant / python-cloudant

A Python library for Cloudant and CouchDB
Apache License 2.0

query returns docs + "no matching index found" message when index exists #198

Closed mikebroberg closed 8 years ago

mikebroberg commented 8 years ago
import cloudant

client = cloudant.Cloudant("username", "password", account="account_name")
client.connect()
session = client.session()
db = client['testdb']

query1 = cloudant.query.Query(db,selector={'$or':[{'userid': 35916}, {'userid': 11035}]},fields=['_id', 'userid'])
#query2 = cloudant.query.Query(db,selector={'_id':{'$gt':0},'$or':[{'userid': 35916}, {'userid': 11035}]},fields=['_id', 'userid'])

for doc in query1.result:
    print doc

client.disconnect()

As expected, this snippet returns:

{u'_id': u'266f9caae40012a04ce9223ccc67c8bd', u'userid': 35916}
{u'_id': u'5ff2cb156d16783492adb3eb8e2e0aec', u'userid': 11035}

Contrast that with what I thought would be analogous code:

import cloudant

client = cloudant.Cloudant("username", "password", account="account_name")
client.connect()
session = client.session()

database = client['testdb']
resp = database.get_query_result(
            selector={'$or': [{'userid': 35916}, {'userid': 11035}]},
            fields=['_id', 'userid'],
            raw_result=True
        )
# Print response from querying the `userid` index
print (resp)

client.disconnect()

Which, unexpectedly, returns this:

{u'docs': [{u'_id': u'266f9caae40012a04ce9223ccc67c8bd', u'userid': 35916}, {u'_id': u'5ff2cb156d16783492adb3eb8e2e0aec', u'userid': 11035}], u'warning': u'no matching index found, create an index to optimize query time'}

Finally, my Cloudant Query index, which I created through the Cloudant dashboard:

{
  "index": {
    "fields": [
      "userid"
    ]
  },
  "type": "json"
}

I'm pretty sure I'm using the get_query_result database helper method correctly. I'm not sure why I'm getting the warning. Thanks.

alfinkel commented 8 years ago

FWIW, your first approach iterates over a QueryResult, which wraps the raw response: it steps through the resp['docs'] list and ignores everything else in the response. Your second approach returns the entire raw response (raw_result=True), which consists of the docs list and that warning. So I am pretty certain that the original response in your first approach contains the warning as well; you just can't see it. The two approaches are likely behaving the same way. To see the raw response for your first approach, call query1 directly rather than using query1.result. The query object is callable, and that call is what query1.result invokes and wraps as a QueryResult.

In approach one, replacing:

for doc in query1.result:
    print doc

with:

print query1()

should be the same as print (resp) in your second approach.

Similarly, if you flipped your second approach to using raw_result=False (which is the default) you would then have a QueryResult wrapped object as you do in approach one.

Unfortunately that does not explain the warning but I think it should at least address your seemingly different results.
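
To make the difference concrete, here is a minimal sketch of how the docs and the warning could be pulled apart from a raw response. It assumes the raw _find response is a plain dict with a 'docs' list and an optional 'warning' key, as in the output posted above. The helper name split_find_response is hypothetical and is not part of python-cloudant.

```python
def split_find_response(raw):
    """Separate matched documents from any server warning in a raw _find response.

    Hypothetical helper for illustration; not part of python-cloudant.
    """
    docs = raw.get('docs', [])
    warning = raw.get('warning')  # None when the server sent no warning
    return docs, warning


# Sample response copied from the output reported earlier in this issue.
raw = {
    'docs': [
        {'_id': '266f9caae40012a04ce9223ccc67c8bd', 'userid': 35916},
        {'_id': '5ff2cb156d16783492adb3eb8e2e0aec', 'userid': 11035},
    ],
    'warning': 'no matching index found, create an index to optimize query time',
}

docs, warning = split_find_response(raw)
for doc in docs:
    print(doc)
if warning is not None:
    print('warning: ' + warning)
```

Iterating over a QueryResult only ever shows you the first of those two values, which is why the warning looks like it is missing in approach one.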

mikebroberg commented 8 years ago

Yeah, thanks. I understand that they are different objects. I'm more concerned about the warning.

To follow up on that, I replicated the testdb to a different Cloudant multi-tenant account. Instead of the internal-only BigBlue cluster, I tried this code against Malort. query2 worked, but query1 barfed entirely, producing this:

michaels-air:Desktop broberg$ python cq_test.py 
Traceback (most recent call last):
  File "cq_test.py", line 12, in <module>
    for doc in query1.result:
  File "//anaconda/lib/python2.7/site-packages/cloudant/result.py", line 344, in __iter__
    **self.options
  File "//anaconda/lib/python2.7/site-packages/cloudant/query.py", line 183, in __call__
    resp.raise_for_status()
  File "//anaconda/lib/python2.7/site-packages/requests/models.py", line 844, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://mikebroberg.cloudant.com/testdb/_find
michaels-air:Desktop broberg$ 

It reproduces the error for the user on http://stackoverflow.com/questions/38557485/cloudant-query-slow-speed-using-or-or-in . I'm guessing it has something to do with the different versions of Cloudant running on Malort vs. BigBlue?

alfinkel commented 8 years ago

Yeah, that would be my guess as well. I think at the end of the day, it makes sense to identify whether this issue is a python-cloudant library issue or a deeper bug in Cloudant. I suspect, but may be wrong, that it may be the latter since the library only provides a very thin wrapper around python requests.

I think @emlaver is working on confirming this by issuing the same query using a direct request to the _find endpoint via python requests. I might even suggest a curl.

emlaver commented 8 years ago

I've tested using python requests, the database.get_query_result query method, and curl on my jenever and meritage cluster accounts. Results for each of these tests:

Jenever account:

Meritage account:

Python requests sample code:

import json

import cloudant

client = cloudant.Cloudant("username", "password", account="account_name")
client.connect()
database = client['userid-test-db']
params = {'selector': {'$or': [{'userid': 100}, {'userid': 200}]}, 'fields': ['userid']}
headers = {'Content-Type': 'application/json'}
# POST the query straight to the _find endpoint, bypassing the library's query helpers
resp = client.r_session.post(
    '/'.join([database.database_url, '_find']),
    headers=headers,
    data=json.dumps(params, cls=client.encoder)
)
print(resp.json())

Curl sample command:

curl -X POST -H 'Content-Type: application/json' -d '{"selector": {"$or": [{"userid": 100}, {"userid": 200}]}, "fields": ["userid"]}' https://account:pass@account.cloudant.com/userid-test-db/_find

From these results, I will close this item as it's not a python library issue and will open up a dbcore issue in FB.