I notice that your `curl` query is using a `limit` of 5000, but that you don't have the same `limit` applied to your python-cloudant query. I wouldn't expect it to make a difference, but it would be good to know that we are comparing exactly the same query before digging in any deeper.
Thank you for the fast answer. I added the `limit` parameter to the query, but in order to iterate over the results I had to call `Query(...).result.all()`. It brought the total time down to 3 sec (just like the `curl` version), but maybe this is not the optimal solution. From the profilehooks output there are now 57 calls to `{method 'recv' of '_socket.socket' objects}`, with an average time of 0.054 s, so that's probably the bottleneck. Anyway, thank you again for the hint and the support.
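A minimal sketch of that workaround (the selector below is a placeholder, since the real one isn't shown in this issue):

```python
# Hypothetical sketch of the workaround described above;
# "db" is an already-opened database object and the selector is a placeholder.
from cloudant.query import Query

query = Query(db,
              selector={"type": "user"},  # placeholder, not the real selector
              fields=["_id"],
              limit=5000)
docs = query.result.all()  # materialize all results so they can be iterated
```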
We normally expect queries to be executed via `db.get_query_result(...)`. There are some documented examples of iterating docs returned from it.
Note that the default `page_size` is 100, and that `skip` and `limit` are set internally based on this to page results in a memory-efficient way (at the expense of making more requests); with your ~3000 matching documents that works out to roughly 30 paged requests. If you are unconcerned about the memory usage of having more docs at once (especially because you are only using the `_id` field) you can increase the `page_size`, reducing the number of requests needed. I suspect that your use of the (internal) `Query` class above used the default `limit=100`, and explicitly setting the `limit` higher reduced the number of requests needed to fetch all the query results. It is better to use `db.get_query_result` as that's really the API for this; for your case something like this should work:
```python
docs = db.get_query_result(selector, fields=["_id"], page_size=5000, use_index="78efbd1fbc0663b7953309184e9c6b3b0c1ca965")
count = 0
for doc in docs:
    count += 1  # count the matching documents
```
This is probably the solution, I did not know about the `get_query_result` method. By changing `page_size` I can actually change the query time too. Thank you again for the support, I think now this issue can be closed.
I'm trying to execute a query, but it takes more than a minute using the `Query` class. The same query is instead very fast (2-3 sec) if executed with `curl`.
curl code:
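(The original command isn't shown here; below is a hypothetical reconstruction against the standard `_find` endpoint, where the account, database name, credentials and selector are all placeholders.)

```bash
# Hypothetical reconstruction; account, database, credentials and selector are placeholders.
curl -s -X POST "https://$ACCOUNT.cloudant.com/mydb/_find" \
     -u "$USER:$PASS" \
     -H "Content-Type: application/json" \
     -d '{"selector": {"type": "user"},
          "fields": ["_id"],
          "limit": 5000,
          "use_index": "78efbd1fbc0663b7953309184e9c6b3b0c1ca965"}'
```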
python query code:
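(Again a hypothetical reconstruction, since the original snippet isn't shown; the selector is a placeholder.)

```python
# Hypothetical reconstruction of the slow version; the selector is a placeholder.
from cloudant.query import Query

query = Query(db,
              selector={"type": "user"},  # placeholder, not the real selector
              fields=["_id"],
              use_index="78efbd1fbc0663b7953309184e9c6b3b0c1ca965")
result = query.result  # iterating this pages through results 100 docs at a time by default
```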
Then I do a for-in loop over each document (~3000) and increment a counter. This cycle is the most time-consuming part, but profilehooks tells me that most of the time is spent on `{method 'recv' of '_socket.socket' objects}`. I actually do another query with your library but it's very fast. Hope you can solve this, and thank you for all the brilliant work done.