Closed ghost closed 6 years ago
Hi @CWX-iggy, thanks for your issue.
Unfortunately, you need to know that the $in
operator has to do a full scan on every query on every document in the index. This cannot be pre-determined given the way Mango builds indexes. Mango will index the full contents of the field you are searching on (here, league
) but has to at query time do a manual comparison of the contents of that index, per-document, with your parameters at query time. You will experience similar poor performance with other inexact operators, such as the $regex
operator. @tonysun83 can you confirm my understanding of Mango indexing here?
I agree this is not clearly called out in the documentation, and for that I apologise. Would you agree with me converting your issue to one that would let us improve our documentation on this point?
Your choices to improve performance are to use Mango partial indexing to reduce the number of documents in your index (best option), use $eq
or another exact operator instead of $in
(next best, but you may not be able to capture your query, or try something like Clouseau/Dreyfus for full text search capability instead (hardest to set up, requires JDK6 and Apache Lucene, but is more flexible in terms of partial term searching). This last option lets you use text-based Mango indexes if you want to stay with Mango.
I recommend starting with partial indexing. You can use this to make multiple sub-indexes that reduce the likelihood of few documents being found. Partial indexes are especially good to filter documents by type, for which it looks like your meta.type
or league
fields might be good candidates. You may even find that running 2 mango queries (for instance, across 2 league
s) across 2 partial indexes and then merging the results in your client / app layers will be faster than running the single query across the entire content database.
One other thing I noticed - your query may not be using the right index. Can you check with /db/_explain
that the index is indeed being used? I don't see you indexing the field you're querying against here, i.e. league
.
Documentation on the explain endpoint is here: http://docs.couchdb.org/en/2.2.0/api/database/find.html#db-explain
@wohali I suspect that may the issue - no index on league (or priority). How do I specify multiple use_index
? This doesn't seem to work:
{..."use_index": ["_design/filter_by_league","_design/sort_by_date"]}
This is the response from /db/_explain
{
"error": "error",
"reason": "{invalid_object,{<<\"\ufffd\ufffd\ufffd\ufffd\">>}}"
}
Fauxton simply reports error fetching query plan.
You can't. Mango underneath is using the same index file format as MapReduce - you're still querying only a single index at once. You'll need to index all the fields you need in a single index.
That's weird on your /db/_explain response, you may need to upgrade to 2.2.0 for some bugfixes. Or I just realized - you might be trying to send it the query with a list-based use_index
, which is invalid as I explained.
Closing as this isn't strictly a Mango issue.
Happy to keep discussing it with you in this ticket if you like.
@wohali is right that the query that takes long is doing a full index scan on _all_docs
as you can see from the stats:
"execution_stats": {
"total_keys_examined": 0,
"total_docs_examined": 24856,
"total_quorum_docs_examined": 0,
"results_returned": 0,
"execution_time_ms": 34390.737
}
The one that is faster is because it uses the "publishTime"
index. Mango needs a way to know to select that index on your queries. The best way to do that is to add a publishTime into your selector even if its something like publishTime: {$gt: null}
. That way it should always select that index. You can also add in the use_index
to specify that index.
But running _explain on your queries will give you some information around what is happening internally and which index is being selected.
Since you using league
in your query it might be worth adding that into the index as well even if you only use it with the $in selector. That way only the docs with a league field and a publishTime field will be in your index.
just to clarify why the index is being used in the first instance, "sort":[{"publishTime":"desc"}]
adds an implicit "publishTime": { "$exists": true }
to the "selector"
. If you're not specifying the sort
in subsequent queries, you could add the selector clause explicitly to allow the index to be used (as @garrensmith suggested above).
It sounds like the most performant option would be to create an index on ["league", "publishTime"]
and issue multiple requests - one for each league. You'd then need to merge/sort the results on the client.
_find
returns results within 400-800ms when documents match the selector, but takes > 25000ms when no documents match the selector.Expected Behavior
Expect result response to be at least as responsive when no documents match selector.
Current Behavior
Response takes >25000ms to return if no documents match the selector.
Possible Solution
Nil
Steps to Reproduce (for bugs)
Database contains 15k+ documents of athlete news.
Sample document:
This is the index: POST /{db}/_index
The query selector: POST /{db}/_find
league
selector to find all "MLB" news. If documents match, results are returned within 400-800ms.league
selector to find all "EPL" news. If no documents match, an empty result is returned after more than 25000ms.$eq
Context
This data is returned to a frontend client directly via a websocket request and affects the app's responsiveness.
Your Environment