Closed jim-sheldon closed 3 years ago
Hmm. I ran the query db.cases.find({'location.country': 'ghana'})
and it was really slow. I then ran db.cases.find({'location.country': 'ghana'}).explain({verbosity: 'executionStats'})
which should run the same query to completion before reporting, and it was also slow:
MongoDB Enterprise cluster-0-shard-0:PRIMARY> db.cases.find({'location.country':'ghana'}).explain({verbosity: 'executionStats'})
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "covid19.cases",
"indexFilterSet" : false,
"parsedQuery" : {
"location.country" : {
"$eq" : "ghana"
}
},
"winningPlan" : {
"stage" : "COLLSCAN",
"filter" : {
"location.country" : {
"$eq" : "ghana"
}
},
"direction" : "forward"
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 0,
"executionTimeMillis" : 195892,
"totalKeysExamined" : 0,
"totalDocsExamined" : 39264034,
"executionStages" : {
"stage" : "COLLSCAN",
"filter" : {
"location.country" : {
"$eq" : "ghana"
}
},
"nReturned" : 0,
"executionTimeMillisEstimate" : 99032,
"works" : 39264036,
"advanced" : 0,
"needTime" : 39264035,
"needYield" : 0,
"saveState" : 39503,
"restoreState" : 39503,
"isEOF" : 1,
"direction" : "forward",
"docsExamined" : 39264034
},
"allPlansExecution" : [ ]
},
"serverInfo" : {
"host" : "cluster-0-shard-00-01-vwhx6.mongodb.net",
"port" : 27017,
"version" : "4.4.8",
"gitVersion" : "83b8bb8b6b325d8d8d3dfd2ad9f744bdad7d6ca0"
},
"ok" : 1,
"$clusterTime" : {
"clusterTime" : Timestamp(1629732226, 1),
"signature" : {
"hash" : BinData(0,"piKWDwLDv7FRcnwCe51PZDLR4UM="),
"keyId" : NumberLong("6958739380580122625")
}
},
"operationTime" : Timestamp(1629732226, 1)
}
This might surprise nobody, but it at least means that it isn't something like "dev mongodb occasionally goes to sleep and needs to wake up" or "dev mongodb runs on spinning disks that occasionally spin down". The big observation here is that this is a collection scan, it doesn't use an index. That is why this is slow.
We have both a text index and a search index for location.country
so I don't know why the query doesn't use an index? But it doesn't.
I tried hinting that it should use the index and it didn't change the plan or (unsurprisingly) the execution time.
Suggestion was to make the collation on the query match the collation in the index, i.e. { locale: "en_US", strength: 2 }
. Doing this brings the query time down to 9ms! Let me see how to add that to the app.
OK, full result of a few days of exploration: the problem is certainly that our indexes do not support our users' queries, so mongo is having to do a collection scan (i.e. iterate over every document) and this is slow. We can add a compound index on e.g. (country, date) to speed up filtering by country. This speeds up retrieval of both the list
and download
endpoints, but download
still happens slowly and I think that's possibly due to the loop that transforms the documents into the target format (e.g. csv).
However, my hypothesis was that creating a single compound index with all of the filter query fields would give us indexed searches for any filter. This didn't happen, and mongodb fell back to doing full collection scans. Therefore we need to log what queries users are making in prod and design indexes to support those queries. In effect we have some of that information from the query time reports in atlas, though because we can't correlate queries to users we can't see whether they were genuine, or someone testing something. Note that I'm not saying we need to associate queries with individual users! But we do need to be confident that they're real things real people are trying to do, and we're not optimising for developers or testers.
BTW another piece of information that will be helpful is the indexStats for a collection, which tells us how frequently an index is used. I guess most of the indexes we have aren't helping any of our queries and can be removed, saving storage and speeding up writes. However I don't have permission on prod to run db.cases.aggregate([{$indexStats:{}}])
so can't test this hypothesis.
Fixed for small datasets with country search by adding compound index with creation date.
The data service returns 504s when applying the
country
filter to small or empty data sets, including Kenya, Taiwan, Monaco, Cambodia, Nepal, and Mozambique. It does not happen with larger data sets, like Germany, Brazil, France, or the United States. It also does not happen in the prod environment.