globaldothealth / list

Repository for Global.health: a data science initiative to enable rapid sharing of trusted and open public health data to advance the response to infectious diseases.
MIT License

Dev data service returning 504s on small/empty data sets #2111

Closed jim-sheldon closed 3 years ago

jim-sheldon commented 3 years ago

The data service returns 504s when applying the country filter to small or empty data sets, including Kenya, Taiwan, Monaco, Cambodia, Nepal, and Mozambique. It does not happen with larger data sets, like Germany, Brazil, France, or the United States. It also does not happen in the prod environment.

iamleeg commented 3 years ago

Hmm. I ran the query `db.cases.find({'location.country': 'ghana'})` and it was really slow. I then ran `db.cases.find({'location.country': 'ghana'}).explain({verbosity: 'executionStats'})`, which should run the same query to completion before reporting, and it was also slow:

```
MongoDB Enterprise cluster-0-shard-0:PRIMARY> db.cases.find({'location.country':'ghana'}).explain({verbosity: 'executionStats'})
{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "covid19.cases",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "location.country" : {
                "$eq" : "ghana"
            }
        },
        "winningPlan" : {
            "stage" : "COLLSCAN",
            "filter" : {
                "location.country" : {
                    "$eq" : "ghana"
                }
            },
            "direction" : "forward"
        },
        "rejectedPlans" : [ ]
    },
    "executionStats" : {
        "executionSuccess" : true,
        "nReturned" : 0,
        "executionTimeMillis" : 195892,
        "totalKeysExamined" : 0,
        "totalDocsExamined" : 39264034,
        "executionStages" : {
            "stage" : "COLLSCAN",
            "filter" : {
                "location.country" : {
                    "$eq" : "ghana"
                }
            },
            "nReturned" : 0,
            "executionTimeMillisEstimate" : 99032,
            "works" : 39264036,
            "advanced" : 0,
            "needTime" : 39264035,
            "needYield" : 0,
            "saveState" : 39503,
            "restoreState" : 39503,
            "isEOF" : 1,
            "direction" : "forward",
            "docsExamined" : 39264034
        },
        "allPlansExecution" : [ ]
    },
    "serverInfo" : {
        "host" : "cluster-0-shard-00-01-vwhx6.mongodb.net",
        "port" : 27017,
        "version" : "4.4.8",
        "gitVersion" : "83b8bb8b6b325d8d8d3dfd2ad9f744bdad7d6ca0"
    },
    "ok" : 1,
    "$clusterTime" : {
        "clusterTime" : Timestamp(1629732226, 1),
        "signature" : {
            "hash" : BinData(0,"piKWDwLDv7FRcnwCe51PZDLR4UM="),
            "keyId" : NumberLong("6958739380580122625")
        }
    },
    "operationTime" : Timestamp(1629732226, 1)
}
```

This might surprise nobody, but it at least rules out explanations like "dev MongoDB occasionally goes to sleep and needs to wake up" or "dev MongoDB runs on spinning disks that occasionally spin down". The big observation here is that the winning plan is a collection scan (`COLLSCAN`): the query doesn't use an index, and that is why it is slow.
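For anyone reproducing this, the two things worth checking are the collection's existing indexes and the winning plan stage. A minimal mongosh sketch, using the same collection as the transcript above:

```javascript
// List the indexes that exist on the cases collection.
db.cases.getIndexes()

// Re-run the query and inspect the plan: "COLLSCAN" means a full
// collection scan; "IXSCAN" means an index was used.
const plan = db.cases.find({ 'location.country': 'ghana' })
                     .explain('executionStats');
print(plan.queryPlanner.winningPlan.stage);
print(plan.executionStats.totalDocsExamined);
```

In the transcript above the stage is `COLLSCAN` and `totalDocsExamined` is ~39M, confirming every document was read to return zero results.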

iamleeg commented 3 years ago

We have both a text index and a search index for `location.country`, so I don't know why the query doesn't use an index. But it doesn't.

iamleeg commented 3 years ago

I tried hinting that it should use the index and it didn't change the plan or (unsurprisingly) the execution time.
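For reference, a hint in mongosh looks like the sketch below; the index name `countryIdx` is a placeholder, not the real index name in this cluster:

```javascript
// Ask the planner to use a specific index. If the index cannot
// satisfy the query (for example, because of a collation mismatch),
// the plan does not improve, which matches what was observed here.
db.cases.find({ 'location.country': 'ghana' })
  .hint('countryIdx')
  .explain('executionStats')
```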

iamleeg commented 3 years ago

I've asked about the index on stack overflow.

iamleeg commented 3 years ago

Suggestion was to make the collation on the query match the collation in the index, i.e. `{ locale: "en_US", strength: 2 }`. Doing this brings the query time down to 9ms! Let me see how to add that to the app.
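Concretely, the fast version of the earlier query looks like this, with the collation values taken from the Stack Overflow suggestion:

```javascript
// Matching the index's collation lets the planner use the index
// instead of scanning all ~39M documents. strength: 2 makes the
// comparison case-insensitive.
db.cases.find({ 'location.country': 'ghana' })
  .collation({ locale: 'en_US', strength: 2 })
  .explain('executionStats')
// executionTimeMillis dropped from ~196000 to ~9 with this change.
```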

iamleeg commented 3 years ago

OK, full result of a few days of exploration: the problem is certainly that our indexes do not support our users' queries, so Mongo has to do a collection scan (i.e. iterate over every document), and this is slow. We can add a compound index on e.g. `(country, date)` to speed up filtering by country. This speeds up retrieval for both the list and download endpoints, but download is still slow, and I think that's possibly due to the loop that transforms the documents into the target format (e.g. CSV).
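A compound index of that shape could be created like this; the date field name `confirmationDate` is a stand-in (the real field in our schema may differ), and the collation should match the one the queries will use:

```javascript
// Compound index: equality match on country first, then date,
// so country filters and date-ordered results can both use it.
// Queries must specify the same collation to be eligible for it.
db.cases.createIndex(
  { 'location.country': 1, confirmationDate: 1 },
  { collation: { locale: 'en_US', strength: 2 } }
)
```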

However, my hypothesis was that creating a single compound index with all of the filter query fields would give us indexed searches for any filter. This didn't happen, and MongoDB fell back to doing full collection scans. Therefore we need to log what queries users are making in prod and design indexes to support those queries. In effect we have some of that information from the query time reports in Atlas, though because we can't correlate queries to users we can't see whether they were genuine or someone testing something. Note that I'm not saying we need to associate queries with individual users! But we do need to be confident that they're real things real people are trying to do, and that we're not optimising for developers or testers.

BTW another piece of information that will be helpful is `$indexStats` for a collection, which tells us how frequently each index is used. I guess most of the indexes we have aren't helping any of our queries and can be removed, saving storage and speeding up writes. However I don't have permission on prod to run `db.cases.aggregate([{$indexStats: {}}])` so can't test this hypothesis.
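For whoever does have prod access, a slightly expanded version that lists each index with its access count:

```javascript
// $indexStats emits one document per index. accesses.ops is the
// number of operations that used the index since it was built or
// since the server last restarted, so check server uptime before
// concluding an index is dead.
db.cases.aggregate([
  { $indexStats: {} },
  { $project: { name: 1, ops: '$accesses.ops' } },
  { $sort: { ops: 1 } }   // least-used first: removal candidates
])
```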

abhidg commented 3 years ago

Fixed for small datasets with country search by adding a compound index on country and creation date.