biothings / biothings_explorer

TRAPI service for BioThings Explorer
https://explorer.biothings.io
Apache License 2.0
10 stars 11 forks source link

Investigate a long TRAPI query which tends to hang the production server #263

Closed newgene closed 2 years ago

newgene commented 3 years ago

This is the query captured from the log:


{
    "message": {
        "query_graph": {
            "nodes": {
                "n1": {
                    "categories": [
                        "biolink:DiseaseOrPhenotypicFeature"
                    ],
                    "is_set": false,
                    "ids": [
                        "MONDO:0005071",
                        "DOID:863",
                        "SNOMEDCT:118940003",
                        "SNOMEDCT:155262005",
                        "NCIT:C26835",
                        "UMLS:C0027765",
                        "MESH:D009422",
                        "MEDDRA:10013242",
                        "MEDDRA:10029202",
                        "MEDDRA:10029203",
                        "MEDDRA:10029205",
                        "MEDDRA:10029286",
                        "MEDDRA:10029298",
                        "MESH:C519298",
                        "MESH:C071458",
                        "MONDO:0002184",
                        "DOID:2044",
                        "SNOMEDCT:235889003",
                        "UMLS:C0524912",
                        "UMLS:C2717837",
                        "UMLS:C4505492",
                        "UMLS:C4505493",
                        "MESH:D056487",
                        "MESH:C040391",
                        "UMLS:C0262505",
                        "SNOMEDCT:235871004",
                        "MEDDRA:10019737",
                        "MEDDRA:10019741",
                        "MONDO:0013282",
                        "DOID:13372",
                        "OMIM:613490",
                        "ORPHANET:60",
                        "SNOMEDCT:30188007",
                        "NCIT:C84397",
                        "UMLS:C0221757",
                        "MESH:C531610",
                        "MESH:D019896",
                        "MEDDRA:10001806",
                        "MEDDRA:10001811",
                        "MONDO:0005366",
                        "SNOMEDCT:61977001",
                        "UMLS:C0524909",
                        "MESH:D019694",
                        "MEDDRA:10008910",
                        "UMLS:C1856453",
                        "MONDO:0000775",
                        "DOID:0060500",
                        "SNOMEDCT:416093006",
                        "SNOMEDCT:416098002",
                        "UMLS:C0013182",
                        "UMLS:C5139486",
                        "MESH:D004342",
                        "HP:0410323",
                        "MEDDRA:10013661",
                        "MEDDRA:10013700",
                        "MEDDRA:10082135",
                        "MESH:C518324",
                        "UMLS:C3276783",
                        "UMLS:C4231138",
                        "SNOMEDCT:237601000",
                        "UMLS:C0342271",
                        "MONDO:0018229",
                        "DOID:0050426",
                        "OMIM:142830",
                        "OMIM:608579",
                        "ORPHANET:36426",
                        "SNOMEDCT:403609001",
                        "SNOMEDCT:73442001",
                        "SNOMEDCT:768946000",
                        "NCIT:C79484",
                        "EFO:0004276",
                        "UMLS:C0038325",
                        "UMLS:C1274933",
                        "UMLS:C1837818",
                        "UMLS:C1840547",
                        "UMLS:C1840548",
                        "UMLS:C1969756",
                        "UMLS:C2608081",
                        "UMLS:C2750833",
                        "UMLS:C3277286",
                        "UMLS:C3658301",
                        "UMLS:C3658302",
                        "UMLS:C4016206",
                        "MESH:D013262",
                        "MEDDRA:10006561",
                        "MEDDRA:10015209",
                        "MEDDRA:10015211",
                        "MEDDRA:10015219",
                        "MEDDRA:10015220",
                        "MEDDRA:10042029",
                        "MEDDRA:10042030",
                        "MEDDRA:10042033",
                        "MEDDRA:10042849",
                        "MONDO:0005790",
                        "DOID:12549",
                        "SNOMEDCT:40468003",
                        "NCIT:C3096",
                        "UMLS:C0019159",
                        "MESH:D006506",
                        "MEDDRA:10019719",
                        "MEDDRA:10019780",
                        "MEDDRA:10019782",
                        "MEDDRA:10021913",
                        "MEDDRA:10047447",
                        "UMLS:C3278891",
                        "SNOMEDCT:235877000",
                        "UMLS:C0473117",
                        "MEDDRA:10023025",
                        "MEDDRA:10023040",
                        "MESH:C064613",
                        "MONDO:0005267",
                        "DOID:114",
                        "SNOMEDCT:194707003",
                        "SNOMEDCT:56265001",
                        "NCIT:C3079",
                        "EFO:0003777",
                        "UMLS:C0018799",
                        "UMLS:CN236661",
                        "UMLS:CN239852",
                        "MESH:D006331",
                        "MEDDRA:10007540",
                        "MEDDRA:10007541",
                        "MEDDRA:10013199",
                        "MEDDRA:10019276",
                        "MEDDRA:10019277",
                        "MEDDRA:10061024",
                        "CHEBI:59683",
                        "UMLS:C4049267",
                        "MEDDRA:10076955",
                        "MONDO:0043693",
                        "SNOMEDCT:41309000",
                        "NCIT:C34783",
                        "UMLS:C0023896",
                        "UMLS:C1442981",
                        "MESH:D008108",
                        "MEDDRA:10001626",
                        "MEDDRA:10001627",
                        "MEDDRA:10001628",
                        "MEDDRA:10019844",
                        "MESH:C110500",
                        "MONDO:0013433",
                        "DOID:0060643",
                        "OMIM:613806",
                        "ORPHANET:171",
                        "SNOMEDCT:197441003",
                        "UMLS:C0566602",
                        "MESH:C536419",
                        "MEDDRA:10036732",
                        "UMLS:C0455417",
                        "SNOMEDCT:266902008",
                        "MONDO:0005354",
                        "UMLS:C0524910",
                        "MESH:D019698",
                        "MEDDRA:10008912",
                        "MONDO:0005359",
                        "SNOMEDCT:197352008",
                        "SNOMEDCT:235876009",
                        "SNOMEDCT:427399008",
                        "NCIT:C84427",
                        "UMLS:C0019193",
                        "UMLS:C0860207",
                        "UMLS:C1262760",
                        "UMLS:C3658290",
                        "UMLS:C4277682",
                        "UMLS:C4279912",
                        "MESH:D056486",
                        "MEDDRA:10013705",
                        "MEDDRA:10013762",
                        "MEDDRA:10019766",
                        "MEDDRA:10019795",
                        "MEDDRA:10072268",
                        "MEDDRA:10072734",
                        "MEDDRA:10072937",
                        "UMLS:C1857414",
                        "UMLS:C0019699",
                        "SNOMEDCT:165816005",
                        "MEDDRA:10020180",
                        "MEDDRA:10020183",
                        "MEDDRA:10020188",
                        "MEDDRA:10020425",
                        "MEDDRA:10036219",
                        "NCIT:C15175",
                        "MESH:D006679",
                        "MONDO:0004335",
                        "DOID:77",
                        "SNOMEDCT:119292006",
                        "SNOMEDCT:53619000",
                        "NCIT:C2990",
                        "UMLS:C0012242",
                        "UMLS:C0017178",
                        "UMLS:C0559031",
                        "UMLS:C1565321",
                        "UMLS:C4023588",
                        "MESH:D004066",
                        "MESH:D005767",
                        "HP:0011024",
                        "MEDDRA:10013225",
                        "MEDDRA:10017876",
                        "MEDDRA:10017922",
                        "MEDDRA:10017944",
                        "MEDDRA:10017945",
                        "MEDDRA:10017947",
                        "MEDDRA:10071275",
                        "MONDO:0013209",
                        "DOID:0080208",
                        "SNOMEDCT:197315008",
                        "NCIT:C84444",
                        "UMLS:C0400966",
                        "MESH:D065626",
                        "MEDDRA:10029530",
                        "MEDDRA:10082249",
                        "UMLS:C0455540",
                        "SNOMEDCT:161523006",
                        "MESH:C093154",
                        "UMLS:C4554323",
                        "NCIT:C143255",
                        "UMLS:C0149709",
                        "SNOMEDCT:165806002",
                        "MEDDRA:10019739",
                        "MEDDRA:10019740",
                        "MEDDRA:10019742",
                        "MESH:C069356",
                        "MESH:C115528",
                        "MONDO:0007745",
                        "DOID:2739",
                        "OMIM:143500",
                        "SNOMEDCT:27503000",
                        "NCIT:C84729",
                        "UMLS:C0017551",
                        "MESH:D005878",
                        "MEDDRA:10018267",
                        "MONDO:0001475",
                        "DOID:1227",
                        "SNOMEDCT:191336001",
                        "SNOMEDCT:303011007",
                        "UMLS:C0027947",
                        "MESH:D009503",
                        "MEDDRA:10029354",
                        "MEDDRA:10029355",
                        "UMLS:C0948251",
                        "MEDDRA:10052022",
                        "UMLS:C2674487",
                        "UMLS:C0022346",
                        "UMLS:C0242183",
                        "SNOMEDCT:18165001",
                        "SNOMEDCT:60217008",
                        "HP:0000952",
                        "MEDDRA:10021207",
                        "MEDDRA:10023126",
                        "MEDDRA:10023132",
                        "MEDDRA:10023135",
                        "MEDDRA:10023139",
                        "NCIT:C3143",
                        "NCIT:C35299",
                        "MESH:D007565",
                        "MESH:D000081226"
                    ]
                },
                "n2": {
                    "categories": [
                        "biolink:Gene"
                    ],
                    "is_set": false
                }
            },
            "edges": {
                "e02": {
                    "subject": "n2",
                    "object": "n1",
                    "predicates": [
                        "biolink:gene_associated_with_condition"
                    ]
                }
            }
        }
    }
}

When tested locally, this query can return results in 1044855ms, with many log msgs like below:

14:15:25 0|bte-trapi  |   bte:api-response-transform:transformer input: MONDO:0018646 +1ms
14:15:25 0|bte-trapi  |   bte:api-response-transform:transformer input: MONDO:0018646 +0ms
14:15:25 0|bte-trapi  |   bte:api-response-transform:transformer input: MONDO:0018646 +0ms
14:15:25 0|bte-trapi  |   bte:api-response-transform:transformer input: MONDO:0018646 +0ms
14:15:25 0|bte-trapi  |   bte:api-response-transform:transformer input: MONDO:0018646 +0ms
14:15:25 0|bte-trapi  |   bte:api-response-transform:transformer input: MONDO:0018646 +0ms
14:15:25 0|bte-trapi  |   bte:call-apis:query After transformation, BTE is able to retrieve 156 hits! +1s                                                              14:15:25 0|bte-trapi  |   bte:call-apis:query Succesfully made the following query: {"url":"https://api.monarchinitiative.org/api/bioentity/disease/MONDO:0018229/genes","params":{"direct":true,"rows":200,"unselect_evidence":true},"method":"get","timeout":50000} +288ms                                                                  14:15:25 0|bte-trapi  |   bte:api-response-transform:index api name BioLink API +2s
14:15:25 0|bte-trapi  |   bte:api-response-transform:index api tags: anatomy,disease,gene,phenotype,pathway,annotation,query,translator,biolink +0ms                   14:15:25 0|bte-trapi  |   bte:api-response-transform:transformer input: MONDO:0018229 +296ms
14:15:25 0|bte-trapi  |   bte:api-response-transform:transformer input: MONDO:0018229 +8ms
14:15:25 0|bte-trapi  |   bte:api-response-transform:transformer input: MONDO:0018229 +5ms
14:15:25 0|bte-trapi  |   bte:api-response-transform:transformer input: MONDO:0018229 +4ms
14:15:25 0|bte-trapi  |   bte:api-response-transform:transformer input: MONDO:0018229 +6ms
14:15:25 0|bte-trapi  |   bte:api-response-transform:transformer input: MONDO:0018229 +6ms
14:15:25 0|bte-trapi  |   bte:api-response-transform:transformer input: MONDO:0018229 +4ms
14:15:26 0|bte-trapi  |   bte:api-response-transform:transformer input: MONDO:0018229 +6ms
14:15:26 0|bte-trapi  |   bte:api-response-transform:transformer input: MONDO:0018229 +6ms
14:15:26 0|bte-trapi  |   bte:api-response-transform:transformer input: MONDO:0018229 +7ms
14:15:26 0|bte-trapi  |   bte:api-response-transform:transformer input: MONDO:0018229 +9ms
14:15:26 0|bte-trapi  |   bte:api-response-transform:transformer input: MONDO:0018229 +6ms
newgene commented 3 years ago

tested on our dev instance, with redis enabled (~4G memory total), it fails at the step of setting cache:

image

tokebe commented 3 years ago

Some further detail: the execution time of the query would not appear to be a factor in whether the query hangs the server.

This query took the server 1,008,276ms, but in my testing concurrent queries were not significantly slowed:

click to expand ```json { "message": { "query_graph": { "nodes": { "n0": { "ids": [ "EFO:0004228", "MESH:D056486", "MONDO:0005359", "NCIT:C84427", "UMLS:C0019193", "UMLS:C0860207", "UMLS:C1262760", "UMLS:C3658290", "UMLS:C4277682", "UMLS:C4279912" ], "categories": [ "biolink:DiseaseOrPhenotypicFeature", "biolink:Disease", "biolink:PhenotypicFeature" ], "is_set": true }, "n1": { "categories": [ "biolink:DiseaseOrPhenotypicFeature", "biolink:Disease", "biolink:PhenotypicFeature" ], "is_set": true } }, "edges": { "e01": { "predicates": [ "biolink:predisposes" ], "subject": "n0", "object": "n1" } } } } } ```

Meanwhile, this query took 343,514ms (less overall time), but concurrent queries were significantly slowed:

click to expand ```json { "message": { "query_graph": { "nodes": { "n0": { "ids": [ "MONDO:0005359", "MONDO:0001942", "MONDO:0005011", "MONDO:0005148", "MONDO:0005227", "MONDO:0015517", "MONDO:0005146", "MONDO:0005420", "MONDO:0021100", "MONDO:0002959", "MONDO:0003538", "MONDO:0004670", "MONDO:0007140", "MONDO:0016264", "MONDO:0019632", "MONDO:0001356", "MONDO:0001650", "MONDO:0001711", "MONDO:0005269", "MONDO:0006011", "MONDO:0001505", "MONDO:0005044", "MONDO:0005301", "MONDO:0007027", "MONDO:0010030", "MONDO:0018874", "MONDO:0019060", "MONDO:0002771", "MONDO:0004721", "MONDO:0005061", "MONDO:0005247", "MONDO:0005279", "MONDO:0005300", "MONDO:0005311", "MONDO:0007028", "MONDO:0007256", "MONDO:0010811", "MONDO:0005083", "MONDO:0005249", "MONDO:0006763", "MONDO:0008315", "MONDO:0021068", "MONDO:0005485", "MONDO:0002280", "MONDO:0002471", "MONDO:0005002", "MONDO:0013662", "MONDO:0002050", "MONDO:0004967", "MONDO:0004981", "MONDO:0005101", "MONDO:0007186", "MONDO:0009410", "MONDO:0011849", "MONDO:0011996", "MONDO:0015925", "MONDO:0005244", "MONDO:0001881", "UMLS:C0348754", "MONDO:0005027", "MONDO:0004900", "MONDO:0002049", "MONDO:0003019", "MONDO:0021187", "MONDO:0005299", "MONDO:0005277", "MONDO:0009693", "MONDO:0005296", "MONDO:0001628", "MONDO:0005804", "MONDO:0015691", "MONDO:0011918", "MONDO:0000328", "UMLS:C0153886", "HP:0002017", "HP:0002910", "HP:0012378", "HP:0012418", "HP:0100512", "HP:0003119", "HP:0011897", "HP:0020062", "HP:0002923", "HP:0012416", "HP:0020059", "HP:0025406", "HP:0031506", "HP:0032238", "HP:0032312", "HP:0200119", "HP:0200123", "HP:0000969", "HP:0003155", "HP:0003326", "HP:0025181", "HP:0031808", "HP:0031956", "HP:0031964", "HP:0012417", "HP:0011423", "HP:0031890", "HP:0032263", "HP:0011227", "HP:0012213", "HP:0033359", "HP:0001260", "HP:0001649", "HP:0003073", "HP:0032198", "HP:0000958", "HP:0001407", "HP:0001824", "HP:0002904", "HP:0003645", "HP:0005537", "HP:0011016", "HP:0012735", "NCIT:C159516", "UMLS:C0428465" ], "categories": [ "biolink:Disease" ] }, "n1": { "categories": [ "biolink:Gene" ] } }, "edges": { "e01": { "subject": "n0", "object": "n1", "predicates": [ "biolink:prevented_by" ] } } } } } ```

Further comparison of response time for concurrent queries between these two shows that the response time while handling the first query goes down while the server is transforming API responses -- it's almost definitely something in api-response-transform causing this. I'm taking a look for anything that might be causing synchronous execution there now.

colleenXu commented 3 years ago

@newgene @andrewsu @tokebe I believe the queries here are not the type of query that we should expect from Translator, and we may not need to support them. I suggest a discussion over who is sending these queries and what they are trying to do.

They are likely using too many IDs in the query (and the predicate restriction, while reasonable, may not get answers or what you expect since we may not have data connected with those predicates).

Discussions in Translator have been happening around limits on this like the number of IDs that can be in a query under 1 node...

tokebe commented 3 years ago

I think the issue highlighted by these queries is still well worth attempting to track down and to fix. It appears that any query which could result in a larger quantities of response transformation can cause server response time to increase.

Another thing worth noting is that I never saw a query cause significant memory usage issues on my local -- I suspect that hard crashes are being caused by high CPU usage and processes becoming unresponsive, though memory usage is rather high for some queries, likely due to excessive object cloning. I haven't looked into the dev or prod instances, however, so I could be entirely off the mark there.

tokebe commented 3 years ago

@newgene I think this issue has been adequately addressed/split into more specific performance issues, recommending we close it.