NCATSTranslator / workflow-runner

500 Error on Disease-Tissue 1-hop #64

Closed GregHydeDartmouth closed 8 months ago

GregHydeDartmouth commented 8 months ago

I am attempting to run a simple 1-hop query (the first hop of workflow B for the Clinical Data Committee) with a boiled-down workflow. The JSON is as follows:

{
    "workflow": [
        {
            "id": "lookup"
        }
    ],
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories": [
                        "biolink:Disease"
                    ],
                    "ids": [
                        "EFO:0000519"
                    ]
                },
                "n1": {
                    "categories": [
                        "biolink:GrossAnatomicalStructure"
                    ]
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": [
                        "biolink:located_in"
                    ]
                }
            }
        }
    },
    "log_level": "DEBUG"
}

I send this to the WFR as follows:

r = requests.post('https://translator-workflow-runner.ci.transltr.io/query', json=message)

This results in a 500 error on all deployment instances (dev, staging, test, prod).

@maximusunc indicated this is likely because someone is handing back invalid TRAPI
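Since the suspected cause is invalid TRAPI somewhere in the chain, it may help to first rule out the request itself. The following is a minimal sketch, not a full TRAPI schema validation: `check_query_graph` is a hypothetical helper (not part of WFR or any Translator library) that only checks that the payload contains a query graph and that every edge points at nodes defined in it.

```python
def check_query_graph(payload):
    """Return a list of structural problems found in a TRAPI-style request.

    Hypothetical helper: it checks only basic shape, not the TRAPI schema.
    """
    problems = []
    qgraph = payload.get("message", {}).get("query_graph")
    if not isinstance(qgraph, dict):
        return ["message.query_graph is missing"]

    nodes = qgraph.get("nodes", {})
    edges = qgraph.get("edges", {})
    if not nodes:
        problems.append("query_graph has no nodes")

    for edge_id, edge in edges.items():
        # Each end of an edge must name a node that exists in the graph.
        for end in ("subject", "object"):
            node_id = edge.get(end)
            if node_id not in nodes:
                problems.append(
                    f"edge {edge_id}: {end} {node_id!r} is not a defined node"
                )
    return problems


# The one-hop query from this report.
message = {
    "workflow": [{"id": "lookup"}],
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {"categories": ["biolink:Disease"], "ids": ["EFO:0000519"]},
                "n1": {"categories": ["biolink:GrossAnatomicalStructure"]},
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:located_in"],
                }
            },
        }
    },
    "log_level": "DEBUG",
}

print(check_query_graph(message))  # prints [] (no structural problems found)
```

An empty list here only says the request is structurally sound under these basic checks; it says nothing about the responses coming back from downstream services.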

GregHydeDartmouth commented 8 months ago

Curious. WFR appears to run fine on the following query that is identical to my previous query except that I specify an allowlist with only aragorn:

{
    "workflow": [
        {
            "id": "lookup",
            "runner_parameters": {
                "timeout": 300,
                "allowlist": [
                    "infores:aragorn"
                ]
            }
        }
    ],
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories": [
                        "biolink:Disease"
                    ],
                    "ids": [
                        "EFO:0000519"
                    ]
                },
                "n1": {
                    "categories": [
                        "biolink:GrossAnatomicalStructure"
                    ]
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": [
                        "biolink:located_in"
                    ]
                }
            }
        }
    },
    "log_level": "DEBUG"
}

This results in a 200 from WFR on all instances, with knowledge graphs and results graphs as I would expect. However, I wanted to note that on the dev instance, with the following call:

r = requests.post('https://translator-workflow-runner.renci.org/query', json=message, timeout=1000)

I get the following warning back from WFR:

{
      "timestamp": "2023-10-20T19:08:17.553292",
      "level": "WARNING",
      "error": "Failed to get a good response from Aragorn(Trapi v1.4.0), see the logs"
}

where I get neither a knowledge graph nor a results graph. This behavior can be confirmed by querying aragorn directly with the query above (with the workflow removed), using the following call:

r = requests.post('https://aragorn.renci.org/aragorn/query', json=message, timeout=1000)

where I get a response back from aragorn with:

"logs": [
    {
      "timestamp": "2023-10-12T18:33:36.623338",
      "level": "INFO",
      "message": "pid: 6c52adeae130",
      "code": null
    },
    {
      "timestamp": "2023-10-12T18:33:36.623338",
      "level": "ERROR",
      "message": "strider HTML error status code 500 returned.",
      "code": null
    }
  ],

maximusunc commented 8 months ago

This has been fixed. Long story short, I think this was some sort of out-of-sync caching issue in Aragorn. I went in and fixed up the cache and it seems to be working now. I'm going to close this ticket and potentially open an issue in Aragorn to try and figure out how the cache got messed up.
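The triage used in this thread (restrict the lookup to a single ARA through `allowlist`, then read the `WARNING` and `ERROR` log entries) can be sketched as below. Both function names are hypothetical and not part of WFR. The sketch assumes that log entries sit in a top-level `logs` list of the response, and that their text is stored under either `message` or `error`, as in the excerpts quoted above.

```python
import copy


def with_allowlist(payload, infores_ids, timeout=300):
    """Return a copy of the request whose lookup steps query only the given ARAs."""
    restricted = copy.deepcopy(payload)  # leave the caller's payload untouched
    for step in restricted.get("workflow", []):
        if step.get("id") == "lookup":
            params = step.setdefault("runner_parameters", {})
            params["allowlist"] = list(infores_ids)
            params.setdefault("timeout", timeout)
    return restricted


def problem_logs(response_json, levels=("WARNING", "ERROR")):
    """Collect (level, text) pairs for log entries at the given levels."""
    found = []
    for entry in response_json.get("logs") or []:
        if entry.get("level") in levels:
            # Assumption: the text is under "message" or "error",
            # matching the two log excerpts in this thread.
            text = entry.get("message") or entry.get("error") or ""
            found.append((entry["level"], text))
    return found


# Restrict a bare lookup workflow to Aragorn only.
request = with_allowlist({"workflow": [{"id": "lookup"}]}, ["infores:aragorn"])
print(request["workflow"][0]["runner_parameters"])

# Log entries copied from the Aragorn response quoted in this thread.
response = {
    "logs": [
        {
            "timestamp": "2023-10-12T18:33:36.623338",
            "level": "INFO",
            "message": "pid: 6c52adeae130",
            "code": None,
        },
        {
            "timestamp": "2023-10-12T18:33:36.623338",
            "level": "ERROR",
            "message": "strider HTML error status code 500 returned.",
            "code": None,
        },
    ]
}
print(problem_logs(response))
```

The dictionary returned by `with_allowlist` can be passed as `json=` to `requests.post`, as in the calls earlier in the thread, and `problem_logs` can then be applied to `r.json()`.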