NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

Translator Performance Issues: enthesitis – gene – gene – gene – scalp disorder (Slow) as latest example #53

Closed: karafecho closed this issue 1 year ago

karafecho commented 1 year ago

Various issues with Translator performance have been noted in multiple forums (QotM, Standup) and by individual users. The Architecture Committee has (informally) agreed to take on the task of dissecting the performance issues, triaging them, and (eventually) resolving them. However, I don't believe we have assigned a specific person or team to tackle the dissection. Hence, this issue.

Example 1: https://github.com/NCATSTranslator/Feedback/issues/41
Example 2: https://github.com/NCATSTranslator/testing/issues/230
Example 3: ARS – ARAX UI behavior related to asynchronous queries and responses that appear to be complete but aren't

sierra-moxon commented 1 year ago

https://arax.ncats.io/?r=1994cac1-f36e-47a0-b67a-5b672aca0dfd

sierra-moxon commented 1 year ago

Noting: ARAX does return results; the other agents return a mix of 500, 404, 501, and 504 errors.

andrewsu commented 1 year ago

The result set in @sierra-moxon's post corresponds to this query:

{
    "message": {
        "query_graph": {
            "edges": {
                "e0": {
                    "object": "ent",
                    "subject": "gene1"
                },
                "e1": {
                    "object": "gene2",
                    "subject": "gene1"
                },
                "e2": {
                    "object": "gene3",
                    "subject": "gene2"
                },
                "e3": {
                    "object": "sd",
                    "subject": "gene3"
                }
            },
            "nodes": {
                "ent": {
                    "ids": ["MONDO:0024419"]
                },
                "gene1": {
                    "categories": ["biolink:Gene"]
                },
                "gene2": {
                    "categories": ["biolink:Gene"]
                },
                "gene3": {
                    "categories": ["biolink:Gene"]
                },
                "sd": {
                    "ids": ["MONDO:0044999"]
                }
            }
        }
    }
}

It looks like MONDO:0044999 (presumably "scalp disorder") is no longer a valid MONDO ID (https://www.ebi.ac.uk/ols/search?q=MONDO%3A0044999&ontology=mondo); it appears to have been obsoleted in https://github.com/monarch-initiative/mondo/commit/1904a7ab04a4378a2b85ee50cf70546a7dab7aed. BTE finds no genes related to that MONDO ID, so it halts execution and returns zero results. If I replace MONDO:0044999 with MONDO:0006605 (scalp dermatosis), BTE returns a non-empty result set, so I think BTE is working as designed on a structural/technical level. (That result set is currently available at https://bte.transltr.io/v1/check_query_status/4dS89PiGRx. For some reason, BTE callbacks to the ARS are failing, so I can't post an ARAX link. I created a separate issue to investigate that...)
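For anyone wanting to rerun the query with the replacement ID, the pinned CURIE can be swapped programmatically before resubmitting. This is a minimal sketch, assuming the query is held as a plain Python dict in the same TRAPI shape as the JSON above. The mapping table and the helper name `remap_pinned_ids` are hypothetical. The only mapping entry is the substitution tried in this comment, not an official MONDO replacement, and the sketch does not check obsoletion status against MONDO itself.

```python
import copy
import json

# Hypothetical mapping table. The single entry is the substitution tried in this
# thread (obsolete "scalp disorder" -> scalp dermatosis), not an official MONDO mapping.
OBSOLETE_TO_CURRENT = {
    "MONDO:0044999": "MONDO:0006605",
}

# The query graph posted above, as a Python dict.
QUERY = {
    "message": {
        "query_graph": {
            "edges": {
                "e0": {"object": "ent", "subject": "gene1"},
                "e1": {"object": "gene2", "subject": "gene1"},
                "e2": {"object": "gene3", "subject": "gene2"},
                "e3": {"object": "sd", "subject": "gene3"},
            },
            "nodes": {
                "ent": {"ids": ["MONDO:0024419"]},
                "gene1": {"categories": ["biolink:Gene"]},
                "gene2": {"categories": ["biolink:Gene"]},
                "gene3": {"categories": ["biolink:Gene"]},
                "sd": {"ids": ["MONDO:0044999"]},
            },
        }
    }
}


def remap_pinned_ids(query: dict, mapping: dict) -> tuple[dict, list[str]]:
    """Return a remapped copy of a TRAPI-style query plus the keys of nodes that changed.

    The input query is left untouched; nodes without "ids" (unpinned) are skipped.
    """
    fixed = copy.deepcopy(query)
    changed = []
    for key, node in fixed["message"]["query_graph"]["nodes"].items():
        ids = node.get("ids")
        if not ids:
            continue
        new_ids = [mapping.get(curie, curie) for curie in ids]
        if new_ids != ids:
            node["ids"] = new_ids
            changed.append(key)
    return fixed, changed


if __name__ == "__main__":
    fixed, changed = remap_pinned_ids(QUERY, OBSOLETE_TO_CURRENT)
    print("remapped nodes:", changed)  # only "sd" is pinned to an obsolete ID
    print(json.dumps(fixed, indent=4))
```

Returning the list of changed node keys makes it easy to log which pinned nodes were rewritten, which matters here because a single dead ID is enough to make BTE stop with zero results.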

gglusman commented 1 year ago

This is a query I had tried for the PsO/PsA QoTM. The intent was: "can I find a gene that affects two other genes, one linked to enthesitis, the other to scalp disorder". In other words, phenotypeA <-- geneA <-- driverGene --> geneB --> phenotypeB.
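One way to express that intended shape as a query graph is sketched below. This is a sketch, not the query that was actually run. The node keys and helper names are hypothetical, the MONDO IDs are the ones used earlier in this thread, and predicates are left unconstrained, as in the original. For comparison, in the query posted above the only node with two outgoing edges is gene1, which is itself linked directly to enthesitis, so the three gene nodes form one chain (gene1 → gene2 → gene3) rather than two branches from a shared driver gene.

```python
# MONDO IDs used earlier in this thread; node keys below are hypothetical.
ENTHESITIS = "MONDO:0024419"
SCALP_DERMATOSIS = "MONDO:0006605"


def driver_gene_query(phenotype_a: str, phenotype_b: str) -> dict:
    """Build phenotypeA <-- geneA <-- driverGene --> geneB --> phenotypeB.

    Each edge points from subject to object; predicates are left unconstrained.
    """
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "phenotype_a": {"ids": [phenotype_a]},
                    "gene_a": {"categories": ["biolink:Gene"]},
                    "driver_gene": {"categories": ["biolink:Gene"]},
                    "gene_b": {"categories": ["biolink:Gene"]},
                    "phenotype_b": {"ids": [phenotype_b]},
                },
                "edges": {
                    "e0": {"subject": "driver_gene", "object": "gene_a"},
                    "e1": {"subject": "gene_a", "object": "phenotype_a"},
                    "e2": {"subject": "driver_gene", "object": "gene_b"},
                    "e3": {"subject": "gene_b", "object": "phenotype_b"},
                },
            }
        }
    }


def out_degree(query: dict) -> dict:
    """Count how many edges each node appears in as the subject."""
    counts = {}
    for edge in query["message"]["query_graph"]["edges"].values():
        counts[edge["subject"]] = counts.get(edge["subject"], 0) + 1
    return counts


if __name__ == "__main__":
    q = driver_gene_query(ENTHESITIS, SCALP_DERMATOSIS)
    # driver_gene fans out to both branches; gene_a and gene_b each have one outgoing edge.
    print(out_degree(q))  # {'driver_gene': 2, 'gene_a': 1, 'gene_b': 1}
```

Checking which node fans out is a quick way to confirm that a query graph matches the intended topology before submitting it.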

sierra-moxon commented 1 year ago

From TAQA: maybe we can find an example that is very slow to return its first result, to give us an actionable starting point for the F&F release of the UI.

karafecho commented 1 year ago

So, I'm not sure I can entirely replicate the issue I mentioned during today's TAQA call.

When I pose a question to the UI about drugs that may treat idiopathic bronchiectasis, the first eight results appear relatively quickly, but the "loading new results" button then spins for another six minutes before new results are available. Likewise, when I ask about drugs that may treat primary ciliary dyskinesia, the first 67 results appear pretty much immediately, but the "loading new results" button spins for another twelve minutes before new results are available.

However, the issue I originally recall was posing a question to the UI and not seeing any results at all for at least five minutes, at which point I gave up. I thought it arose with a query related to idiopathic bronchiectasis or primary ciliary dyskinesia, but although my tests above show delays in returning new answers, they don't fully replicate the original issue as I recall it.

Somewhat related: I don't think there's an elegant way to stop a query once it's been initiated in the UI? @Genomewide: is this something the UI team is thinking about?

karafecho commented 1 year ago

Also, the answers that are returned for primary ciliary dyskinesia are for ciliary motility disorders, even though I selected "primary ciliary dyskinesia" from the drop-down menu as the disease of interest.

(I realize I'm conflating issues here... sorry! I just want to capture them quickly, as I'm short on time right now!)

sierra-moxon commented 1 year ago

summary of status:

sierra-moxon commented 1 year ago

the status of performance issues is as follows:

And I think that is where we stand for September. @Genomewide @karafecho - taking these updates into account, should we close this issue? Else, do you have suggestions for how to proceed? :) thx!

karafecho commented 1 year ago

I just retested idiopathic bronchiectasis (IB) and primary ciliary dyskinesia (PCD). I am still finding parsing issues / mismatches for PCD (e.g., pomalidomide, cyclophosphamide and dexamethasone (PCD); paraneoplastic cerebellar degeneration (PCD)), and I'm finding a bunch of generic results for IB (e.g., antibiotics, steroids), but I think we can say that the performance issues have been addressed. For instance, the "Calculating" notice now appears, and initial results are returned in under 2 minutes.

The phenotypeA <-- geneA <-- driverGene --> geneB --> phenotypeB query is not directly relevant to the September release, and ARAX and BTE return results for the query, so I'm closing this issue.