biothings / biothings_explorer

TRAPI service for BioThings Explorer
https://explorer.biothings.io
Apache License 2.0

investigate differences in results between sync and async queries #313

Closed andrewsu closed 3 years ago

andrewsu commented 3 years ago

The workflow progress tracker ran Query B.2a (https://github.com/NCATSTranslator/minihackathons/blob/main/2021-12_demo/workflowB/B.2a_DILI-fourth-one-hop-from-CHEBI_41879_Dexamethasone.json) via both a synchronous query (https://arax.ncats.io/?r=ae038a8a-e0dc-44a5-962d-efbdead8eae0, 2817 results) and an async query (https://arax.ncats.io/?r=1ef52ad6-a081-4b98-bb39-147c65e92982, 1609 results), within a few hours of each other. The number of results returned differs between these two calls. I suspect it has to do with a timeout/availability issue, but as we shift toward pushing the use of async over sync, I would just like to confirm that there isn't some difference in logic that slipped through...

tokebe commented 3 years ago

Further testing: on my local instance I consistently receive 2083 results from both endpoints, whereas on prod I currently receive 1113 on /query and 790 on /asyncquery, with the tests for the two endpoints run within minutes of each other.

andrewsu commented 3 years ago

@tokebe Can you see in the logs any obvious differences in which APIs are being called, or the number of results being returned?

tokebe commented 3 years ago

What immediately jumps out is that two queries to https://mychem.info/v1/query are failing on the async endpoint with the error 'getaddrinfo ENOTFOUND mychem.info', which is a very odd error to say the least. Going by the number of hits on the sync endpoint, these account for all of the results missing from async.

andrewsu commented 3 years ago

Just a note that this query https://github.com/NCATSTranslator/testing/blob/main/ars-requests/not-none/1.2/D.1-hgd-alkaptonuria.json resulted in the same error from https://api.bte.ncats.io/v1/asyncquery/ (https://api.bte.ncats.io/v1/check_query_status/kVmlkzvxkc):

{
  "id": "kVmlkzvxkc",
  "state": "completed",
  "returnvalue": {
    "response": {
      "error": "Error",
      "message": "getaddrinfo ENOTFOUND mychem.info"
    },
    "callback": ""
  },
  "progress": 0
}

Results returned fine from the sync endpoint https://api.bte.ncats.io/v1/query/
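The status payload above reports "state": "completed" even though the job failed, so a client that checks only the state would miss the error. A minimal sketch of a check for this case follows. It assumes errors are always nested under returnvalue.response as in the payload above; the function name extract_job_error is hypothetical and not part of BTE.

```python
import json


def extract_job_error(status: dict):
    """Return the error message nested in a job status payload, or None.

    Assumption: failures appear as an "error" key under
    returnvalue.response, as in the payload quoted in this thread.
    """
    response = (status.get("returnvalue") or {}).get("response") or {}
    if isinstance(response, dict) and "error" in response:
        # Prefer the detailed message, fall back to the error label.
        return response.get("message") or response["error"]
    return None


# The payload quoted above, copied verbatim.
payload = json.loads("""
{
  "id": "kVmlkzvxkc",
  "state": "completed",
  "returnvalue": {
    "response": {
      "error": "Error",
      "message": "getaddrinfo ENOTFOUND mychem.info"
    },
    "callback": ""
  },
  "progress": 0
}
""")

print(extract_job_error(payload))
```

For the payload above this yields the ENOTFOUND message rather than treating the job as successful.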

newgene commented 3 years ago

Probably a DNS issue for the instance we route asyncquery over.

zcqian commented 3 years ago

The DNS issue should have been fixed (one way or another); the following domains are resolvable and work within the VPC:

Let me know if more are needed and I will fix that.

newgene commented 3 years ago

Verified the instance running bte async queries can resolve these biothings API hostnames now.
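A check like this can be scripted from the instance in question. The sketch below is an illustration only, not the procedure used here: unresolved_hosts is a hypothetical helper, and the resolver parameter exists solely so the logic can be exercised without network access.

```python
import socket


def unresolved_hosts(hostnames, resolver=socket.getaddrinfo):
    """Return the hostnames that fail DNS resolution.

    `resolver` defaults to socket.getaddrinfo; it is injectable only
    so the function can be tested offline (an assumption of this sketch).
    """
    failed = []
    for host in hostnames:
        try:
            # Port 443 because the APIs are served over HTTPS.
            resolver(host, 443)
        except socket.gaierror:
            # gaierror is the Python counterpart of Node's ENOTFOUND.
            failed.append(host)
    return failed


# Example call (performs a real DNS lookup when run):
# unresolved_hosts(["mychem.info"])
```

An empty list means every hostname resolved. Only mychem.info is named in this thread; any other hostnames passed in would be the caller's choice.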

@andrewsu if you can verify that the asyncquery results are the same as those from the sync query with your test queries, we can close this issue now.

tokebe commented 3 years ago

@andrewsu @newgene Confirmed: sync and async are both returning 2414 results, with no subqueries failing due to either ENOTFOUND or timeouts.
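A count comparison of this kind can be sketched as follows, assuming both responses are already-parsed TRAPI responses with results under message.results. The helper names are hypothetical, and an async response is assumed to have been unwrapped from its job status payload beforehand.

```python
def count_results(trapi_response: dict) -> int:
    """Count the entries in message.results of a TRAPI response."""
    message = trapi_response.get("message") or {}
    return len(message.get("results") or [])


def compare_counts(sync_response: dict, async_response: dict) -> dict:
    """Summarize whether the two endpoints returned the same number of results."""
    sync_n = count_results(sync_response)
    async_n = count_results(async_response)
    return {"sync": sync_n, "async": async_n, "match": sync_n == async_n}


# Toy responses for illustration only; real ones come from /query and /asyncquery.
sync_demo = {"message": {"results": [{}, {}, {}]}}
async_demo = {"message": {"results": [{}, {}]}}
print(compare_counts(sync_demo, async_demo))
```

Equal counts do not prove the result sets are identical, so this is only a first-pass check.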

andrewsu commented 3 years ago

I tested https://github.com/NCATSTranslator/minihackathons/blob/main/2021-12_demo/workflowB/B.2a_DILI-fourth-one-hop-from-CHEBI_41879_Dexamethasone.json on both the sync and async endpoints (on the prod instance), and I also get the same number of results (1441) from each. (I'll note that in my testing I did get one response on the sync endpoint with a smaller number of results, but that appears to be due to a timeout on one of the APIs. I've since rerun it on sync and gotten the expected number.) So closing this issue!