ExposuresProvider / icees-api

MIT License
2 stars 8 forks source link

"nan" returned in ICEES API output #235

Closed karafecho closed 2 years ago

karafecho commented 2 years ago

This issue is to report that while "NaN" no longer appears to be returned in ICEES API output, "nan" (in lower case) is still being returned.

karafecho commented 2 years ago

From Hong: I believe "nan" is returned as a string in JSON output then, which is indeed a valid JSON output. The problem with "Cannot parse JSON" is that we have NaN included as a null (i.e., not a string), which has been correctly converted to null to fix the "cannot parse JSON" error. I cannot find in the icees api code any "nan" string included in response output. I think we can either keep it as is or found the source that created this "nan" string in the response and fix the source.

kennethmorton commented 2 years ago

I believe that sometimes it is possible that some calculations lead to numeric nan values. These get converted to JSON improperly by the JSON encoder. Currently we use the encoder provided by fastapi that does not appear to be nan aware. I believe we should change to an alternative encoder that is nan aware, such as orjson.

Start here and this should be a drop in replacement.

karafecho commented 2 years ago

I am a bit confused. Hong corrected the NaN issue and redeployed. I tested the KG endpoints using prior queries that returned the JSON parsing issue, and the queries appeared to return valid JSON, although it is possible I only tested the NaN issue. In fact, I do recall finding nan in ICEES output for non-KG queries, but it did not cause any issues. As such, Hong and I considered the JSON parsing issue resolved.

Here's an example query that appears to be working but was returning a JSON parsing error before Hong's fix:

{
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "subject": "n00",
          "object": "n01",
          "predicates": [
            "biolink:has_real_world_evidence_of_association_with"
          ]
        }
      },
      "nodes": {
        "n00": {
          "ids": [
            "MONDO:0009061"
          ],
          "categories": [
            "biolink:DiseaseOrPhenotypicFeature"
          ]
        },
        "n01": {
          "categories": [
            "biolink:DiseaseOrPhenotypicFeature"
          ]
        }
      }
    },
    "knowledge_graph": {
      "nodes": {},
      "edges": {}
    },
    "results": []
  }
}

Perhaps you can test this?

karafecho commented 2 years ago

Here is a CURL that returns "nan" but without issue:

curl -X 'POST' \
  'https://icees-pcd-dev.apps.renci.org/patient/cohort/COHORT%3A3/associations_to_all_features' \
  -H 'accept: text/tabular' \
  -H 'Content-Type: application/json' \
  -d '{
  "feature": {
    "TotalEDInpatientVisits": {
      "operator": "=",
      "value": "0"
    }
  },
  "maximum_p_value": 1,
  "correction": {
    "method": "bonferroni",
    "value":0.05
  }
}'
hyi commented 2 years ago

@kennethmorton Thanks for the suggestion for replacing openapi json encoder with orjson to take care of NaN in the encoder. In fact, we already have a NaNResponse class here which tries to replace NaN with null in the json response in the codebase. The fix I put in to fix this "Cannot parse JSON" problem was to fix this NaNResponse replace statement since the search string had a comma in it which was not right. I think for now, we should be good, but if we run into further issue related to this, we may switch to use orjson. @karafecho

hyi commented 2 years ago

It turns out this nan issue happens when selecting text/tabular format type which is a different issue from the "Cannot parse JSON" issue I fixed earlier since the result is tabulated rather than JSON encoded. The PR I just created will fix this issue. Once @karafecho get a chance to test and verify, this issue can be closed.

karafecho commented 2 years ago

Fix was tested and confirmed. Issue can be closed.