ExposuresProvider / icees-api

'Cannot parse JSON' error #225

Closed karafecho closed 2 years ago

karafecho commented 2 years ago

This issue reports that a 'cannot parse JSON' error is returned for the Features functionality in ICEES DILI. I'm not entirely convinced that it is a Swagger UI issue, because it affects only the ICEES DILI dev instance (not the ICEES Asthma dev or ICEES PCD dev instances). See #223.

karafecho commented 2 years ago

Update: The JSON parse error occurs when the same query is run on the ICEES DILI dev instance on ebcr0 (https://icees.renci.org:16341/apidocs#) and the ICEES DILI prod instance on Sterling (https://icees-dili.renci.org/apidocs).

hyi commented 2 years ago

I noticed this when using DILI data to debug ICEES locally. I spent some time stepping through the code and added JSON parser code to parse the generated JSON output, which succeeded without any problem, yet Swagger UI complains that it cannot parse the same output. I also noticed that Swagger UI can parse the JSON output of some queries but not of others. Because the same JSON output can be parsed with a JSON library, I concluded that it is a Swagger UI issue.

kennethmorton commented 2 years ago

I independently stumbled into this. I don't think it's a Swagger issue. We seem to be returning invalid JSON. This happens when some calculations return NaN and these are not properly serialized to null. Here is a query that will return invalid JSON.

{
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "subject": "n00",
          "object": "n01",
          "predicates": [
            "biolink:correlated_with"
          ]
        }
      },
      "nodes": {
        "n00": {
          "ids": [
            "PUBCHEM.COMPOUND:132508376"
          ]
        },
        "n01": {
          "categories": [
            "biolink:NamedThing"
          ]
        }
      }
    },
    "knowledge_graph": {
      "nodes": {},
      "edges": {}
    },
    "results": []
  }
}

The interesting (bad) NaN part is returned for a particular edge. In the excerpt below I have quoted the NaNs so that they parse and render nicely on GitHub, but in the returned JSON they are not quoted.

{
  "MESH:C070076_UNII:ME6U10SD7D_3f5adc9e": {
    "predicate": "biolink:correlated_with",
    "subject": "MESH:C070076",
    "object": "UNII:ME6U10SD7D",
    "attributes": [
      {
        "attribute_type_id": "contigency:matrices",
        "value": [
          {
            "feature_a": {
              "feature_name": "TLR4_AGE",
              "feature_qualifiers": [
                { "operator": "=", "value": "0-2" },
                { "operator": "=", "value": "3-17" },
                { "operator": "=", "value": "18-34" },
                { "operator": "=", "value": "35-50" },
                { "operator": "=", "value": "51-69" },
                { "operator": "=", "value": "70-89" }
              ],
              "year": null,
              "biolink_class": "biolink:PhenotypicFeature"
            },
            "feature_b": {
              "feature_name": "Avg24hAcetaldehydeExposure_2",
              "feature_qualifiers": [
                { "operator": "=", "value": 1 },
                { "operator": "=", "value": 2 },
                { "operator": "=", "value": 3 },
                { "operator": "=", "value": 4 },
                { "operator": "=", "value": 5 }
              ],
              "year": null,
              "biolink_class": "biolink:SmallMolecule"
            },
            "feature_matrix": [],
            "rows": [
              { "frequency": 0, "percentage": "NaN" },
              { "frequency": 0, "percentage": "NaN" },
              { "frequency": 0, "percentage": "NaN" },
              { "frequency": 0, "percentage": "NaN" },
              { "frequency": 0, "percentage": "NaN" }
            ],
            "columns": [
              { "frequency": 0, "percentage": "NaN" },
              { "frequency": 0, "percentage": "NaN" },
              { "frequency": 0, "percentage": "NaN" },
              { "frequency": 0, "percentage": "NaN" },
              { "frequency": 0, "percentage": "NaN" },
              { "frequency": 0, "percentage": "NaN" }
            ],
            "total": 0,
            "p_value": 1.0,
            "chi_squared": 0.0
          }
        ]
      }
    ]
  }
}

hyi commented 2 years ago

@kennethmorton Thanks for all the specific details. I must have looked at the wrong query result when I debugged this, since I printed out the query results, put them into a JSON parser, and it parsed them without issues. I will look into this further and try to resolve it.
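
A possible reason a local parse can succeed while a browser-based client fails: Python's standard `json` module both writes and reads the bare token `NaN` by default, even though that token is not valid JSON, whereas JavaScript's `JSON.parse` rejects it. The sketch below assumes the local check used Python's `json` module or a similarly lenient parser (the thread does not say which); `strict_loads` and `reject_constant` are hypothetical helper names for illustration.

```python
import json
import math

# A row shaped like the ones in the response above, with a real NaN.
payload = {"frequency": 0, "percentage": float("nan")}

# By default json.dumps emits the bare token NaN, which is not valid JSON.
text = json.dumps(payload)
print(text)  # {"frequency": 0, "percentage": NaN}

# json.loads also accepts that token by default, so a Python round trip succeeds.
roundtrip = json.loads(text)
assert math.isnan(roundtrip["percentage"])


def reject_constant(name):
    """Called by json.loads for NaN, Infinity and -Infinity."""
    raise ValueError(f"non-standard JSON constant: {name}")


def strict_loads(text):
    """Parse JSON the way a strict client would, refusing NaN and Infinity."""
    return json.loads(text, parse_constant=reject_constant)


try:
    strict_loads(text)
except ValueError as exc:
    print("strict parse failed:", exc)
```

Passing `parse_constant` is a cheap way to make a local check behave more like a strict client when reproducing this kind of error.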

karafecho commented 2 years ago

FWIW, I think Kenny's explanation might account for why the error surfaced only when testing ICEES DILI, not ICEES Asthma or ICEES PCD.

kennethmorton commented 2 years ago

I failed to mention above, but that query is for the Asthma instance.

karafecho commented 2 years ago

I noticed, but I still think your explanation may account for the error I received with the queries I was testing.

karafecho commented 2 years ago

Per Kenny: We may want to try a more rigorous JSON serializer, perhaps orjson.
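
To illustrate what a more rigorous serializer buys, here is a minimal sketch that uses only Python's standard `json` module with `allow_nan=False`, not orjson itself. It assumes the service serializes responses in Python; `dumps_strict` is a hypothetical helper name. With this flag, a NaN in the payload raises an error at serialization time instead of silently producing invalid JSON.

```python
import json


def dumps_strict(obj):
    """Serialize obj to JSON, raising ValueError on NaN or infinity."""
    return json.dumps(obj, allow_nan=False)


row = {"frequency": 0, "percentage": float("nan")}

try:
    dumps_strict(row)
except ValueError as exc:
    # The failure now surfaces on the server, not in the client.
    print("refused to serialize:", exc)

# A payload that already uses None serializes to valid JSON with null.
print(dumps_strict({"frequency": 0, "percentage": None}))
```

The orjson documentation states that it serializes NaN and infinity as `null`, so switching to it would change the output rather than raise; which behavior is preferable depends on whether a silent `null` is acceptable to consumers.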

karafecho commented 2 years ago

Hong is implementing a fix to change the formatting of the ICEES tables such that queries will return valid JSON even for NaN/null values.
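
The thread does not show the actual fix. As one way to get valid JSON for NaN values, the sketch below walks a response structure and replaces non-finite floats with `None` before serializing. It assumes plain Python dicts, lists and floats; `nan_to_none` is a hypothetical helper and is not taken from the ICEES code.

```python
import json
import math


def nan_to_none(value):
    """Return a copy of value with NaN and infinite floats replaced by None."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {key: nan_to_none(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [nan_to_none(item) for item in value]
    return value


# A fragment shaped like the contingency table in the response above.
table = {
    "rows": [{"frequency": 0, "percentage": float("nan")}],
    "total": 0,
    "p_value": 1.0,
}

# allow_nan=False guards against any NaN the sanitizer might have missed.
print(json.dumps(nan_to_none(table), allow_nan=False))
```

Data held in NumPy or pandas objects would need to be converted to built-in Python types first, since this sketch only inspects built-in containers and floats.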

karafecho commented 2 years ago

Closing issue after extensively testing non-KG and KG endpoints for ICEES Asthma, DILI, and PCD instances and identifying no issues after Hong's fix and redeployment.