ExposuresProvider / icees-api

MIT License

Issue with Race variable when used to create cohort and then applied at multivariate endpoint #308

Closed. karafecho closed this issue 4 months ago.

karafecho commented 5 months ago

This issue reports an apparent bug that Brenna found while developing multivariate queries. She retested after our last two bug fixes and redeployments, and the issue persists. She has also reproduced it with command-line curl requests. I'm not sure what's going on, but it appears to be related to the "in" operator, so the root cause may be the same as in issue #305.

Cohort 2 (restrict Race, COHORT:14 in curl requests)

Cohort definition: (screenshot not included)

Table generation: (screenshot not included)

Adding more variables, or issuing the requests with curl, yields the same result.

Cohort 4 (restrict Race and TotalEDInpatientVisits, COHORT:15 in curl requests)

Cohort definition: (two screenshots not included)

Table generation: (two screenshots not included)

karafecho commented 5 months ago

More strange results from Brenna:

05H_Summary_03.20.2024.pdf

I'm not sure what's going on, but I can't replicate Brenna's results, at least for the few queries I tested. For example:

curl -X 'POST' \
  'https://icees-pcd.renci.org/patient/cohort' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"year":{"operator":"=","value":"2010"}}'

Same output with or without quotation marks around "2010":

  "return value": {
    "cohort_id": "COHORT:3",
    "size": 1311
  }
}

Another example:

curl -X 'POST' \
  'https://icees-pcd.renci.org/patient/cohort' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"year":{"operator":"=","value":"2010"}, "TotalEDInpatientVisits":{"operator":"<=","value":9}}'

Same output with or without quotation marks around "9".

  "return value": {
    "cohort_id": "COHORT:17",
    "size": 1311
  }
}

karafecho commented 5 months ago

Please disregard my previous post; I used the wrong year. Even with the correct year (2020), I still can't reproduce the results.

curl -X 'POST' \
  'https://icees-pcd.renci.org/patient/cohort' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"year":{"operator":"=","value":2020}}'

  "return value": {
    "cohort_id": "COHORT:19",
    "size": 4753
  }
}

Brenna's query yielded N=4168 patients.

curl -X 'POST' \
  'https://icees-pcd.renci.org/patient/cohort' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"year":{"operator":"=","value":2020}, "TotalEDInpatientVisits":{"operator":"<=","value":9}}'

  "return value": {
    "cohort_id": "COHORT:20",
    "size": 4569
  }
}

Brenna's query yielded N=4341 patients.

karafecho commented 5 months ago

FYI: querying the year=2020 CSV file directly yields the same sample sizes for the two cohorts above as the API does. I suspect the problem is on Brenna's end, although I haven't tested extensively.
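For anyone repeating this cross-check, here is a minimal shell sketch of the CSV count. It assumes a comma-separated export with a header row naming the columns year and TotalEDInpatientVisits (as in the queries above), Unix line endings, and no quoted fields containing commas. The function name count_cohort is hypothetical and not part of icees-api.

```shell
# Hypothetical helper (not part of icees-api): count CSV rows where
# year == $2 and, if $3 is non-empty, TotalEDInpatientVisits <= $3.
# Assumes a header row with those column names and no quoted commas.
count_cohort() {
  csv_file=$1
  year=$2
  max_visits=$3
  awk -F',' -v y="$year" -v m="$max_visits" '
    NR == 1 {
      # Find the two columns by name in the header row.
      for (i = 1; i <= NF; i++) {
        if ($i == "year") ycol = i
        if ($i == "TotalEDInpatientVisits") vcol = i
      }
      next
    }
    # An empty m means no restriction on visits (year-only cohort).
    ycol && $ycol == y && (m == "" || (vcol && $vcol + 0 <= m + 0)) { n++ }
    END { print n + 0 }
  ' "$csv_file"
}
```

Usage: count_cohort patients_2020.csv 2020 "" for the year-only cohort, and count_cohort patients_2020.csv 2020 9 for the cohort that also restricts TotalEDInpatientVisits (patients_2020.csv is a placeholder file name).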

karafecho commented 5 months ago

Post-fix tests on dev:

All passed except this one:

curl -X 'POST' \
  'https://icees-pcd-dev.apps.renci.org/patient/cohort' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"Race":{"operator":"in","values":["African American","Caucasian","Asian"]}}'

"return value": "Input features invalid or cohort ≤10 patients. Please try again."
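Because the failing query uses the "in" operator, one way to check the cohort size it should produce is to count the rows in a CSV export whose Race value is any of the listed values. This is a sketch under stated assumptions: a comma-separated file with a header row containing a Race column, Unix line endings, and no quoted commas or tabs in the data. The helper count_in is hypothetical and only illustrates set-membership semantics for "in". It does not reproduce the icees-api implementation.

```shell
# Hypothetical helper (not part of icees-api): count CSV rows whose value in
# column $2 is any of the remaining arguments, i.e. set membership ("in").
# Assumes a header row, comma separators, and no quoted commas or tabs.
count_in() {
  csv_file=$1
  column=$2
  shift 2
  # Join the wanted values with tabs so values containing spaces survive.
  wanted=$(printf '%s\t' "$@")
  awk -F',' -v col="$column" -v vals="$wanted" '
    BEGIN {
      n = split(vals, list, "\t")
      # Skip the empty element left by the trailing tab.
      for (i = 1; i <= n; i++) if (list[i] != "") want[list[i]] = 1
    }
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      next
    }
    c && ($c in want) { count++ }
    END { print count + 0 }
  ' "$csv_file"
}
```

Usage: count_in patients.csv Race "African American" Caucasian Asian mirrors the failing query (patients.csv is a placeholder file name). If the column name is absent from the header, the function prints 0.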

Also, I ran this query with COHORT:7 (year=2010):

curl -X 'POST' \
  'https://icees-pcd-dev.apps.renci.org/cohort/COHORT%3A7/multivariate_feature_analysis' \
  -H 'accept: text/tabular' \
  -H 'Content-Type: application/json' \
  -d '[
  "TotalEDInpatientVisits",
  "Sex2",
  "Race_UNC"
]'

It ran without error but returned no results. Shouldn't it have returned an error, given that Race_UNC doesn't exist in the dataset?
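Until the endpoint rejects unknown features, a client-side sanity check can catch names like Race_UNC before a request is sent. This minimal sketch compares each requested feature with the header row of a local CSV export. It assumes that header uses the same feature names as the API, with comma separators and Unix line endings. The helper missing_features is hypothetical and is not icees-api behavior.

```shell
# Hypothetical client-side check (not part of icees-api): print each requested
# feature that does not appear as a column name in the CSV header row.
# Assumes comma-separated names and Unix (LF) line endings.
missing_features() {
  csv_file=$1
  shift
  header=$(head -n 1 "$csv_file")
  for feature in "$@"; do
    # Pad with commas so only whole column names match.
    case ",$header," in
      *",$feature,"*) ;;
      *) printf '%s\n' "$feature" ;;
    esac
  done
}
```

Usage: missing_features patients_2010.csv TotalEDInpatientVisits Sex2 Race_UNC prints Race_UNC if the header has no column with that exact name (patients_2010.csv is a placeholder file name).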

karafecho commented 5 months ago

All bugs have been fixed in the code supporting the PCD and DILI instances; the fixes were deployed to DEV, tested, deployed to PROD, and retested. Note that the asthma and COVID instances will need to be updated at some point in the future.

karafecho commented 4 months ago

Bug fixed and tested, so I'm closing this ticket.