ExposuresProvider / icees-api

MIT License
2 stars 8 forks source link

Operator / enum type mishandling at cohort creation / multivariate endpoints #304

Closed karafecho closed 6 months ago

karafecho commented 6 months ago

This issue is to report a potential operator and/or enumeration type mishandling at the ICEES+ cohort creation and/or multivariate endpoints.

Specifically, the table below was generated at the ICEES+ PCD multivariate endpoint using a cohort that correctly restricted the patients to those in year=2020 and TotalEDInpatientVisits<=9. You'll see that there are two blocks where TotalEDInpatientVisits=9. I believe that the second block is actually TotalEDInpatientVisits>9, as the numbers agree with those that I see in the underlying data. Recall that our enumeration for TotalEDInpatientVisits is 0, 1, ...9, '>9'. While the COHORT was created properly (N=4569), the multivariate call is distinguishing TotalEDInpatientVisits=9 vs TotalEDInpatientVisits>9 but not excluding TotalEDInpatientVisits>9 (total frequency > 4569).

table4.1B (1).pdf

karafecho commented 6 months ago

Additional Slack exchanges with Hong revealed that this issue is not an enum issue, but rather an operator issue. Moreover, there is inconsistency in the various deployments, with prod apparently lagging behind dev.

From Hong: I have actually implemented filtering out value sets defined in all feature yaml file by the cohort definition with equal operator, but it is more convoluted for consolidating > and < operators between feature yaml definition and cohort definition. For example, if cohort definition has TotalEDInpatientVisits<5, those rows with TotalEDInpatientVisits>=5 will be filtered out and not show up in the table output. However, >9 is not excluded for <10 cohort definition since I did not consolidate the two operators.

We agreed to add logic such that >9 will be changed to >=10 and, e.g., <9 changed to to <=8. This will support table output values to be constrained by a cohort definition operator of the same variable and filter out >9 rows for a cohort definition TotalEDInpatientVisits<10.

Hong deployed the fix to prod. Kara will test.

karafecho commented 6 months ago

Update:

  1. The multivariate endpoint on PROD is not working properly. Regardless of the cohort selected, the endpoint is returning terms and conditions only. Correction: I didn't change Race_UNC to Race when testing the query. So, user error.
  2. As such, I tested the fix at the "Features" endpoint, and I'm still seeing the bug. For instance, this cohort appears to have been created properly:
    {
      "cohort_id": "COHORT:11",
      "size": 767,
      "features": {
        "TotalEDInpatientVisits": {
          "operator": ">=",
          "value": "9"
        }
      }
    },

    But the Features endpoint is returning visit data for all enumerations and for what appears to be the complete dataset.

curl -X 'GET' \
  'https://icees-pcd.renci.org/patient/cohort/COHORT%3A11/features' \
  -H 'accept: text/tabular'
+----------------------------+---------+
| feature                    | count   |
+============================+=========+
| TotalEDInpatientVisits = 0 | 3676    |
|                            | 31.65%  |
+----------------------------+---------+
| TotalEDInpatientVisits = 1 | 811     |
|                            | 6.98%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 2 | 724     |
|                            | 6.23%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 3 | 560     |
|                            | 4.82%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 4 | 517     |
|                            | 4.45%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 5 | 404     |
|                            | 3.48%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 6 | 387     |
|                            | 3.33%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 7 | 360     |
|                            | 3.10%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 8 | 307     |
|                            | 2.64%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 9 | 551     |
|                            | 4.74%   |
+----------------------------+---------+
| TotalEDInpatientVisits > 9 | 3316    |
|                            | 28.55%  |
+----------------------------+---------+
hyi commented 6 months ago

@karafecho I just tested it again and can verify multivariate endpoint now works correctly with COHORT:6 and COHORT:11. See the screen capture to see what my request query look like. image. Can you test it again? We can touch base on slack if you still see issues with the multivariate endpoint after the fix.

I can reproduce the issue with features endpoint as you reported above, but that is a different issue from my fix to this multivariate endpoint since they don't share the same code path. I'll look into that issue later, but I think it'd work better if we could separate them into two different issues since the fix to them would be different. My fix here only applies to the multivariate endpoint since the new logic I added only involves the interplay between the value sets retrieved from all feature yaml file and the cohort definition.

karafecho commented 6 months ago

I can run the multivariate query on PROD. I had tested the query you noted and several others earlier today, but the multivariate endpoint didn't seem to be working properly. Correction: I tested the wrong endpoint. So, user error.

However, I'm still seeing issues. COHORT:11 is defined as TotalEDInpatientVisits>=9 in all patients (i.e., not those in year=2020). As such, the multivariate endpoint should be returning results for TotalEDInpatientVisits>9 only. But this is what I'm seeing when restricting the query to year=2020.

curl -X 'GET' \
  'https://icees-pcd.renci.org/patient/cohort/COHORT%3A11/features?year=2020' \
  -H 'accept: text/tabular'
+----------------------------+---------+
| feature                    | count   |
+============================+=========+
| TotalEDInpatientVisits = 0 | 125     |
|                            | 11.15%  |
+----------------------------+---------+
| TotalEDInpatientVisits = 1 | 64      |
|                            | 5.71%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 2 | 64      |
|                            | 5.71%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 3 | 59      |
|                            | 5.26%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 4 | 33      |
|                            | 2.94%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 5 | 38      |
|                            | 3.39%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 6 | 43      |
|                            | 3.84%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 7 | 35      |
|                            | 3.12%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 8 | 49      |
|                            | 4.37%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 9 | 74      |
|                            | 6.60%   |
+----------------------------+---------+
| TotalEDInpatientVisits > 9 | 537     |
|                            | 47.90%  |
+----------------------------+---------+

Am I missing something?

WRT the Features functionality, I'll create a new issue.

hyi commented 6 months ago

@karafecho The curl query you pasted above is for features endpoint, not for multivariate endpoint. Just tested multivariate endpoint again with year=2020 and COHORT:11 input, and here are the results I got which appears to work as expected.

+--------------------------+-----------+------------------------------------+-------------+
| TotalEDInpatientVisits   | Sex       | Race                               |   frequency |
+==========================+===========+====================================+=============+
| = 9                      | = Male    | = Native Hawaiian/Pacific Islander |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| = 9                      | = Male    | = Caucasian                        |           9 |
+--------------------------+-----------+------------------------------------+-------------+
| = 9                      | = Male    | = African American                 |           3 |
+--------------------------+-----------+------------------------------------+-------------+
| = 9                      | = Male    | = Asian                            |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| = 9                      | = Male    | = Unknown                          |           1 |
+--------------------------+-----------+------------------------------------+-------------+
| = 9                      | = Male    | = American/Alaskan Native          |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| = 9                      | = Female  | = Native Hawaiian/Pacific Islander |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| = 9                      | = Female  | = Caucasian                        |          20 |
+--------------------------+-----------+------------------------------------+-------------+
| = 9                      | = Female  | = African American                 |           2 |
+--------------------------+-----------+------------------------------------+-------------+
| = 9                      | = Female  | = Asian                            |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| = 9                      | = Female  | = Unknown                          |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| = 9                      | = Female  | = American/Alaskan Native          |           1 |
+--------------------------+-----------+------------------------------------+-------------+
| = 9                      | = Unknown | = Native Hawaiian/Pacific Islander |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| = 9                      | = Unknown | = Caucasian                        |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| = 9                      | = Unknown | = African American                 |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| = 9                      | = Unknown | = Asian                            |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| = 9                      | = Unknown | = Unknown                          |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| = 9                      | = Unknown | = American/Alaskan Native          |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| = 9                      | = Other   | = Native Hawaiian/Pacific Islander |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| = 9                      | = Other   | = Caucasian                        |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| = 9                      | = Other   | = African American                 |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| = 9                      | = Other   | = Asian                            |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| = 9                      | = Other   | = Unknown                          |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| = 9                      | = Other   | = American/Alaskan Native          |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| > 9                      | = Male    | = Native Hawaiian/Pacific Islander |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| > 9                      | = Male    | = Caucasian                        |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| > 9                      | = Male    | = African American                 |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| > 9                      | = Male    | = Asian                            |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| > 9                      | = Male    | = Unknown                          |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| > 9                      | = Male    | = American/Alaskan Native          |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| > 9                      | = Female  | = Native Hawaiian/Pacific Islander |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| > 9                      | = Female  | = Caucasian                        |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| > 9                      | = Female  | = African American                 |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| > 9                      | = Female  | = Asian                            |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| > 9                      | = Female  | = Unknown                          |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| > 9                      | = Female  | = American/Alaskan Native          |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| > 9                      | = Unknown | = Native Hawaiian/Pacific Islander |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| > 9                      | = Unknown | = Caucasian                        |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| > 9                      | = Unknown | = African American                 |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| > 9                      | = Unknown | = Asian                            |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| > 9                      | = Unknown | = Unknown                          |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| > 9                      | = Unknown | = American/Alaskan Native          |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| > 9                      | = Other   | = Native Hawaiian/Pacific Islander |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| > 9                      | = Other   | = Caucasian                        |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| > 9                      | = Other   | = African American                 |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| > 9                      | = Other   | = Asian                            |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| > 9                      | = Other   | = Unknown                          |           0 |
+--------------------------+-----------+------------------------------------+-------------+
| > 9                      | = Other   | = American/Alaskan Native          |           0 |
+--------------------------+-----------+------------------------------------+-------------+
karafecho commented 6 months ago

Oh, right, duh. Sorry, I've been in back-to-back meetings since 8:30 am this morning and am a bit brain dead.

I can reproduce the results you posted above, so let's consider this issue resolved.

Closing with comment.