ExposuresProvider / icees-api


TotalEDInpatientVisits, TotalEDVisits, TotalInpatientVisits - incorrect treatment of >9 at several endpoints #297

Closed karafecho closed 8 months ago

karafecho commented 10 months ago

This issue reports that the variables TotalEDInpatientVisits, TotalEDVisits, and TotalInpatientVisits do not treat >9 correctly at the Discover Cohort, Features, and Association to All Features endpoints.

Discover Cohort - note that one can correctly create a cohort in which TotalEDInpatientVisits > 9, but for logical consistency, we should not allow users to create cohorts defined as, e.g., TotalEDInpatientVisits > 10.

curl -X 'POST' \
  'https://icees-pcd.renci.org/patient/cohort' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"TotalEDInpatientVisits":{"operator":">","value":"10"}}'
  "return value": {
    "cohort_id": "COHORT:3",
    "size": 527
  }

Features

curl -X 'GET' \
  'https://icees-pcd.renci.org/patient/cohort/COHORT%3A1/features?year=2020' \
  -H 'accept: text/tabular'
+-----------------------------+---------+
| feature                     | count   |
+=============================+=========+
| TotalEDInpatientVisits = 0  | 222447  |
|                             | 66.27%  |
+-----------------------------+---------+
| TotalEDInpatientVisits = 1  | 40211   |
|                             | 11.98%  |
+-----------------------------+---------+
| TotalEDInpatientVisits = 2  | 21748   |
|                             | 6.48%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 3  | 13456   |
|                             | 4.01%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 4  | 9319    |
|                             | 2.78%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 5  | 6335    |
|                             | 1.89%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 6  | 4593    |
|                             | 1.37%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 7  | 3441    |
|                             | 1.03%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 8  | 2523    |
|                             | 0.75%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 9  | 2067    |
|                             | 0.62%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = >9 | 0       |
|                             | 0.00%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 10 | 1793    |
|                             | 0.53%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 11 | 1309    |
|                             | 0.39%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 12 | 1144    |
|                             | 0.34%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 13 | 918     |
|                             | 0.27%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 14 | 590     |
|                             | 0.18%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 15 | 762     |
|                             | 0.23%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 16 | 482     |
|                             | 0.14%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 17 | 439     |
|                             | 0.13%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 18 | 341     |
|                             | 0.10%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 19 | 209     |
|                             | 0.06%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 20 | 230     |
|                             | 0.07%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 21 | 183     |
|                             | 0.05%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 22 | 120     |
|                             | 0.04%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 23 | 198     |
|                             | 0.06%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 24 | 153     |
|                             | 0.05%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 25 | 54      |
|                             | 0.02%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 26 | 90      |
|                             | 0.03%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 27 | 76      |
|                             | 0.02%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 28 | 38      |
|                             | 0.01%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 29 | 28      |
|                             | 0.01%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 30 | 63      |
|                             | 0.02%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 31 | 47      |
|                             | 0.01%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 32 | 34      |
|                             | 0.01%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 33 | 39      |
|                             | 0.01%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 34 | 25      |
|                             | 0.01%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 35 | 31      |
|                             | 0.01%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 37 | 33      |
|                             | 0.01%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 38 | 10      |
|                             | 0.00%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 39 | 6       |
|                             | 0.00%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 41 | 22      |
|                             | 0.01%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 42 | 16      |
|                             | 0.00%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 44 | 12      |
|                             | 0.00%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 45 | 12      |
|                             | 0.00%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 48 | 8       |
|                             | 0.00%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 51 | 3       |
|                             | 0.00%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 55 | 6       |
|                             | 0.00%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 63 | 1       |
|                             | 0.00%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 66 | 8       |
|                             | 0.00%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 70 | 8       |
|                             | 0.00%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 83 | 4       |
|                             | 0.00%   |
+-----------------------------+---------+
| TotalEDInpatientVisits = 94 | 1       |
|                             | 0.00%   |
+-----------------------------+---------+
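For reference, the expected behavior is that every level above 9 is folded into a single >9 bin. The following is a minimal Python sketch of that aggregation, not the ICEES+ implementation; the function name `collapse_visit_counts` and the `cap` parameter are hypothetical, and the sample counts are copied from a few rows of the table above.

```python
from collections import Counter


def collapse_visit_counts(counts, cap=9):
    """Fold every integer level above `cap` into a single '>cap' bin.

    `counts` maps an integer visit level to the number of patients at
    that level. Hypothetical helper, for illustration only.
    """
    binned = Counter()
    for level, n in counts.items():
        label = str(level) if level <= cap else f">{cap}"
        binned[label] += n
    # Always report the overflow bin, even when no patient falls in it.
    binned.setdefault(f">{cap}", 0)
    return dict(binned)


# Levels 8 through 12, copied from the Features output above.
sample = {8: 2523, 9: 2067, 10: 1793, 11: 1309, 12: 1144}
print(collapse_visit_counts(sample))
```

With this approach, the rows for 10, 11, and 12 would no longer appear individually; their counts would be summed under >9.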

Association to All Features

curl -X 'POST' \
  'https://icees-pcd.renci.org/patient/cohort/COHORT%3A1/associations_to_all_features' \
  -H 'accept: text/tabular' \
  -H 'Content-Type: application/json' \
  -d '{
  "feature": {
    "TotalEDInpatientVisits": {
      "operator": "=",
      "value": "0"
    }
  },
  "maximum_p_value": 1,
  "correction": {
    "method": "bonferroni"
  }
}'
+-----------------------------+------------------------------+-------------------------------+---------+
| feature                     | TotalEDInpatientVisits = 0   | TotalEDInpatientVisits <> 0   |         |
+=============================+==============================+===============================+=========+
| TotalEDInpatientVisits = 0  | 30370    100.00%             | 0      0.00%                  | 30370   |
|                             | 100.00%  66.79%              | 0.00%  0.00%                  | 66.79%  |
+-----------------------------+------------------------------+-------------------------------+---------+
| TotalEDInpatientVisits = 1  | 0      0.00%                 | 5619    100.00%               | 5619    |
|                             | 0.00%  0.00%                 | 37.20%  12.36%                | 12.36%  |
+-----------------------------+------------------------------+-------------------------------+---------+
| TotalEDInpatientVisits = 2  | 0      0.00%                 | 2943    100.00%               | 2943    |
|                             | 0.00%  0.00%                 | 19.48%  6.47%                 | 6.47%   |
+-----------------------------+------------------------------+-------------------------------+---------+
| TotalEDInpatientVisits = 3  | 0      0.00%                 | 1791    100.00%               | 1791    |
|                             | 0.00%  0.00%                 | 11.86%  3.94%                 | 3.94%   |
+-----------------------------+------------------------------+-------------------------------+---------+
| TotalEDInpatientVisits = 4  | 0      0.00%                 | 1217   100.00%                | 1217    |
|                             | 0.00%  0.00%                 | 8.06%  2.68%                  | 2.68%   |
+-----------------------------+------------------------------+-------------------------------+---------+
| TotalEDInpatientVisits = 5  | 0      0.00%                 | 815    100.00%                | 815     |
|                             | 0.00%  0.00%                 | 5.40%  1.79%                  | 1.79%   |
+-----------------------------+------------------------------+-------------------------------+---------+
| TotalEDInpatientVisits = 6  | 0      0.00%                 | 591    100.00%                | 591     |
|                             | 0.00%  0.00%                 | 3.91%  1.30%                  | 1.30%   |
+-----------------------------+------------------------------+-------------------------------+---------+
| TotalEDInpatientVisits = 7  | 0      0.00%                 | 439    100.00%                | 439     |
|                             | 0.00%  0.00%                 | 2.91%  0.97%                  | 0.97%   |
+-----------------------------+------------------------------+-------------------------------+---------+
| TotalEDInpatientVisits = 8  | 0      0.00%                 | 314    100.00%                | 314     |
|                             | 0.00%  0.00%                 | 2.08%  0.69%                  | 0.69%   |
+-----------------------------+------------------------------+-------------------------------+---------+
| TotalEDInpatientVisits = 9  | 0      0.00%                 | 261    100.00%                | 261     |
|                             | 0.00%  0.00%                 | 1.73%  0.57%                  | 0.57%   |
+-----------------------------+------------------------------+-------------------------------+---------+
| TotalEDInpatientVisits = >9 | 0      null                  | 0      null                   | 0       |
|                             | 0.00%  0.00%                 | 0.00%  0.00%                  | 0.00%   |
+-----------------------------+------------------------------+-------------------------------+---------+
|                             | 30370                        | 15104                         | 45474   |
|                             | 66.79%                       | 33.21%                        | 100.00% |
+-----------------------------+------------------------------+-------------------------------+---------+

karafecho commented 10 months ago

Per discussion with Hong:

Leave Discover Cohort endpoint as is; in other words, allow users to define cohorts per their choice.

Fix >9 issues with Features and Association to All Features endpoints; issue involves db and YAML file.

karafecho commented 9 months ago

Update:

This turns out to be a very complicated issue. In brief, AgeStudyStart and AgeStudyStart2 are binned in FHIR PIT as part of the preprocAge() routine. For AgeStudyStart2, for example, the bin labels are enumerated as '<5', '5-17', '18-44', '45-64', '65-89'; these are strings, stored in the ICEES+ database as ingested from the patient data that FHIR PIT generates.

Currently, if the user input does not specify an operator, all feature variables are compared with the equality operator at the feature association and multilevel analysis endpoints. The values, or levels, to compare against for each feature variable are specified in the all_features YAML files, either by enumeration or by minimum/maximum specifications. As a result, AgeStudyStart2, which includes a '<5' bin, is stored in the database as a string and compared with the '<5' level string in the all_features YAML file using the == operator. TotalEDInpatientVisits, TotalEDVisits, and TotalInpatientVisits, on the other hand, are enumerated as integers and compared with == in the same way as the other feature variables.

We tried adding >9 to the all_features YAML file for the visit variables, and also tried parsing it so that the > operator aggregates the visit integers into a >9 bin for the feature association and multivariate endpoints. However, that conflicts with the string comparison against '<5' for the age variables, which must still use the equality operator. This complexity made us examine the current design and implementation of binning across FHIR PIT, the all_features YAML specification, and the ICEES+ application. If we want to extend the current design and implementation to allow the > and < operators in addition to ==, how can we specify this in the all_features YAML file so that it indicates which operator applies to each comparison?

Currently, in the all_features YAML files, if we change the enum to a min/max range with min = 0 and max = 100, then levels 0 through 100 are all returned, regardless of whether the values are null. Is this the desired behavior? How can we make the specification more consistent across feature variables to address the different use cases?
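One possible direction, sketched here in Python purely as an illustration, is to store an explicit operator next to each level so that a string bin such as '<5' is still matched by equality while an integer bin such as >9 is matched by comparison. The `LEVELS` structure, the `op`/`value` keys, and the `count_levels` function are assumptions made for this sketch; they are not the existing all_features schema or ICEES+ code.

```python
import operator

# Operators a level specification is allowed to name.
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}

# Hypothetical per-feature level specs (NOT the real all_features schema).
# Age bins are strings matched by equality; visit counts are integers,
# with a final level that uses ">" to collect everything above 9.
LEVELS = {
    "AgeStudyStart2": [
        {"op": "=", "value": v}
        for v in ["<5", "5-17", "18-44", "45-64", "65-89"]
    ],
    "TotalEDInpatientVisits": (
        [{"op": "=", "value": i} for i in range(10)]
        + [{"op": ">", "value": 9}]
    ),
}


def level_label(feature, level):
    """Build a display label such as 'TotalEDInpatientVisits > 9'."""
    return f"{feature} {level['op']} {level['value']}"


def count_levels(feature, rows):
    """Count rows per level, applying each level's own operator."""
    levels = LEVELS[feature]
    counts = {level_label(feature, lv): 0 for lv in levels}
    for row in rows:
        value = row.get(feature)
        if value is None:  # skip null values
            continue
        for lv in levels:
            if OPS[lv["op"]](value, lv["value"]):
                counts[level_label(feature, lv)] += 1
                break  # a row belongs to at most one level
    return counts


rows = [
    {"AgeStudyStart2": "<5", "TotalEDInpatientVisits": 12},
    {"AgeStudyStart2": "18-44", "TotalEDInpatientVisits": 0},
    {"AgeStudyStart2": None, "TotalEDInpatientVisits": 94},
]
print(count_levels("TotalEDInpatientVisits", rows))
print(count_levels("AgeStudyStart2", rows))
```

Because the operator is stored with the level rather than parsed out of the label, the '<5' age label is never mistaken for a numeric comparison. The sketch assumes that all values of a feature share one type; mixing strings and integers within a feature would raise a TypeError under > or <.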

karafecho commented 9 months ago

Adding on to this ticket: at the Features endpoint (but not the other endpoints), the output for the visit variables in the pre-2014 tables is binned as 0 ... 9 rather than 0 ... 9, >9. A quick review suggests that the issue is specific to the visit variables. Any ideas why?

2014

+----------------------------+---------+
| feature                    | count   |
+============================+=========+
| TotalEDInpatientVisits = 0 | 17679   |
|                            | 64.24%  |
+----------------------------+---------+
| TotalEDInpatientVisits = 1 | 4355    |
|                            | 15.82%  |
+----------------------------+---------+
| TotalEDInpatientVisits = 2 | 2030    |
|                            | 7.38%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 3 | 1084    |
|                            | 3.94%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 4 | 651     |
|                            | 2.37%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 5 | 540     |
|                            | 1.96%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 6 | 250     |
|                            | 0.91%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 7 | 223     |
|                            | 0.81%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 8 | 146     |
|                            | 0.53%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 9 | 96      |
|                            | 0.35%   |
+----------------------------+---------+
| TotalEDInpatientVisits > 9 | 467     |
|                            | 1.70%   |
+----------------------------+---------+

2013

+----------------------------+---------+
| feature                    | count   |
+============================+=========+
| TotalEDInpatientVisits = 0 | 16488   |
|                            | 100.00% |
+----------------------------+---------+
| TotalEDInpatientVisits = 1 | 0       |
|                            | 0.00%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 2 | 0       |
|                            | 0.00%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 3 | 0       |
|                            | 0.00%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 4 | 0       |
|                            | 0.00%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 5 | 0       |
|                            | 0.00%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 6 | 0       |
|                            | 0.00%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 7 | 0       |
|                            | 0.00%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 8 | 0       |
|                            | 0.00%   |
+----------------------------+---------+
| TotalEDInpatientVisits = 9 | 0       |
|                            | 0.00%   |
+----------------------------+---------+

2014

+----------------------------+-----------------+------------------+---------+
| feature                    | Sex2 = Female   | Sex2 <> Female   |         |
+============================+=================+==================+=========+
| TotalEDInpatientVisits = 0 | 1255    57.46%  | 929     42.54%   | 2184    |
|                            | 64.79%  37.89%  | 67.56%  28.05%   | 65.94%  |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 1 | 310     60.19%  | 205     39.81%   | 515     |
|                            | 16.00%  9.36%   | 14.91%  6.19%    | 15.55%  |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 2 | 149    64.50%   | 82     35.50%    | 231     |
|                            | 7.69%  4.50%    | 5.96%  2.48%     | 6.97%   |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 3 | 69     55.65%   | 55     44.35%    | 124     |
|                            | 3.56%  2.08%    | 4.00%  1.66%     | 3.74%   |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 4 | 42     56.00%   | 33     44.00%    | 75      |
|                            | 2.17%  1.27%    | 2.40%  1.00%     | 2.26%   |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 5 | 31     53.45%   | 27     46.55%    | 58      |
|                            | 1.60%  0.94%    | 1.96%  0.82%     | 1.75%   |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 6 | 21     77.78%   | 6      22.22%    | 27      |
|                            | 1.08%  0.63%    | 0.44%  0.18%     | 0.82%   |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 7 | 11     45.83%   | 13     54.17%    | 24      |
|                            | 0.57%  0.33%    | 0.95%  0.39%     | 0.72%   |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 8 | 13     81.25%   | 3      18.75%    | 16      |
|                            | 0.67%  0.39%    | 0.22%  0.09%     | 0.48%   |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 9 | 6      54.55%   | 5      45.45%    | 11      |
|                            | 0.31%  0.18%    | 0.36%  0.15%     | 0.33%   |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits > 9 | 30     63.83%   | 17     36.17%    | 47      |
|                            | 1.55%  0.91%    | 1.24%  0.51%     | 1.42%   |
+----------------------------+-----------------+------------------+---------+
|                            | 1937            | 1375             | 3312    |
|                            | 58.48%          | 41.52%           | 100.00% |
+----------------------------+-----------------+------------------+---------+

2013

+----------------------------+-----------------+------------------+---------+
| feature                    | Sex2 = Female   | Sex2 <> Female   |         |
+============================+=================+==================+=========+
| TotalEDInpatientVisits = 0 | 1033     58.49% | 733      41.51%  | 1766    |
|                            | 100.00%  58.49% | 100.00%  41.51%  | 100.00% |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 1 | 0      null     | 0      null      | 0       |
|                            | 0.00%  0.00%    | 0.00%  0.00%     | 0.00%   |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 2 | 0      null     | 0      null      | 0       |
|                            | 0.00%  0.00%    | 0.00%  0.00%     | 0.00%   |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 3 | 0      null     | 0      null      | 0       |
|                            | 0.00%  0.00%    | 0.00%  0.00%     | 0.00%   |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 4 | 0      null     | 0      null      | 0       |
|                            | 0.00%  0.00%    | 0.00%  0.00%     | 0.00%   |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 5 | 0      null     | 0      null      | 0       |
|                            | 0.00%  0.00%    | 0.00%  0.00%     | 0.00%   |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 6 | 0      null     | 0      null      | 0       |
|                            | 0.00%  0.00%    | 0.00%  0.00%     | 0.00%   |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 7 | 0      null     | 0      null      | 0       |
|                            | 0.00%  0.00%    | 0.00%  0.00%     | 0.00%   |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 8 | 0      null     | 0      null      | 0       |
|                            | 0.00%  0.00%    | 0.00%  0.00%     | 0.00%   |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 9 | 0      null     | 0      null      | 0       |
|                            | 0.00%  0.00%    | 0.00%  0.00%     | 0.00%   |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits > 9 | 0      null     | 0      null      | 0       |
|                            | 0.00%  0.00%    | 0.00%  0.00%     | 0.00%   |
+----------------------------+-----------------+------------------+---------+
|                            | 1033            | 733              | 1766    |
|                            | 58.49%          | 41.51%           | 100.00% |
+----------------------------+-----------------+------------------+---------+

Why does the Features endpoint not return the >9 bin with a total of zero, when the 1 x N endpoint does? I only see this in the pre-2014 tables (see the examples above). This is by no means a high-priority issue, but it is an inconsistency that we should probably resolve, as I'm a bit worried that it might be indicative of something bigger.
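A quick way to spot this kind of mismatch is to compare the level labels that the two endpoints return for the same table. The Python sketch below assumes the text/tabular layout shown above; `feature_labels` and `missing_levels` are hypothetical helpers, and the two sample tables are abbreviated stand-ins for the 2013 output.

```python
def feature_labels(table_text):
    """Collect the non-empty first-column labels from a tabular response."""
    labels = []
    for line in table_text.splitlines():
        if not line.startswith("|"):
            continue  # skip border rows such as +-----+
        first_cell = line.split("|")[1].strip()
        if first_cell and first_cell != "feature":
            labels.append(first_cell)
    return labels


def missing_levels(features_table, association_table):
    """Levels present in the association output but absent from Features."""
    found = set(feature_labels(features_table))
    expected = set(feature_labels(association_table))
    return sorted(expected - found)


# Abbreviated stand-ins for the 2013 responses shown above.
features_2013 = """\
+----------------------------+---------+
| feature                    | count   |
+============================+=========+
| TotalEDInpatientVisits = 9 | 0       |
|                            | 0.00%   |
+----------------------------+---------+
"""
association_2013 = """\
+----------------------------+-----------------+
| feature                    | Sex2 = Female   |
+============================+=================+
| TotalEDInpatientVisits = 9 | 0      null     |
|                            | 0.00%  0.00%    |
+----------------------------+-----------------+
| TotalEDInpatientVisits > 9 | 0      null     |
|                            | 0.00%  0.00%    |
+----------------------------+-----------------+
"""
print(missing_levels(features_2013, association_2013))
```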

hyi commented 8 months ago

This issue should now be fixed; the fix can be tested on the ICEES PCD dev instance.

karafecho commented 8 months ago

Closing issue, but first noting that Hong and I agreed to leave the Discover Cohort as is, in order to provide users with flexibility ...