Closed karafecho closed 8 months ago
Per discussion with Hong:
Leave Discover Cohort endpoint as is; in other words, allow users to define cohorts per their choice.
Fix >9 issues with Features and Association to All Features endpoints; issue involves db and YAML file.
Update:
Turns out that this is a very complicated issue. In brief, AgeStudyStart and AgeStudyStart2 are binned in FHIR PIT as part of the preprocAge() routine. For example, for AgeStudyStart2, the bin labels are enumerated as '<5', '5-17', '18-44', '45-64', '65-89'. These are strings stored in the ICEES+ database, which is ingested from patient data generated by FHIR PIT.
Currently, if an operator is not specified as part of the user input, then all feature variables are compared with the equal operator for the feature associations and multilevel analysis endpoints. The values or levels to compare with the equal operator for each feature variable are specified in the all_features YAML files, either through enumeration or through minimum/maximum specifications. As a result, the AgeStudyStart2 bin <5 is stored in the database as a string and is compared with the <5 level string specified in the all_features YAML file using the == operator. On the other hand, TotalEDInpatientVisits, TotalEDVisits, and TotalInpatientVisits are enumerated as integers and are compared with == in the same way as other feature variables.
We tried to include >9 in the all_features YAML file for the visit variables, and also tried to parse it so that the > operator aggregates visit variable integers into a >9 bin for the feature association and multivariate endpoints. However, that conflicts with the string comparison with <5 for the age variables, which should be compared with the equal operator instead.
This complexity made us examine the current design and implementation of binning across FHIR PIT, the all_features YAML specification, and the ICEES+ application. If we want to expand the current design and implementation to allow the > and < operators in addition to the == operator, then how can we specify this in the all_features YAML file to indicate which operator applies? Currently, in the all_features YAML files, if we change enum to min/max (range) and set min=0 and max=100, then 0 through 100 will be returned, regardless of whether the values are null. Is this the desired behavior? How can we improve the specification to make it more consistent across different feature variables and to address different use cases?
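One possible way to remove the ambiguity between the string level <5 and the operator level >9 is to make the operator explicit in the level specification instead of parsing it out of the label. The following is only a sketch of that idea, not the ICEES+ implementation: the LEVELS structure, the "op"/"value" keys, and the helper names are all hypothetical, and the spec is written as a Python dict rather than loaded from the actual all_features YAML file.

```python
import operator

# Hypothetical operator table; not taken from the ICEES+ code base.
OPS = {"==": operator.eq, ">": operator.gt, "<": operator.lt}

# Hypothetical level specification. A bare value means "compare with ==".
# A dict carries an explicit operator and operand, so the string "<5"
# is never mistaken for "less than 5".
LEVELS = {
    "AgeStudyStart2": ["<5", "5-17", "18-44", "45-64", "65-89"],
    "TotalEDInpatientVisits": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
                               {"op": ">", "value": 9}],
}


def normalize(level):
    """Return (operator symbol, operand) for one level entry."""
    if isinstance(level, dict):
        return level["op"], level["value"]
    return "==", level


def matches(value, level):
    """Check whether a stored value falls into the given level."""
    op, operand = normalize(level)
    return OPS[op](value, operand)


def label(feature, level):
    """Build a display label such as 'TotalEDInpatientVisits > 9'."""
    op, operand = normalize(level)
    symbol = "=" if op == "==" else op
    return f"{feature} {symbol} {operand}"


if __name__ == "__main__":
    for feature, levels in LEVELS.items():
        print([label(feature, lv) for lv in levels])
```

With this layout the age bins stay plain strings compared with ==, while the visit variables can declare a > level without any label parsing. Whether the real YAML schema can be extended this way is an open question.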
Adding on to this ticket ... For the Features endpoint (but not the other endpoints), the output for the visit variables in the pre-2014 tables is binned as 0 ... 9 rather than as 0 ... 9, >9. A quick review suggests that the issue is specific to the visit variables. Any ideas why?
2014
+----------------------------+---------+
| feature | count |
+============================+=========+
| TotalEDInpatientVisits = 0 | 17679 |
| | 64.24% |
+----------------------------+---------+
| TotalEDInpatientVisits = 1 | 4355 |
| | 15.82% |
+----------------------------+---------+
| TotalEDInpatientVisits = 2 | 2030 |
| | 7.38% |
+----------------------------+---------+
| TotalEDInpatientVisits = 3 | 1084 |
| | 3.94% |
+----------------------------+---------+
| TotalEDInpatientVisits = 4 | 651 |
| | 2.37% |
+----------------------------+---------+
| TotalEDInpatientVisits = 5 | 540 |
| | 1.96% |
+----------------------------+---------+
| TotalEDInpatientVisits = 6 | 250 |
| | 0.91% |
+----------------------------+---------+
| TotalEDInpatientVisits = 7 | 223 |
| | 0.81% |
+----------------------------+---------+
| TotalEDInpatientVisits = 8 | 146 |
| | 0.53% |
+----------------------------+---------+
| TotalEDInpatientVisits = 9 | 96 |
| | 0.35% |
+----------------------------+---------+
| TotalEDInpatientVisits > 9 | 467 |
| | 1.70% |
+----------------------------+---------+
2013
+----------------------------+---------+
| feature | count |
+============================+=========+
| TotalEDInpatientVisits = 0 | 16488 |
| | 100.00% |
+----------------------------+---------+
| TotalEDInpatientVisits = 1 | 0 |
| | 0.00% |
+----------------------------+---------+
| TotalEDInpatientVisits = 2 | 0 |
| | 0.00% |
+----------------------------+---------+
| TotalEDInpatientVisits = 3 | 0 |
| | 0.00% |
+----------------------------+---------+
| TotalEDInpatientVisits = 4 | 0 |
| | 0.00% |
+----------------------------+---------+
| TotalEDInpatientVisits = 5 | 0 |
| | 0.00% |
+----------------------------+---------+
| TotalEDInpatientVisits = 6 | 0 |
| | 0.00% |
+----------------------------+---------+
| TotalEDInpatientVisits = 7 | 0 |
| | 0.00% |
+----------------------------+---------+
| TotalEDInpatientVisits = 8 | 0 |
| | 0.00% |
+----------------------------+---------+
| TotalEDInpatientVisits = 9 | 0 |
| | 0.00% |
+----------------------------+---------+
2014
+----------------------------+-----------------+------------------+---------+
| feature | Sex2 = Female | Sex2 <> Female | |
+============================+=================+==================+=========+
| TotalEDInpatientVisits = 0 | 1255 57.46% | 929 42.54% | 2184 |
| | 64.79% 37.89% | 67.56% 28.05% | 65.94% |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 1 | 310 60.19% | 205 39.81% | 515 |
| | 16.00% 9.36% | 14.91% 6.19% | 15.55% |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 2 | 149 64.50% | 82 35.50% | 231 |
| | 7.69% 4.50% | 5.96% 2.48% | 6.97% |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 3 | 69 55.65% | 55 44.35% | 124 |
| | 3.56% 2.08% | 4.00% 1.66% | 3.74% |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 4 | 42 56.00% | 33 44.00% | 75 |
| | 2.17% 1.27% | 2.40% 1.00% | 2.26% |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 5 | 31 53.45% | 27 46.55% | 58 |
| | 1.60% 0.94% | 1.96% 0.82% | 1.75% |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 6 | 21 77.78% | 6 22.22% | 27 |
| | 1.08% 0.63% | 0.44% 0.18% | 0.82% |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 7 | 11 45.83% | 13 54.17% | 24 |
| | 0.57% 0.33% | 0.95% 0.39% | 0.72% |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 8 | 13 81.25% | 3 18.75% | 16 |
| | 0.67% 0.39% | 0.22% 0.09% | 0.48% |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 9 | 6 54.55% | 5 45.45% | 11 |
| | 0.31% 0.18% | 0.36% 0.15% | 0.33% |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits > 9 | 30 63.83% | 17 36.17% | 47 |
| | 1.55% 0.91% | 1.24% 0.51% | 1.42% |
+----------------------------+-----------------+------------------+---------+
| | 1937 | 1375 | 3312 |
| | 58.48% | 41.52% | 100.00% |
+----------------------------+-----------------+------------------+---------+
2013
+----------------------------+-----------------+------------------+---------+
| feature | Sex2 = Female | Sex2 <> Female | |
+============================+=================+==================+=========+
| TotalEDInpatientVisits = 0 | 1033 58.49% | 733 41.51% | 1766 |
| | 100.00% 58.49% | 100.00% 41.51% | 100.00% |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 1 | 0 null | 0 null | 0 |
| | 0.00% 0.00% | 0.00% 0.00% | 0.00% |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 2 | 0 null | 0 null | 0 |
| | 0.00% 0.00% | 0.00% 0.00% | 0.00% |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 3 | 0 null | 0 null | 0 |
| | 0.00% 0.00% | 0.00% 0.00% | 0.00% |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 4 | 0 null | 0 null | 0 |
| | 0.00% 0.00% | 0.00% 0.00% | 0.00% |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 5 | 0 null | 0 null | 0 |
| | 0.00% 0.00% | 0.00% 0.00% | 0.00% |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 6 | 0 null | 0 null | 0 |
| | 0.00% 0.00% | 0.00% 0.00% | 0.00% |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 7 | 0 null | 0 null | 0 |
| | 0.00% 0.00% | 0.00% 0.00% | 0.00% |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 8 | 0 null | 0 null | 0 |
| | 0.00% 0.00% | 0.00% 0.00% | 0.00% |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits = 9 | 0 null | 0 null | 0 |
| | 0.00% 0.00% | 0.00% 0.00% | 0.00% |
+----------------------------+-----------------+------------------+---------+
| TotalEDInpatientVisits > 9 | 0 null | 0 null | 0 |
| | 0.00% 0.00% | 0.00% 0.00% | 0.00% |
+----------------------------+-----------------+------------------+---------+
| | 1033 | 733 | 1766 |
| | 58.49% | 41.51% | 100.00% |
+----------------------------+-----------------+------------------+---------+
Why does the Features endpoint omit the >9 bin when its total is zero, while the 1 x N endpoint returns it? I only see this in the pre-2014 tables (see examples above). This is by no means a high-priority issue, but it is an inconsistency that we should probably resolve, as I'm a bit worried that it might be indicative of something bigger.
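To illustrate the behavior I would expect from both endpoints, here is a minimal sketch that counts against a fixed list of bins, so that a bin with no rows (such as >9 in the 2013 table) is still reported with a count of zero. I do not know what the Features endpoint actually does internally; the function names and the cap parameter below are hypothetical and only show the idea.

```python
from collections import Counter


def bin_visits(n, cap=9):
    """Map a visit count to a bin label; counts above cap share one bin."""
    return f"> {cap}" if n > cap else f"= {n}"


def feature_counts(values, cap=9):
    """Count values per bin and emit every bin, even when its count is zero."""
    # The list of expected bins is fixed up front instead of being derived
    # from whatever values happen to be present in the table.
    labels = [f"= {i}" for i in range(cap + 1)] + [f"> {cap}"]
    counts = Counter(bin_visits(v, cap) for v in values if v is not None)
    total = sum(counts.values())
    rows = []
    for lab in labels:
        c = counts[lab]  # Counter returns 0 for bins with no rows
        pct = 100.0 * c / total if total else 0.0
        rows.append((lab, c, round(pct, 2)))
    return rows


if __name__ == "__main__":
    # A table in which every patient has zero visits, as in the 2013 example.
    for row in feature_counts([0] * 5):
        print(row)
```

If the endpoint instead derives its bins from the distinct values found in the table, a missing >9 row in the pre-2014 tables would be the expected symptom, but that is only a guess.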
This issue should now be fixed; the fix can be tested on the ICEES PCD dev instance.
Closing issue, but first noting that Hong and I agreed to leave the Discover Cohort as is, in order to provide users with flexibility ...
This issue is to report that the >9 bin for the variables TotalEDInpatientVisits, TotalEDVisits, and TotalInpatientVisits is not handled correctly at the Discover Cohort, Features, and Association to All Features endpoints.
Discover Cohort - note that one can correctly create a cohort in which TotalEDInpatientVisits > 9, but for logical consistency, we should not allow users to create cohorts defined as, e.g., TotalEDInpatientVisits > 10.
Features
Association to All Features