karafecho closed this issue 10 months ago
@karafecho I believe this issue was also triggered by the changes made to address https://github.com/ExposuresProvider/icees-api/issues/247, which changed the cohort endpoint to return the number of patients rather than the number of observations, as it did previously. The counts returned here are the number of observations for the selected cohort, rather than counts aggregated across years. Should we make it clear that the returned counts are numbers of observations? The confusion is that the cohort endpoint returns the number of patients, while the features and feature association endpoints return the number of observations for the selected cohort, which was discussed a bit in that issue. It seems to me that these endpoints are all returning correct results and that we just need to update the documentation to make the returned results clearer to users. What do you think?
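To make the distinction concrete, here is a minimal sketch with made-up data. The record layout and the field names `patient_id` and `year` are assumptions for illustration, not the ICEES schema. Each row is one patient-year observation, so a patient who appears in several years contributes several observations but only one patient.

```python
# Toy patient-year records; identifiers and years are invented for illustration.
observations = [
    {"patient_id": "p1", "year": 2010},
    {"patient_id": "p1", "year": 2011},
    {"patient_id": "p2", "year": 2010},
    {"patient_id": "p3", "year": 2011},
    {"patient_id": "p3", "year": 2012},
]


def count_observations(rows, year=None):
    """Count rows, optionally restricted to a single year."""
    return sum(1 for r in rows if year is None or r["year"] == year)


def count_patients(rows, year=None):
    """Count distinct patients, optionally restricted to a single year."""
    return len({r["patient_id"] for r in rows if year is None or r["year"] == year})


print(count_observations(observations))        # all patient-year rows
print(count_patients(observations))            # distinct patients
print(count_observations(observations, 2010))  # rows for one year only
```

Without a year filter the two counts diverge as soon as any patient has more than one year of data, which is the pattern being described here.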
@hyi : Thanks for the investigative work. I think your conclusion is correct; however, I did some systematic testing at both the asthma and pcd prod endpoints, just to take a closer look, as the patient vs observation issue is a bit confusing. Plus, I wanted to consider how this might affect the multivariate functionality. I've attached two files with tabular example results. (Note that I didn't save all the CURLs.)
I noticed a few things:
... Active_In_Year=1 or year=2010, the results are identical, which does not seem right.
... year=2010, but it fails for the following COHORT, due I think to the fact that this cohort was selected using the "between" operation.
COHORT:21 (Active_In_Year = 1, Race_UNC <> Unknown, TotalEDInpatientVisits between (1,8))
+---------+
| error |
+=========+
| 'value' |
+---------+
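One observation on the error text: in Python, `str(KeyError('value'))` is exactly `'value'`, quotes included, which matches the table above. So a plausible but unconfirmed cause is code that reads a `value` key from every cohort qualifier, while a "between" qualifier stores its bounds under other keys. The sketch below only illustrates that failure mode. The key names `value_a` and `value_b` and the helper `describe_qualifier` are assumptions, not the actual ICEES code.

```python
def describe_qualifier(name, qualifier):
    """Hypothetical helper that assumes every qualifier has a 'value' key."""
    return f"{name} {qualifier['operator']} {qualifier['value']}"


def error_text(name, qualifier):
    """Return the description, or the text a generic error handler would report."""
    try:
        return describe_qualifier(name, qualifier)
    except KeyError as exc:
        return str(exc)


simple = {"operator": "=", "value": 1}
# Assumed shape of a "between" qualifier: two bounds, no single 'value'.
between = {"operator": "between", "value_a": 1, "value_b": 8}

print(error_text("Active_In_Year", simple))
print(error_text("TotalEDInpatientVisits", between))
```

If this is what is happening, the fix would be to branch on the operator before reading the bounds, but that needs to be confirmed against the code.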
Asthma
curl -X 'POST' \
'https://icees-asthma.renci.org/patient/cohort/COHORT%3A1/feature_association' \
-H 'accept: text/tabular' \
-H 'Content-Type: application/json' \
-d '{
"feature_a": {
"Sex2": {
"operator": "=",
"value": "Female"
}
},
"feature_b": {
"study_period": {
"operator": "=",
"value": "2010"
}
}
}'
422 Error, not a valid feature - study_period must not be defined in YAML
PCD
COHORT:1 (all), year 2010
+----------------------+-----------------+------------------+---------+
| feature              | Sex2 = Female   | Sex2 <> Female   |         |
+======================+=================+==================+=========+
| study_period = 2010  | 766 58.43%      | 545 41.57%       | 1311    |
|                      | 100.00% 58.43%  | 100.00% 41.57%   | 100.00% |
+----------------------+-----------------+------------------+---------+
| study_period <> 2010 | 0 null          | 0 null           | 0       |
|                      | 0.00% 0.00%     | 0.00% 0.00%      | 0.00%   |
+----------------------+-----------------+------------------+---------+
|                      | 766             | 545              | 1311    |
|                      | 58.43%          | 41.57%           | 100.00% |
+----------------------+-----------------+------------------+---------+
+-------------------------+-------------------+-----------------+---------------------------+------------------+------------------+-----------------------------------------+
|   chi_squared_statistic |   chi_squared_dof |   chi_squared_p |   fisher_exact_odds_ratio |   fisher_exact_p |   log_odds_ratio | log_odds_ratio_95_confidence_interval   |
+=========================+===================+=================+===========================+==================+==================+=========================================+
|             1.29888e-17 |                 1 |               1 |                           |                  |                  |                                         |
+-------------------------+-------------------+-----------------+---------------------------+------------------+------------------+-----------------------------------------+
COHORT:1 (all)
curl -X 'POST' \
'https://icees-pcd.renci.org/cohort/COHORT%3A1/multivariate_feature_analysis' \
-H 'accept: text/tabular' \
-H 'Content-Type: application/json' \
-d '[
"TotalEDInpatientVisits",
"Sex2",
"study_period"
]'
Returns terms/conditions, zero results - tested several variations - study_period not valid feature at multivariate endpoints?
All in all, I think you are correct that all endpoints are behaving properly and by design. However, we probably need to:
... Active_In_Year feature and review the code to figure out what this feature actually represents. (Note that we've struggled with this feature in the past.)
... study_period to the asthma all_features YAML file.
... study_period cannot be selected at the multivariate endpoint.
... PatientID in the output table.
Does the above seem correct?
One additional comment/task: we do need to update the documentation. I can do this, but please advise on your preferred approach for doing so. Thanks!
@hyi : Brenna continues to report issues with the multivariate endpoint, so I did a bit of systematic testing. Perhaps we can discuss?
1. Asthma 2x2, no year specified, COHORT:1
curl -X 'POST' \
'https://icees-asthma.renci.org/patient/cohort/COHORT%3A1/feature_association' \
-H 'accept: text/tabular' \
-H 'Content-Type: application/json' \
-d '{
"feature_a": {
"Sex2": {
"operator": "=",
"value": "Female"
}
},
"feature_b": {
"PrednisoneRx": {
"operator": "=",
"value": "0"
}
}
}'
+-------------------+-----------------+------------------+---------+
| feature           | Sex2 = Female   | Sex2 <> Female   |         |
+===================+=================+==================+=========+
| PrednisoneRx = 0  | 850005 56.56%   | 652924 43.44%    | 1502929 |
|                   | 94.65% 53.87%   | 96.02% 41.38%    | 95.24%  |
+-------------------+-----------------+------------------+---------+
| PrednisoneRx <> 0 | 48029 63.96%    | 27061 36.04%     | 75090   |
|                   | 5.35% 3.04%     | 3.98% 1.71%      | 4.76%   |
+-------------------+-----------------+------------------+---------+
|                   | 898034          | 679985           | 1578019 |
|                   | 56.91%          | 43.09%           | 100.00% |
+-------------------+-----------------+------------------+---------+
+-------------------------+-------------------+-----------------+---------------------------+------------------+------------------+--------------------------------------------------------------------+
|   chi_squared_statistic |   chi_squared_dof |   chi_squared_p |   fisher_exact_odds_ratio |   fisher_exact_p |   log_odds_ratio | log_odds_ratio_95_confidence_interval                              |
+=========================+===================+=================+===========================+==================+==================+====================================================================+
|                 1599.31 |                 1 |               0 |                  0.733498 |                0 |         -0.30993 | ConfidenceInterval(low=0.7224026274952368, high=0.744764508785044) |
+-------------------------+-------------------+-----------------+---------------------------+------------------+------------------+--------------------------------------------------------------------+
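As a sanity check on the statistics above, the sketch below recomputes the odds ratio and a 95% Wald (normal-approximation) interval directly from the four cell counts. Whether this is the formula ICEES uses is an assumption on my part; it is simply a standard one that lands very close to the reported values. Note that the reported interval brackets the odds ratio itself (about 0.73), not its logarithm, despite the column name.

```python
import math
from statistics import NormalDist


def odds_ratio_with_ci(a, b, c, d, level=0.95):
    """Sample odds ratio (a*d)/(b*c) with a Wald interval built on the log scale."""
    odds_ratio = (a * d) / (b * c)
    log_or = math.log(odds_ratio)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return odds_ratio, log_or, (math.exp(log_or - z * se), math.exp(log_or + z * se))


# Cell counts from the COHORT:1 table above (no year specified):
# rows are PrednisoneRx = 0 / <> 0, columns are Sex2 = Female / <> Female.
odds_ratio, log_or, (low, high) = odds_ratio_with_ci(850005, 652924, 48029, 27061)
print(round(odds_ratio, 6), round(log_or, 5), round(low, 4), round(high, 4))
```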
2. Asthma 2x2, year=2010, COHORT:1
curl -X 'POST' \
'https://icees-asthma.renci.org/patient/cohort/COHORT%3A1/feature_association?year=2010' \
-H 'accept: text/tabular' \
-H 'Content-Type: application/json' \
-d '{
"feature_a": {
"Sex2": {
"operator": "=",
"value": "Female"
}
},
"feature_b": {
"PrednisoneRx": {
"operator": "=",
"value": "0"
}
}
}'
+-------------------+-----------------+------------------+---------+
| feature           | Sex2 = Female   | Sex2 <> Female   |         |
+===================+=================+==================+=========+
| PrednisoneRx = 0  | 89379 56.89%    | 67740 43.11%     | 157119  |
|                   | 98.45% 56.10%   | 98.83% 42.52%    | 98.61%  |
+-------------------+-----------------+------------------+---------+
| PrednisoneRx <> 0 | 1404 63.59%     | 804 36.41%       | 2208    |
|                   | 1.55% 0.88%     | 1.17% 0.50%      | 1.39%   |
+-------------------+-----------------+------------------+---------+
|                   | 90783           | 68544            | 159327  |
|                   | 56.98%          | 43.02%           | 100.00% |
+-------------------+-----------------+------------------+---------+
+-------------------------+-------------------+-----------------+---------------------------+------------------+------------------+---------------------------------------------------------------------+
|   chi_squared_statistic |   chi_squared_dof |   chi_squared_p |   fisher_exact_odds_ratio |   fisher_exact_p |   log_odds_ratio | log_odds_ratio_95_confidence_interval                               |
+=========================+===================+=================+===========================+==================+==================+=====================================================================+
|                 39.8835 |                 1 |     2.69571e-10 |                  0.755578 |      2.15103e-10 |        -0.280272 | ConfidenceInterval(low=0.6924432813217383, high=0.8244688925382757) |
+-------------------------+-------------------+-----------------+---------------------------+------------------+------------------+---------------------------------------------------------------------+
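The two feature_association calls above differ only in the `year` query parameter. For scripted testing, the same requests can be assembled in Python. This sketch only builds the request object and does not send it; the helper name `build_association_request` is hypothetical, and the URL layout is copied from the curl commands above.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://icees-asthma.renci.org"  # host taken from the curl commands


def build_association_request(cohort_id, feature_a, feature_b, year=None):
    """Build (but do not send) a feature_association POST request."""
    url = f"{BASE_URL}/patient/cohort/{quote(cohort_id, safe='')}/feature_association"
    if year is not None:
        url += "?" + urlencode({"year": year})
    body = json.dumps({"feature_a": feature_a, "feature_b": feature_b}).encode()
    headers = {"accept": "text/tabular", "Content-Type": "application/json"}
    return Request(url, data=body, headers=headers, method="POST")


req = build_association_request(
    "COHORT:1",
    {"Sex2": {"operator": "=", "value": "Female"}},
    {"PrednisoneRx": {"operator": "=", "value": "0"}},
    year=2010,
)
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen` would send it; leaving out `year` gives the first form of the call.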
3. Asthma multivariate, no year specified, COHORT:1
curl -X 'POST' \
'https://icees-asthma.renci.org/cohort/COHORT%3A1/multivariate_feature_analysis' \
-H 'accept: text/tabular' \
-H 'Content-Type: application/json' \
-d '[
"TotalEDInpatientVisits",
"Sex2",
"Race_UNC",
"PrednisoneRx"
]'
+------------------------------------+----------+--------------------------+----------------+-------------+
| Race_UNC                           | Sex2     | TotalEDInpatientVisits   | PrednisoneRx   |   frequency |
+====================================+==========+==========================+================+=============+
| = Native Hawaiian/Pacific Islander | = Male   | = 0.0                    | = 0            |         251 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Native Hawaiian/Pacific Islander | = Male   | = 0.0                    | = 1            |           3 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Caucasian                        | = Male   | = 0.0                    | = 0            |      324352 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Caucasian                        | = Male   | = 0.0                    | = 1            |        4206 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = African American                 | = Male   | = 0.0                    | = 0            |      109913 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = African American                 | = Male   | = 0.0                    | = 1            |        1309 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Asian                            | = Male   | = 0.0                    | = 0            |        6529 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Asian                            | = Male   | = 0.0                    | = 1            |          73 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Unknown                          | = Male   | = 0.0                    | = 0            |      124409 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Unknown                          | = Male   | = 0.0                    | = 1            |       11772 |
+------------------------------------+----------+--------------------------+----------------+-------------+
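For reference, the multivariate table can be read as one row per combination of feature values, with a count of matching records. Below is a minimal sketch of that grouping, assuming (as discussed earlier in this thread) that each record is one patient-year observation. The toy records and the helper `multivariate_frequencies` are invented for illustration and are not the ICEES implementation.

```python
from collections import Counter

# Invented patient-year records, not real ICEES data.
records = [
    {"year": 2010, "Sex2": "Male", "PrednisoneRx": 0},
    {"year": 2010, "Sex2": "Male", "PrednisoneRx": 1},
    {"year": 2011, "Sex2": "Male", "PrednisoneRx": 0},
    {"year": 2011, "Sex2": "Female", "PrednisoneRx": 0},
]


def multivariate_frequencies(rows, features, year=None):
    """Count records per combination of the requested feature values."""
    selected = (r for r in rows if year is None or r["year"] == year)
    return Counter(tuple(r[f] for f in features) for r in selected)


all_years = multivariate_frequencies(records, ["Sex2", "PrednisoneRx"])
only_2010 = multivariate_frequencies(records, ["Sex2", "PrednisoneRx"], year=2010)
print(all_years[("Male", 0)], only_2010[("Male", 0)])
```

Under this reading, a year filter can only shrink each cell, and a cohort filter should change the counts whenever the cohort excludes some records.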
4. Asthma multivariate with year=2010, COHORT:1 (excerpt of response shown) [year appears to have been applied correctly]
+------------------------------------+----------+--------------------------+----------------+-------------+
| Race_UNC                           | Sex2     | TotalEDInpatientVisits   | PrednisoneRx   |   frequency |
+====================================+==========+==========================+================+=============+
| = Native Hawaiian/Pacific Islander | = Male   | = 0.0                    | = 0            |          25 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Native Hawaiian/Pacific Islander | = Male   | = 0.0                    | = 1            |           0 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Caucasian                        | = Male   | = 0.0                    | = 0            |       31507 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Caucasian                        | = Male   | = 0.0                    | = 1            |          32 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = African American                 | = Male   | = 0.0                    | = 0            |       10311 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = African American                 | = Male   | = 0.0                    | = 1            |          19 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Asian                            | = Male   | = 0.0                    | = 0            |         617 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Asian                            | = Male   | = 0.0                    | = 1            |           0 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Unknown                          | = Male   | = 0.0                    | = 0            |       13903 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Unknown                          | = Male   | = 0.0                    | = 1            |          18 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = American/Alaskan Native          | = Male   | = 0.0                    | = 0            |         317 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = American/Alaskan Native          | = Male   | = 0.0                    | = 1            |           1 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Other                            | = Male   | = 0.0                    | = 0            |           0 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Other                            | = Male   | = 0.0                    | = 1            |           0 |
+------------------------------------+----------+--------------------------+----------------+-------------+
5. Asthma 2x2, no year specified, COHORT:5 (age=0-2) [response doesn't seem correct, all female?; contradicts response in (6)]
curl -X 'POST' \
'https://icees-asthma.renci.org/patient/cohort/COHORT%3A5/feature_association' \
-H 'accept: text/tabular' \
-H 'Content-Type: application/json' \
-d '{
"feature_a": {
"Sex2": {
"operator": "=",
"value": "Female"
}
},
"feature_b": {
"PrednisoneRx": {
"operator": "=",
"value": "0"
}
}
}'
+-------------------+-----------------+------------------+---------+
| feature           | Sex2 = Female   | Sex2 <> Female   |         |
+===================+=================+==================+=========+
| PrednisoneRx = 0  | 9754 100.00%    | 0 0.00%          | 9754    |
|                   | 88.46% 88.46%   | null 0.00%       | 88.46%  |
+-------------------+-----------------+------------------+---------+
| PrednisoneRx <> 0 | 1273 100.00%    | 0 0.00%          | 1273    |
|                   | 11.54% 11.54%   | null 0.00%       | 11.54%  |
+-------------------+-----------------+------------------+---------+
|                   | 11027           | 0                | 11027   |
|                   | 100.00%         | 0.00%            | 100.00% |
+-------------------+-----------------+------------------+---------+
+-------------------------+-------------------+-----------------+---------------------------+------------------+------------------+-----------------------------------------+
|   chi_squared_statistic |   chi_squared_dof |   chi_squared_p |   fisher_exact_odds_ratio |   fisher_exact_p |   log_odds_ratio | log_odds_ratio_95_confidence_interval   |
+=========================+===================+=================+===========================+==================+==================+=========================================+
|             6.43122e-16 |                 1 |               1 |                           |                  |                  |                                         |
+-------------------------+-------------------+-----------------+---------------------------+------------------+------------------+-----------------------------------------+
6. Asthma 2x2, year=2010, COHORT:5 (age=0-2)
curl -X 'POST' \
'https://icees-asthma.renci.org/patient/cohort/COHORT%3A5/feature_association?year=2010' \
-H 'accept: text/tabular' \
-H 'Content-Type: application/json' \
-d '{
"feature_a": {
"Sex2": {
"operator": "=",
"value": "Female"
}
},
"feature_b": {
"PrednisoneRx": {
"operator": "=",
"value": "0"
}
}
}'
+-------------------+-----------------+------------------+---------+
| feature           | Sex2 = Female   | Sex2 <> Female   |         |
+===================+=================+==================+=========+
| PrednisoneRx = 0  | 5316 40.09%     | 7944 59.91%      | 13260   |
|                   | 100.00% 40.08%  | 99.94% 59.89%    | 99.96%  |
+-------------------+-----------------+------------------+---------+
| PrednisoneRx <> 0 | 0 0.00%         | 5 100.00%        | 5       |
|                   | 0.00% 0.00%     | 0.06% 0.04%      | 0.04%   |
+-------------------+-----------------+------------------+---------+
|                   | 5316            | 7949             | 13265   |
|                   | 40.08%          | 59.92%           | 100.00% |
+-------------------+-----------------+------------------+---------+
+-------------------------+-------------------+-----------------+---------------------------+------------------+------------------+-----------------------------------------+
|   chi_squared_statistic |   chi_squared_dof |   chi_squared_p |   fisher_exact_odds_ratio |   fisher_exact_p |   log_odds_ratio | log_odds_ratio_95_confidence_interval   |
+=========================+===================+=================+===========================+==================+==================+=========================================+
|                 3.34508 |                 1 |       0.0674063 |                           |                  |                  |                                         |
+-------------------------+-------------------+-----------------+---------------------------+------------------+------------------+-----------------------------------------+
7. Asthma multivariate, no year specified, COHORT:5 (age=0-2) (excerpt of results shown) [same results as in (3)]
curl -X 'POST' \
'https://icees-asthma.renci.org/cohort/COHORT%3A5/multivariate_feature_analysis' \
-H 'accept: text/tabular' \
-H 'Content-Type: application/json' \
-d '[
"TotalEDInpatientVisits",
"Sex2",
"Race_UNC",
"PrednisoneRx"
]'
+------------------------------------+----------+--------------------------+----------------+-------------+
| Race_UNC                           | Sex2     | TotalEDInpatientVisits   | PrednisoneRx   |   frequency |
+====================================+==========+==========================+================+=============+
| = Native Hawaiian/Pacific Islander | = Male   | = 0.0                    | = 0            |         251 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Native Hawaiian/Pacific Islander | = Male   | = 0.0                    | = 1            |           3 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Caucasian                        | = Male   | = 0.0                    | = 0            |      324352 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Caucasian                        | = Male   | = 0.0                    | = 1            |        4206 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = African American                 | = Male   | = 0.0                    | = 0            |      109913 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = African American                 | = Male   | = 0.0                    | = 1            |        1309 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Asian                            | = Male   | = 0.0                    | = 0            |        6529 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Asian                            | = Male   | = 0.0                    | = 1            |          73 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Unknown                          | = Male   | = 0.0                    | = 0            |      124409 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Unknown                          | = Male   | = 0.0                    | = 1            |       11772 |
+------------------------------------+----------+--------------------------+----------------+-------------+
8. Asthma multivariate, year=2010, COHORT:5 (age=0-2) [year appears to have been applied correctly, but same results as in (4)]
curl -X 'POST' \
'https://icees-asthma.renci.org/cohort/COHORT%3A5/multivariate_feature_analysis?year=2010' \
-H 'accept: text/tabular' \
-H 'Content-Type: application/json' \
-d '[
"TotalEDInpatientVisits",
"Sex2",
"Race_UNC",
"PrednisoneRx"
]'
+------------------------------------+----------+--------------------------+----------------+-------------+
| Race_UNC                           | Sex2     | TotalEDInpatientVisits   | PrednisoneRx   |   frequency |
+====================================+==========+==========================+================+=============+
| = Native Hawaiian/Pacific Islander | = Male   | = 0.0                    | = 0            |          25 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Native Hawaiian/Pacific Islander | = Male   | = 0.0                    | = 1            |           0 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Caucasian                        | = Male   | = 0.0                    | = 0            |       31507 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Caucasian                        | = Male   | = 0.0                    | = 1            |          32 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = African American                 | = Male   | = 0.0                    | = 0            |       10311 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = African American                 | = Male   | = 0.0                    | = 1            |          19 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Asian                            | = Male   | = 0.0                    | = 0            |         617 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Asian                            | = Male   | = 0.0                    | = 1            |           0 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Unknown                          | = Male   | = 0.0                    | = 0            |       13903 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Unknown                          | = Male   | = 0.0                    | = 1            |          18 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = American/Alaskan Native          | = Male   | = 0.0                    | = 0            |         317 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = American/Alaskan Native          | = Male   | = 0.0                    | = 1            |           1 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Other                            | = Male   | = 0.0                    | = 0            |           0 |
+------------------------------------+----------+--------------------------+----------------+-------------+
| = Other                            | = Male   | = 0.0                    | = 1            |           0 |
+------------------------------------+----------+--------------------------+----------------+-------------+
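The "same results as in (3)" and "same results as in (4)" observations can be checked mechanically rather than by eye. Here is a minimal sketch, assuming the `text/tabular` responses have been saved as strings: parse each grid table into rows and compare. The function name and the two short sample tables are illustrative only, not full responses.

```python
def parse_grid_table(text):
    """Parse a '+---+' / '| a | b |' grid table into a list of row dicts."""
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip '+----+' and '+====+' border lines
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Abbreviated stand-ins for two saved responses (one per cohort).
cohort_1 = """
+----------+-------------+
| Sex2     |   frequency |
+==========+=============+
| = Male   |         251 |
+----------+-------------+
"""
cohort_5 = """
+----------+-------------+
| Sex2     |   frequency |
+==========+=============+
| = Male   |         251 |
+----------+-------------+
"""

same = parse_grid_table(cohort_1) == parse_grid_table(cohort_5)
print("identical" if same else "different")
```

Identical output for two different cohorts would support the suspicion that the cohort parameter is not being applied at the multivariate endpoint.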
BTW, and this may be important information, all of the cohorts were preserved, despite the fact that you recreated the database. Perhaps something went wrong with that?
@karafecho The database is only recreated for PCD instances, because of changes to the all_features YAML file. For the other instances, only the valuesets YAML file is updated with the correct min and max ranges for year, so there are no database changes.
Closing issue, as this issue was resolved ...
Brenna (intern) is exploring the multivariate technique. She has successfully used the lesson plan that I created to independently run all available ICEES+ functionalities and generate a multivariate table. However, she uncovered a bug that I reproduced and explored a bit. The issue appears to be related to the change we made in the treatment of integers vs floats and/or a failure to recognize the cohort input parameter at certain endpoints. Here's an example:
Discover cohort function appears to run fine
+-------------+--------+
| cohort_id   |   size |
+=============+========+
| COHORT:26   |  15901 |
+-------------+--------+
2x2 association appears to run fine
Excerpt:
Excerpt: