ExposuresProvider / icees-api


PCD endpoint is aggregating across years instead of returning results for selected cohort #281

Closed karafecho closed 10 months ago

karafecho commented 11 months ago

The PCD endpoint appears to aggregate counts across years instead of returning results for the selected cohort. This may be related to #280.

For example:

  1. Discover cohort
curl -X 'POST' \
  'https://icees-pcd.renci.org/patient/cohort' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{}'

"return value": { "cohort_id": "COHORT:1", "size": 7940 }

  2. Features for COHORT:1

curl -X 'GET' \
  'https://icees-pcd.renci.org/patient/cohort/COHORT%3A1/features' \
  -H 'accept: text/tabular'
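For anyone scripting this step: the `%3A` in the path is just the percent-encoded colon from `COHORT:1`. A minimal sketch of building the same URL in Python, using only the standard library; the helper name `features_url` is made up for illustration and is not part of icees-api.

```python
from urllib.parse import quote

# Base URL taken from the curl commands in this issue.
BASE_URL = "https://icees-pcd.renci.org"


def features_url(cohort_id: str) -> str:
    """Build the features URL for a cohort (hypothetical helper).

    safe="" makes quote() encode every reserved character,
    so the colon in "COHORT:1" becomes "%3A".
    """
    return f"{BASE_URL}/patient/cohort/{quote(cohort_id, safe='')}/features"


print(features_url("COHORT:1"))
```

This only constructs the URL; it does not send a request.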

Excerpt:


`+---------------------+---------+
| feature             | count   |
+=====================+=========+
| study_period = 2010 | 12680   |
|                     | 3.78%   |
+---------------------+---------+
| study_period = 2011 | 13451   |
|                     | 4.01%   |
+---------------------+---------+
| study_period = 2012 | 15733   |
|                     | 4.69%   |
+---------------------+---------+
| study_period = 2013 | 16488   |
|                     | 4.91%   |
+---------------------+---------+
| study_period = 2014 | 27521   |
|                     | 8.20%   |
+---------------------+---------+
| study_period = 2015 | 32897   |
|                     | 9.80%   |
+---------------------+---------+
| study_period = 2016 | 38441   |
|                     | 11.45%  |
+---------------------+---------+
| study_period = 2017 | 39991   |
|                     | 11.91%  |
+---------------------+---------+
| study_period = 2018 | 40105   |
|                     | 11.95%  |
+---------------------+---------+
| study_period = 2019 | 37151   |
|                     | 11.07%  |
+---------------------+---------+
| study_period = 2020 | 34304   |
|                     | 10.22%  |
+---------------------+---------+
| study_period = 2021 | 26924   |
|                     | 8.02%   |
+---------------------+---------+
+----------------------------+---------+
| feature                    | count   |
+============================+=========+
| Active_In_Study_Period = 0 | 129     |
|                            | 0.04%   |
+----------------------------+---------+
| Active_In_Study_Period = 1 | 335557  |
|                            | 99.96%  |
+----------------------------+---------+
`
hyi commented 10 months ago

@karafecho After investigating further, I realized this was triggered by the changes made for #247, which changed the cohort endpoint to return the number of patients rather than the number of observations, as it did previously. So the counts returned here are the number of observations for the selected cohort, not an aggregation across years. Should we make it clear that the returned counts are numbers of observations? The confusion is that the cohort endpoint returns the number of patients, while the features and feature associations endpoints return the number of observations for the selected cohort, which was discussed a bit in #247. It seems to me that these endpoints are all returning correct results and we just need to update the documentation to make the returned results clearer to users. What do you think?
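A quick arithmetic check on the excerpt above seems consistent with this reading. The sketch below uses only the numbers quoted in this issue; the variable and function names are illustrative, and it says nothing about how the API computes the counts internally. Both feature tables sum to the same total, the percentages are fractions of that total, and the total is far larger than the cohort size of 7940.

```python
# Counts copied from the text/tabular excerpt in this issue.
study_period_counts = {
    2010: 12680, 2011: 13451, 2012: 15733, 2013: 16488,
    2014: 27521, 2015: 32897, 2016: 38441, 2017: 39991,
    2018: 40105, 2019: 37151, 2020: 34304, 2021: 26924,
}
active_counts = {0: 129, 1: 335557}

# "size" returned by the cohort endpoint for COHORT:1.
cohort_size = 7940

# Total of the per-year counts in the first table.
total_observations = sum(study_period_counts.values())


def percent(count: int, total: int) -> float:
    """Share of the total, rounded to two decimals as in the table."""
    return round(100 * count / total, 2)


print(total_observations)                   # 335686
print(sum(active_counts.values()))          # 335686, same total
print(percent(study_period_counts[2010], total_observations))
print(total_observations > cohort_size)     # True
```

So the denominators in both tables are the same 335686, which is what you would expect if each row counts observations for the cohort rather than patients.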

karafecho commented 10 months ago

See my 10.11.2023 post at #280.

karafecho commented 10 months ago

Closing ticket, as this issue has been resolved ...