ExposuresProvider / icees-api

"Handle Bins" functionality does not work properly #231

Closed karafecho closed 1 year ago

karafecho commented 2 years ago

This issue is to report that the "Handle Bins" functionality does not work properly, at least not for the ICEES PCD prod instance.

Specifically, an "internal server error" is returned when asking for text/tabular output, but the functionality works properly for JSON output.

The "Handle Bins" functionality could be improved more generally, but that will require discussion.

karafecho commented 2 years ago

In addition to the above issues, Bin values are listed as "null" for certain variables, e.g., EstMedianHouseholdIncome, EstHouseholdNoHealthInsurance, RoadwayDistance2.

karafecho commented 2 years ago

This issue is universal, meaning it affects all ICEES instances. Moreover, upon further testing, I'm not convinced that the bins returned to users are correct. The bins should vary by year and by ICEES cohort/instance, but I don't think that is the case. Is the API pointing to a "static" file?

hyi commented 2 years ago

@karafecho yes, it points to a static file I found somewhere, which is the only bins file I found. I don't know how the bins JSON files are generated, though. If we can figure out how bins are computed for each instance, I can generate them.

karafecho commented 2 years ago

@hyi : So, ICEES feature variables are all binned or recoded per regulatory/institutional mandate. In some cases, I generate the bins after reviewing the literature, consulting SMEs, and considering various modeling approaches before selecting an appropriate model (sometimes more than one) for each variable. In other cases, mainly for variables where I was uncertain about the best modeling approach, FHIR PIT generates the bins as part of the integration pipeline, typically using pandas.cut or pandas.qcut. The "bins" file is then generated as part of the output. So, for every FHIR PIT run, there should be an associated bins file.
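For anyone following along, here is a minimal sketch of how bin edges can be captured at binning time with pandas.cut or pandas.qcut and written out as a per-run bins file. It is not the actual FHIR PIT code: the data, the `bin_variable` helper, the number of bins, and the JSON layout are all assumptions made for illustration. Only the column names come from this thread.

```python
import json

import pandas as pd

# Hypothetical patient-level values; only the column names come from this thread.
df = pd.DataFrame({
    "EstMedianHouseholdIncome": [21000, 35000, 48000, 52000, 61000, 75000, 90000, 120000],
    "RoadwayDistance2": [10.0, 55.0, 120.0, 250.0, 400.0, 800.0, 1500.0, 3000.0],
})


def bin_variable(series, n_bins, quantile):
    """Bin a numeric series and return (1-based bin codes, list of bin edges).

    quantile=True uses pandas.qcut (roughly equal-sized groups);
    quantile=False uses pandas.cut (equal-width intervals).
    """
    cutter = pd.qcut if quantile else pd.cut
    # retbins=True makes pandas return the edges it actually used.
    codes, edges = cutter(series, n_bins, labels=False, retbins=True)
    return codes + 1, [float(e) for e in edges]


bins_report = {}
binned = pd.DataFrame(index=df.index)
for column, quantile in [("EstMedianHouseholdIncome", True), ("RoadwayDistance2", False)]:
    binned[column], bins_report[column] = bin_variable(df[column], 4, quantile)

# One bins document per run, so the reported bins always match the binned data.
bins_json = json.dumps(bins_report, indent=2)
print(bins_json)
```

Because the edges are taken from the same call that assigns the bins, a file produced this way would vary by year and by cohort along with the underlying data.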

Note that it is possible that Hao or Patrick modified this process, but the description above is how things were intended to work and reflects my understanding.

Hope this makes sense. Happy to discuss, if that would be helpful.

karafecho commented 2 years ago

Okay, I checked the reported bins against the patient-level calculations. I think the patient-level calculations are correct, but the reported bins for those calculations are not. This affects only the ICEES COVID and PCD instances, and only certain variables for which the bins were created using pandas.cut or pandas.qcut. The reported bins for the ICEES Asthma instance are correct and were compiled into a static "handle bins" file.
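One way to automate this kind of check is sketched below: recompute each patient's bin from the reported edges and compare it with the bin stored at the patient level. This is only an illustration under assumptions, not the procedure used here. The `reported_bins_match` helper and the sample values are hypothetical, and it assumes 1-based bin codes with right-closed intervals.

```python
import pandas as pd


def reported_bins_match(raw, patient_bins, reported_edges):
    """Return True if the reported edges reproduce the patient-level bin codes."""
    # include_lowest=True keeps the minimum value inside the first interval.
    recomputed = pd.cut(raw, bins=reported_edges, labels=False, include_lowest=True) + 1
    # Values outside the reported edges become NaN and therefore count as mismatches.
    return bool((recomputed == patient_bins).all())


# Hypothetical raw values and the bins assigned to them during integration.
raw = pd.Series([10.0, 55.0, 120.0, 250.0])
patient_bins = pd.qcut(raw, 2, labels=False) + 1

fresh_edges = [10.0, 87.5, 250.0]   # edges derived from this data
stale_edges = [0.0, 500.0, 1000.0]  # edges from an unrelated static file

print(reported_bins_match(raw, patient_bins, fresh_edges))
print(reported_bins_match(raw, patient_bins, stale_edges))
```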

I think we can fix this issue for the ICEES COVID and PCD instances, but the fix is probably easier to discuss in person. As such, I've added this to the agenda for our next ICEES+ architecture/design/operations WG meeting.

karafecho commented 1 year ago

Closing ticket, as issue has been resolved.

hyi commented 1 year ago

Closing the ticket since, as noted above, the issue has been resolved.