GEMINI-Medicine / Rgemini

A custom R package that provides a variety of functions to perform data analyses with GEMINI data
https://gemini-medicine.github.io/Rgemini/
Other
3 stars 0 forks source link

15 ed exclusion flag for n imaging and n bloodwork #125

Closed guoyi-smh closed 1 month ago

guoyi-smh commented 2 months ago

closes #15

Refactored n_imaging, n_routine_bloodwork and n_rbc_transfusion. Now all functions take three arguments:

Other changes include:

loffleraSMH commented 2 months ago

@guoyi-smh one more general comment/question (for a future sprint): Some of the filtering by date-time variables happens within the SQL query now, which probably makes it more efficient, but with this approach it’s not obvious how many date-time variables are missing/invalid. I’ve been adding a new utility function that transforms/checks date-time variables (see issue #76), so I think eventually, I’d like to apply this within these 3 functions as well so users are informed about missing/invalid date-times? This might be particularly relevant for imaging, so we can inform users how missing order date-times are handled. That would mean we’d query all entries first and then perform the filtering by date-times after. Any thoughts on this approach? If you agree, I can open a new issue for that so we can tackle that in a future update.

guoyi-smh commented 2 months ago

Thanks so much for implementing this Yishan! It’s nice to have this improved consistency across the 3 functions and also have the additional warning messages.

I tested all functions to make sure they run as expected. For n_rbc_transfusions, the new output is a tiny bit different from before (when exclude_ed = TRUE) but I believe that’s simply due to the fact that the previous version excluded transfusions after discharge? For n_routine_bloodwork and n_imaging, the results are identical (if exclude_ed = FALSE) as expected.

I can do one more quick round of review once you’ve had a chance to address the comments above (no rush!).

I think you are right, the difference in transfusion should be caused by the filter on discharge_date_time. I dropped that for now to be consistent across 3 function. I had a conversation with Surain on this, and he suggested that in MPR we should have filtered by discharge_date_time for imaging too, which is currently not implemented. For the functions in Rgemini, by default, we would want to exclude records before triage_date_time and after discharge_date_time. To add these filters I think it is better we address the coverage issue first, because we will have coverage issue for er table as well as the clinical table we are looking at.

guoyi-smh commented 2 months ago

@guoyi-smh one more general comment/question (for a future sprint): Some of the filtering by date-time variables happens within the SQL query now, which probably makes it more efficient, but with this approach it’s not obvious how many date-time variables are missing/invalid. I’ve been adding a new utility function that transforms/checks date-time variables (see issue #76), so I think eventually, I’d like to apply this within these 3 functions as well so users are informed about missing/invalid date-times? This might be particularly relevant for imaging, so we can inform users how missing order date-times are handled. That would mean we’d query all entries first and then perform the filtering by date-times after. Any thoughts on this approach? If you agree, I can open a new issue for that so we can tackle that in a future update.

Agreed! I hesitated on this but chose the performance because I thought the numbers were not reported now anyways. As the user I would definitely like to see those numbers instead of just getting the final result table.

loffleraSMH commented 2 months ago

Thanks so much for implementing this Yishan! It’s nice to have this improved consistency across the 3 functions and also have the additional warning messages. I tested all functions to make sure they run as expected. For n_rbc_transfusions, the new output is a tiny bit different from before (when exclude_ed = TRUE) but I believe that’s simply due to the fact that the previous version excluded transfusions after discharge? For n_routine_bloodwork and n_imaging, the results are identical (if exclude_ed = FALSE) as expected. I can do one more quick round of review once you’ve had a chance to address the comments above (no rush!).

I think you are right, the difference in transfusion should be caused by the filter on discharge_date_time. I dropped that for now to be consistent across 3 function. I had a conversation with Surain on this, and he suggested that in MPR we should have filtered by discharge_date_time for imaging too, which is currently not implemented. For the functions in Rgemini, by default, we would want to exclude records before triage_date_time and after discharge_date_time. To add these filters I think it is better we address the coverage issue first, because we will have coverage issue for er table as well as the clinical table we are looking at.

Yes, good point! I agree, we can update the filters at a later point. I think your current implementation makes sense for now.

loffleraSMH commented 2 months ago

@guoyi-smh one more general comment/question (for a future sprint): Some of the filtering by date-time variables happens within the SQL query now, which probably makes it more efficient, but with this approach it’s not obvious how many date-time variables are missing/invalid. I’ve been adding a new utility function that transforms/checks date-time variables (see issue #76), so I think eventually, I’d like to apply this within these 3 functions as well so users are informed about missing/invalid date-times? This might be particularly relevant for imaging, so we can inform users how missing order date-times are handled. That would mean we’d query all entries first and then perform the filtering by date-times after. Any thoughts on this approach? If you agree, I can open a new issue for that so we can tackle that in a future update.

Agreed! I hesitated on this but chose the performance because I thought the numbers were not reported now anyways. As the user I would definitely like to see those numbers instead of just getting the final result table.

Ok sounds good, I'll go ahead and open a new issue for this so we can update it in the future.

guoyi-smh commented 2 months ago

Hi @loffleraSMH , Thank you so much for you thorough and thoughtful review! I have addressed the issues in the latest push. Please review the update when you have time. Thanks!!