Closed vaakesan-SMH closed 9 months ago
Note that although I've added a utility function (return_hospital_field()
) to find the hospital field name, this is a hard check on the admdad
table.
Ideally we'd use find_db_tablename()
to find the name of the admdad
table (some HPC datacuts use a different name for this table). However, I tried this implementation and tested it in report_db_v3
, but there were multiple matches (admdad
and admdad_temp3333
), which caused an error.
Since this function will be primarily used with report_db
, I've fixed the table to admdad
. However, I wonder if we could make a change to find_db_tablename()
so we can take a particular table if multiple are returned (haven't thought about how this would actually work)? For now this solution will be fine because HPC users will not necessarily need loop_mlaps()
. Let me know your thoughts @gemini-wenb @loffleraSMH
Good point. The current find_db_tablename
doesn't handle multiple matches when the matches are unique from each other.
@loffleraSMH do you know whether /how often that HPC users would have multiple datasets with names starting on the same string (i.e. admdad
and admdad_temp3333
)?
I wonder if multiple matches only happens when we search on the entire non-HPC dbs (e.g. report_db_v3
). In this case, we can add a flag to find_db_tablename()
for internal use to always return admdad
.
This wouldn't work if HPC users can have multiple admdads because we won't be able to hard code which one to return without knowing the specifics diff in their naming variations.
I checked with Kieran and he confirmed that the only alternative for admdad
on HPC is admdad_subset
, so I think the best option is to hard-code those 2 options in find_db_tablename
(similarly to what we do for the lab
and transfusion
tables). I just pushed a commit with the change I'm suggesting. Does this solve the issue?
I checked with Kieran and he confirmed that the only alternative for
admdad
on HPC isadmdad_subset
, so I think the best option is to hard-code those 2 options infind_db_tablename
(similarly to what we do for thelab
andtransfusion
tables). I just pushed a commit with the change I'm suggesting. Does this solve the issue?
Yup that resolves it. Pushed a commit to use find_db_tablename()
now.
Thanks @loffleraSMH for the review. Pushed a commit with all the suggestions implemented. Please have a look and let me know if anything was missed (and re-test the function if possible).
I saw your note about "Handling of suppressWarnings for min/max(c()) Inf/-Inf should ideally also be improved" in the pull request. Let me know if you want me to help out with/review any additional changes you might make to this. More generally, the function currently returns a lot of warnings. Could it be useful to suppress all the
NAs introduced by coercion
warnings to avoid too much clutter in the output?
Missed this comment in my commit. I think that all coercion to NA
warnings remaining should be coming from laps_assign_test()
, which more generally needs a bit of a cleanup/refactor - but I am leaning towards moving that piece to a separate issue. @gemini-wenb thoughts?
I saw your note about "Handling of suppressWarnings for min/max(c()) Inf/-Inf should ideally also be improved" in the pull request. Let me know if you want me to help out with/review any additional changes you might make to this. More generally, the function currently returns a lot of warnings. Could it be useful to suppress all the
NAs introduced by coercion
warnings to avoid too much clutter in the output?Missed this comment in my commit. I think that all
coercion to NA
warnings remaining should be coming fromlaps_assign_test()
, which more generally needs a bit of a cleanup/refactor - but I am leaning towards moving that piece to a separate issue. @gemini-wenb thoughts?
Yes, the handling of supresswarning for min/max in the original comment has been handled. Remaining warnings is not due to min/max. I also think we should move laps_assign_test()
cleanup as a separate issue.
I saw your note about "Handling of suppressWarnings for min/max(c()) Inf/-Inf should ideally also be improved" in the pull request. Let me know if you want me to help out with/review any additional changes you might make to this. More generally, the function currently returns a lot of warnings. Could it be useful to suppress all the
NAs introduced by coercion
warnings to avoid too much clutter in the output?Missed this comment in my commit. I think that all
coercion to NA
warnings remaining should be coming fromlaps_assign_test()
, which more generally needs a bit of a cleanup/refactor - but I am leaning towards moving that piece to a separate issue. @gemini-wenb thoughts?Yes, the handling of supresswarning for min/max in the original comment has been handled. Remaining warnings is not due to min/max. I also think we should move
laps_assign_test()
cleanup as a separate issue.
Agree this can be moved to a separate (low-priority) issue. I'll take another quick look at the changes you made and will let you know once it's ready to be merged.
Agree this can be moved to a separate (low-priority) issue. I'll take another quick look at the changes you made and will let you know once it's ready to be merged.
Created a separate issue #55
Thanks @loffleraSMH for the review. Pushed a commit with all the suggestions implemented. Please have a look and let me know if anything was missed (and re-test the function if possible).
Thanks @vaakesan-SMH for making all those changes. Looks great! I just made a small fix to return_hospital_field()
(see above). The only other thing I noticed is that the function takes forever to run on drm_cleandb_v2_1_1
(likely due to FDW, it's fine on drm_cleandb_v1_2_2
), but we can leave it as is for now as that shouldn't be an issue in the future.
I approved the pull request so feel free to go ahead and merge now.
Thanks @vaakesan-SMH for making all those changes. Looks great! I just made a small fix to
return_hospital_field()
(see above). The only other thing I noticed is that the function takes forever to run ondrm_cleandb_v2_1_1
(likely due to FDW, it's fine ondrm_cleandb_v1_2_2
), but we can leave it as is for now as that shouldn't be an issue in the future.I approved the pull request so feel free to go ahead and merge now.
Right that's a good point. We actually had this discussion with Trong. mlaps
will still be calculated by the DB team and available in derived_variables
, and will generally be run on report_db
before creating a drm_cleandb
. For HPC
datacuts I believe the FDW isn't an issue, is that correct?
The only issue would be if someone wants to run a custom configuration of mlaps
(component_wise
or hours_after_admission
) on their cohort, but really in that case they could just use the mlaps
function directly instead of the loop_mlaps
wrapper. The wrapper was basically designed for the DB team to run the function on the entirety of report_db
.
Thanks @vaakesan-SMH for making all those changes. Looks great! I just made a small fix to
return_hospital_field()
(see above). The only other thing I noticed is that the function takes forever to run ondrm_cleandb_v2_1_1
(likely due to FDW, it's fine ondrm_cleandb_v1_2_2
), but we can leave it as is for now as that shouldn't be an issue in the future. I approved the pull request so feel free to go ahead and merge now.Right that's a good point. We actually had this discussion with Trong.
mlaps
will still be calculated by the DB team and available inderived_variables
, and will generally be run onreport_db
before creating adrm_cleandb
. ForHPC
datacuts I believe the FDW isn't an issue, is that correct?The only issue would be if someone wants to run a custom configuration of
mlaps
(component_wise
orhours_after_admission
) on their cohort, but really in that case they could just use themlaps
function directly instead of theloop_mlaps
wrapper. The wrapper was basically designed for the DB team to run the function on the entirety ofreport_db
.
I'm honestly not sure about HPC (I think the datacuts use FDWs too?), but yes, great point about running mlaps
function directly without wrapper when re-deriving mLAPS for smaller cohorts. So yes, I think that's good for now!
Needs to be checked against the code review checklist.
Handling of
suppressWarnings
formin/max(c())
Inf/-Inf
should ideally also be improved.