GEMINI-Medicine / Rgemini

A custom R package that provides a variety of functions to perform data analyses with GEMINI data
https://gemini-medicine.github.io/Rgemini/
Other
3 stars 0 forks source link

#122 fix db check #123

Closed loffleraSMH closed 4 months ago

loffleraSMH commented 4 months ago

Closes #122

The check_input function in utils.R has been updated to prevent the error message for odbc DB connections.

wankiewiczm-smh commented 4 months ago

Hi @loffleraSMH, I tried out the function on the master branch as well as this branch (122_dbcon_check) and still experience the cannot coerce type 'S4' to vector of type 'integer' error when running the readmission function.

The output on readmission is now slightly different though, the function begins to run as it outputs messages before running into the error (see screenshot below).

image

I believe is this may be an issue with the episodes_of_care function which (on line 95) has the same check that was removed from check_input (!RPostgreSQL::isPostgresqlIdCurrent(dbcon)).

loffleraSMH commented 4 months ago

Hi @loffleraSMH, I tried out the function on the master branch as well as this branch (122_dbcon_check) and still experience the cannot coerce type 'S4' to vector of type 'integer' error when running the readmission function.

The output on readmission is now slightly different though, the function begins to run as it outputs messages before running into the error (see screenshot below).

image

I believe is this may be an issue with the episodes_of_care function which (on line 95) has the same check that was removed from check_input (!RPostgreSQL::isPostgresqlIdCurrent(dbcon)).

Thanks @wankiewiczm-smh - good catch! I replaced the check in episodes_of_care with the check_input() function (similar to how it's handled in readmission). I also added a similar fix to the icd_to_ccsr function, which was also using the isPostgresqlIdCurrent() check instead of check_input(). Let me know if this works for you now, or if I missed any other instances of this.

wankiewiczm-smh commented 4 months ago

I tried running readmission again and it looks like the issue with isPostgresqlIdCurrent is fixed, but I'm now running into an error saying Error in dbExistsTable(conn, name) : length(name) == 1 is not TRUE.

Based on what readmission has output, this is happening somewhere after line 180, but I'm not sure of the exact spot. Running find_db_tablename works fine on its own, so it doesn't appear to be an issue with that function.

loffleraSMH commented 4 months ago

I tried running readmission again and it looks like the issue with isPostgresqlIdCurrent is fixed, but I'm now running into an error saying Error in dbExistsTable(conn, name) : length(name) == 1 is not TRUE.

Based on what readmission has output, this is happening somewhere after line 180, but I'm not sure of the exact spot. Running find_db_tablename works fine on its own, so it doesn't appear to be an issue with that function.

Thanks for checking this @wankiewiczm-smh! Hm, this seems to be caused by line 198 (DBI::dbWriteTable(dbcon, c("pg_temp", "temp_readmission_data"), data[, .(genc_id)], row.names = FALSE, overwrite = TRUE)). Looks like it doesn't like the character vector input c("pg_temp", "temp_readmission_data"), but changing it to "temp_readmission_data" doesn't seem to work either. I had a quick look online and it seems like people report all kinds of issues with temp tables when using odbc connections. There's probably a fix for this but I'm starting to think it might not be worth changing this, and instead, it might be best to just keep everything as is and clarify in the documentation that database connections established with odbc are not accepted as inputs? Let me know if you agree with that. If yes, I'll revert all current changes and will instead change the error message thrown by check_input to make it more obvious that odbc connections aren't accepted.

loffleraSMH commented 4 months ago

I tried running readmission again and it looks like the issue with isPostgresqlIdCurrent is fixed, but I'm now running into an error saying Error in dbExistsTable(conn, name) : length(name) == 1 is not TRUE. Based on what readmission has output, this is happening somewhere after line 180, but I'm not sure of the exact spot. Running find_db_tablename works fine on its own, so it doesn't appear to be an issue with that function.

Thanks for checking this @wankiewiczm-smh! Hm, this seems to be caused by line 198 (DBI::dbWriteTable(dbcon, c("pg_temp", "temp_readmission_data"), data[, .(genc_id)], row.names = FALSE, overwrite = TRUE)). Looks like it doesn't like the character vector input c("pg_temp", "temp_readmission_data"), but changing it to "temp_readmission_data" doesn't seem to work either. I had a quick look online and it seems like people report all kinds of issues with temp tables when using odbc connections. There's probably a fix for this but I'm starting to think it might not be worth changing this, and instead, it might be best to just keep everything as is and clarify in the documentation that database connections established with odbc are not accepted as inputs? Let me know if you agree with that. If yes, I'll revert all current changes and will instead change the error message thrown by check_input to make it more obvious that odbc connections aren't accepted.

Oh, actually: It seems like DBI::dbWriteTable(dbcon, c("temp_readmission_data"), data[, .(genc_id)], row.names = FALSE, overwrite = TRUE) does work but it takes forever to run... And it might cause issues for other users, so yeah, I still think we might want to avoid making changes to this that might break other things.

wankiewiczm-smh commented 4 months ago

I tried running readmission again and it looks like the issue with isPostgresqlIdCurrent is fixed, but I'm now running into an error saying Error in dbExistsTable(conn, name) : length(name) == 1 is not TRUE. Based on what readmission has output, this is happening somewhere after line 180, but I'm not sure of the exact spot. Running find_db_tablename works fine on its own, so it doesn't appear to be an issue with that function.

Thanks for checking this @wankiewiczm-smh! Hm, this seems to be caused by line 198 (DBI::dbWriteTable(dbcon, c("pg_temp", "temp_readmission_data"), data[, .(genc_id)], row.names = FALSE, overwrite = TRUE)). Looks like it doesn't like the character vector input c("pg_temp", "temp_readmission_data"), but changing it to "temp_readmission_data" doesn't seem to work either. I had a quick look online and it seems like people report all kinds of issues with temp tables when using odbc connections. There's probably a fix for this but I'm starting to think it might not be worth changing this, and instead, it might be best to just keep everything as is and clarify in the documentation that database connections established with odbc are not accepted as inputs? Let me know if you agree with that. If yes, I'll revert all current changes and will instead change the error message thrown by check_input to make it more obvious that odbc connections aren't accepted.

Oh, actually: It seems like DBI::dbWriteTable(dbcon, c("temp_readmission_data"), data[, .(genc_id)], row.names = FALSE, overwrite = TRUE) does work but it takes forever to run... And it might cause issues for other users, so yeah, I still think we might want to avoid making changes to this that might break other things.

Hmm, yeah I agree with your approach of not allowing odbc connections. It's probably safer as well, it could be tricky handling both DBI and odbc in the future if it's already causing issues now.

loffleraSMH commented 4 months ago

Thanks again @wankiewiczm-smh for reviewing this, and sorry for the back and forth! Ok, so I've now changed check_input to first check if the connection is a PostgreSQL connection (if not, it will show an error message that also clarifies that odbc connections are currently not supported). If that first check passes, the isPostgresqlIdCurrent() check is run to make sure the connection is still active. This now prevents the previous "type S4" error that wasn't very informative.

readmission, episodes_of_care, and icd_to_ccsr now all run the same check implemented in check_input, so hopefully, there shouldn't be any more inconsistencies.

Let me know if the updated version looks good to you and if the error messages make sense now.

wankiewiczm-smh commented 4 months ago

Tested it on all 3 of the functions and everything looks good to me! The new error message is very helpful.

loffleraSMH commented 4 months ago

Tested it on all 3 of the functions and everything looks good to me! The new error message is very helpful.

Perfect, thanks again for reviewing this @wankiewiczm-smh! I'll go ahead and merge this.