Bioconductor / SQLDataFrame

DataFrame representation of SQL database table
GNU Lesser General Public License v3.0
2 stars 3 forks source link

Incompatible with dev dbplyr #6

Open mgirlich opened 1 year ago

mgirlich commented 1 year ago

Your package showed errors when I checked the reverse dependencies of the current dev version of dbplyr. I had a look into these issues to provide a PR but I had some difficulties to understand what exactly you try to achieve in parts of your code.

A common issue I found is that you accessed the lazy_query field of a lazy table. This belongs to the internal structure of dbplyr and breaking changes are likely to occur. There should usually be alternative from dbplyr that allow you to get what you want e.g.

It would be great if you could remove the dependency on the lazy_query field and make your package compatible with the CRAN and the dev version of dbplyr.

Liubuntu commented 1 year ago

Hi @mgirlich

Thank you very much for your insights and suggestions.

for the dbtable() function you could use dbplyr::remote_name() (or with the dev version depending on the exact use case dbplyr::remote_table()).

Thanks for pointing out these utility functions! I've updated the dbtable() function to wrap around the dbplyr::remote_name() which does the same task that I want. To avoid duplicate efforts, I've also updated my previous function of connSQLDataFrame() as dbcon() to wrap around the dbplyr::remote_con().

Liubuntu commented 1 year ago

Hi @mgirlich

for saveSQLDataFrame() I didn't understand why you need to distinguish between lazy_base_query and other situations. Can you explain me what happens there? I also don't quite understand what you achieve in .join_union_prepare_sqlite() by checking the classes of the lazy_query field.

For these two functions, I've been checking the class of "lazy_query" to decide which connection to use to attach another database to, and then use in_schema for the cross database operations (e.g., rbind, union, join). I was able to simply the code by removing this step, where the functions remain work. https://github.com/Bioconductor/SQLDataFrame/commit/7e018f2cd749673b4f92586b74e3526397e558d8

However, I still have two cases where the "lazy_query" was called, which was inside the "select" function (https://github.com/Bioconductor/SQLDataFrame/blob/9b3e3634193bb0653844a3190c3169e679977a45/R/SQLDataFrame-methods.R#L446 ). Do you have any suggested functions/ways to avoid the use here?

select.SQLDataFrame <- function(.data, ...)
{
    tbl <- .extract_tbl_from_SQLDataFrame_indexes(tblData(.data), .data)
    dots <- quos(...)
    old_vars <- op_vars(tbl$lazy_query)
    new_vars <- tidyselect::vars_select(old_vars, !!!dots, .include = op_grps(tbl$lazy_query))

Thank you! Qian

mgirlich commented 1 year ago

op_vars() and op_grps() work directly on a lazy table. But even better would be to use colnames() and dplyr::group_vars() instead.

Be a bit careful with using remote_name(). There is a breaking change between the CRAN and the dev version of dbplyr. Look at the NEWS for more information. You might night to switch behaviour depending on the package version

if (packageVersion("dbplyr") <= "2.3.2") {

} else {

}