etiennebacher / tidypolars

Get the power of polars with the syntax of the tidyverse
https://tidypolars.etiennebacher.com

`dplyr::collect()` should return data.frame, not RPolarsDataFrame #98

Closed: eitsupi closed this 3 months ago

eitsupi commented 3 months ago

Please check the behavior of the other dplyr backends.

https://dplyr.tidyverse.org/reference/compute.html

So here, `collect()` should return a data.frame and `compute()` should return an RPolarsDataFrame.

You can also let users set the number of rows to return through an optional argument, like the `n` argument of dbplyr's `collect()`, so there is no need to maintain your own `fetch()` function. https://dbplyr.tidyverse.org/reference/collapse.tbl_sql.html

Also check the behavior of polarssql: https://rpolars.github.io/r-polarssql/reference/compute.tbl_polarssql_connection.html

etiennebacher commented 3 months ago

Thanks, I agree it should probably be changed for consistency with other R packages. I'm just thinking about how to make this apparent in the docs. I gave a small demo of polars + tidypolars at work, and it was very convenient to use the term "collect" regardless of whether I was talking about py-polars, r-polars or tidypolars. Using "collect" for py-polars and r-polars but "compute" for tidypolars will introduce some confusion, I think.

eitsupi commented 3 months ago

> Using "collect" for py-polars and r-polars but "compute" for tidypolars will introduce some confusion I think

I don't see why you single out "collect"; I think things like "select", "where" and "group_by" also work differently in polars and dplyr.

etiennebacher commented 3 months ago

This will be implemented in 0.8.0. For now, I added a warning redirecting to `compute()`. Small summary of behavior by version: