Open xtimbeau opened 7 months ago
Thanks @xtimbeau, can confirm this is reproducible on Ubuntu on Arrow 15.0.1, looks like it's something we might need to pull out of the DuckDB object and apply to the Arrow object, though I haven't checked to see if this is a regression or something we never implemented.
Would you be interested in submitting a PR at all?
you mean correct the bug in the code? If it involves some R I can accept the challenge… but C I prefer to pass!
Le ven. 22 mars 2024 à 14:56, Nic Crane @.***> a écrit :
Thanks @xtimbeau https://github.com/xtimbeau, can confirm this is reproducible on Ubuntu on Arrow 15.0.1, looks like it's something we might need to pull out of the DuckDB object and apply to the Arrow object, though I haven't checked to see if this is a regression or something we never implemented.
Would you be interested in submitting a PR at all?
— Reply to this email directly, view it on GitHub https://github.com/apache/arrow/issues/40640#issuecomment-2015164232, or unsubscribe https://github.com/notifications/unsubscribe-auth/ANA2KENUG5MZX6JO73CA743YZQ2BVAVCNFSM6AAAAABE35U5YWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMJVGE3DIMRTGI . You are receiving this because you were mentioned.Message ID: @.***>
This seems to be the expected behavior of dbplyr
when group_by
is applied without any additional operations:
https://github.com/apache/arrow/blob/1815a679e47a47f2f5f4cd003ffbb18b6db2f952/r/R/duckdb.R#L175
library(dplyr, warn.conflicts = FALSE)
con_duck <- DBI::dbConnect(duckdb::duckdb(), ":memory:")
copy_to(con_duck, mtcars)
tbl(con_duck, "mtcars") %>%
group_by(cyl) %>%
dbplyr::remote_query()
#> <SQL> SELECT *
#> FROM mtcars
con_sqlite <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con_sqlite, mtcars)
tbl(con_sqlite, "mtcars") %>%
group_by(cyl) %>%
dbplyr::remote_query()
#> <SQL> SELECT *
#> FROM `mtcars`
Created on 2024-06-26 with reprex v2.1.0
I imagine it is desired behavior to not translate the group_by to SQL if there are no additional operations... however, since to_arrow()
and to_duckdb()
can be used as intermediate work operations, I should include them.... Perhaps, a solution analogous to the one developed in to_duckdb() could be proposed, but I'm not sure
https://github.com/apache/arrow/blob/68db6621c07624babec05caccbf47c079a5ed20e/r/R/duckdb.R#L65-L68
Describe the bug, including details regarding any error messages, version, and platform.
When passing a duckdb dataset to arrow with
to_arrow()
grouping information is lost.Component(s)
R