USCbiostats / partition

A fast and flexible framework for data reduction in R
https://uscbiostats.github.io/partition/

Checks fail with duckplyr #43

Open krlmlr opened 1 week ago

krlmlr commented 1 week ago

The duckplyr package aims to be a drop-in replacement for dplyr, with full behavior compatibility. To verify that, I'm running checks with a modified version of dplyr. This package fails its checks in this scenario.

Details: https://github.com/krlmlr/dplyr/blob/6ef6df78190c3c05f3ac63b97584f1ca2c3f49b3/revdep/problems.md

Learn more about duckplyr: https://duckplyr.tidyverse.org/

I can't immediately tell from the error message what causes the failure. I'd appreciate your help: could you please help distill a reproducible example that shows how duckplyr behaves differently from dplyr in your use case?

The modified dplyr version can be installed with any of:

pak::pak("krlmlr/dplyr@f-revdep-duckplyr")
# remotes::install_github("krlmlr/dplyr@f-revdep-duckplyr")
# devtools::install_github("krlmlr/dplyr@f-revdep-duckplyr")

Thanks a lot for your help! Please let me know if you have any questions.

Tracker: https://github.com/tidyverse/duckplyr/issues/297.

malcolmbarrett commented 1 week ago

Before I look into it, is the idea here that we do library(duckplyr) and we get a free speedup via the dplyr parts of the codebase?

krlmlr commented 1 week ago

Precisely, that's the idea. Packages could use as_duckplyr_tibble().

There is a translation layer, but it either translates perfectly-ish, or falls back to original dplyr -- in which case there's no speed-up, but still full compatibility.

malcolmbarrett commented 2 days ago

Thanks! I'll take a look this week. Already a big fan of duckplyr, so thanks for your work on that.