Closed (etiennebacher closed this 4 weeks ago)
Should I also emphasize this in the docs of tidypolars or should this be done in upstream polars? In any case I should mention something here because not everyone will look at polars documentation.
I suggest you look into dbplyr. dbplyr will generate a warning, I think.
Thanks for the info. Reprex for later:
suppressPackageStartupMessages({
  library(dplyr)
  library(dbplyr)
})
df <- tibble(x = c(1, 2, NA))
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
copy_to(con, df, "df", temporary = FALSE)
tbl(con, "df") |>
  mutate(y = mean(x))
#> Warning: Missing values are always removed in SQL aggregation functions.
#> Use `na.rm = TRUE` to silence this warning
#> This warning is displayed once every 8 hours.
#> # Source:   SQL [3 x 2]
#> # Database: sqlite 3.45.0 [:memory:]
#>       x     y
#>   <dbl> <dbl>
#> 1     1   1.5
#> 2     2   1.5
#> 3    NA   1.5
Created on 2024-02-12 with reprex v2.1.0.9000
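The dbplyr output above throttles its warning ("displayed once every 8 hours"). A minimal base R sketch of a similar idea, showing a message only once per session, could look like the following. The helper `inform_once()`, the `.shown` environment and the message text are hypothetical and not part of tidypolars or dbplyr.

```r
# Hypothetical helper, not part of tidypolars or dbplyr.
# An environment remembers which message ids were already shown.
.shown <- new.env(parent = emptyenv())

inform_once <- function(id, text) {
  if (exists(id, envir = .shown, inherits = FALSE)) {
    # Already shown in this session: stay silent
    return(invisible(FALSE))
  }
  assign(id, TRUE, envir = .shown)
  message(text)
  invisible(TRUE)
}

# The first call emits the message, the second one is silent
inform_once("na_rm", "mean() was called without na.rm = TRUE; results may differ from base R.")
inform_once("na_rm", "mean() was called without na.rm = TRUE; results may differ from base R.")
```

Keying on an id rather than on the message text lets several translated functions share one notice.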
Maybe I could just use the has_nulls() method in a pl$when(), e.g.:
pl_mean <- function(x, na.rm = FALSE, ...) {
  if (isTRUE(na.rm)) {
    x$mean()
  } else {
    pl$when(x$has_nulls())$then(NA)$otherwise(x$mean())
  }
}
Need to see how it would play with rowwise() (and need to implement $has_nulls() in polars).
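For reference, the behaviour that pl_mean() is meant to reproduce can be sketched on plain R vectors. This is only an illustration of the target semantics, not polars code, and `mean_na_aware()` is a hypothetical name.

```r
# Hypothetical reference implementation on plain R vectors (not polars code).
mean_na_aware <- function(x, na.rm = FALSE) {
  if (isTRUE(na.rm)) {
    # Drop missing values before averaging, like the SQL aggregation
    # in the dbplyr example
    keep <- !is.na(x)
    return(sum(x[keep]) / sum(keep))
  }
  # Mirror the when/then/otherwise branch: any missing value gives NA
  if (anyNA(x)) NA_real_ else sum(x) / length(x)
}

mean_na_aware(c(1, 2, NA))                # NA
mean_na_aware(c(1, 2, NA), na.rm = TRUE)  # 1.5
```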
library(dplyr, warn.conflicts = FALSE)
library(tidypolars)
df <- tibble(x = c(1, 2, NA))
df |>
  mutate(y = mean(x))
#> # A tibble: 3 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     1    NA
#> 2     2    NA
#> 3    NA    NA
df |>
  as_polars_df() |>
  mutate(y = mean(x))
#> shape: (3, 2)
#> ┌──────┬──────┐
#> │ x    ┆ y    │
#> │ ---  ┆ ---  │
#> │ f64  ┆ f64  │
#> ╞══════╪══════╡
#> │ 1.0  ┆ null │
#> │ 2.0  ┆ null │
#> │ null ┆ null │
#> └──────┴──────┘
polars doesn't have an equivalent to the na.rm argument that is common in R's sum(), mean(), etc. This can lead to different results, as the example above shows.
polars doesn't have this kind of argument on purpose (we should know the data and use describe() to see the null_count). Still, this can surprise R users when switching to polars. Should I raise a message every time? na.rm = FALSE by default, so I need to see whether I can capture it in ... when I translate the expression to polars.
Should I also emphasize this in the docs of tidypolars or should this be done in upstream polars?
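One way to check whether na.rm was supplied when translating an expression is to match the unevaluated call against a template signature. The sketch below uses only base R. `get_na_rm()` and its template are hypothetical and do not reflect how tidypolars actually translates expressions.

```r
# Hypothetical helper (not part of tidypolars): report the value of `na.rm`
# in an unevaluated call such as quote(mean(x, na.rm = TRUE)).
get_na_rm <- function(call) {
  # Template used only to match argument names; it assumes the R default
  # na.rm = FALSE, as in mean() and sum()
  template <- function(x, ..., na.rm = FALSE) NULL
  matched <- match.call(definition = template, call = call)
  if (is.null(matched$na.rm)) {
    # Argument not supplied: fall back to the R default
    return(FALSE)
  }
  isTRUE(eval(matched$na.rm, parent.frame()))
}

get_na_rm(quote(mean(x)))                # FALSE
get_na_rm(quote(mean(x, na.rm = TRUE)))  # TRUE
```

The result could then decide whether the translated expression removes nulls or returns a missing value whenever nulls are present.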