Make `tidyselect` faster

Instead of collecting a 1-row slice of the data, use the schema to create an empty DataFrame with the same column names and types, and pass that to tidyselect. This improves performance whenever several functions are chained, not only for `select()`.
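A minimal sketch of the idea, using only base R and tidyselect. The `schema` list and the `empty_from_schema()` helper are hypothetical stand-ins for illustration. In the package the schema would come from the Polars object, and the actual implementation may differ.

```r
library(tidyselect)

# Hypothetical schema: column names mapped to R type names.
# This list is an assumption standing in for the schema of the Polars object.
schema <- list(
  Sepal.Length = "double",
  Sepal.Width  = "double",
  Petal.Length = "double",
  Petal.Width  = "double",
  Species      = "character"
)

# Hypothetical helper: build a zero-row data.frame with the same
# column names and types as the schema, without touching any data.
empty_from_schema <- function(schema) {
  cols <- lapply(schema, function(type) vector(mode = type, length = 0))
  as.data.frame(cols, stringsAsFactors = FALSE)
}

empty_df <- empty_from_schema(schema)

# tidyselect only needs column names (and types, for predicates such as
# where()), so the selection can be resolved against the empty frame.
pos <- eval_select(quote(starts_with(c("Sep", "Pet"))), data = empty_df)
selected <- names(pos)
```

Because no rows are materialised, resolving the selection costs the same regardless of how large the underlying data is.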

Benchmark

large_iris <- data.table::rbindlist(rep(list(iris), 50000))
test <- as_polars(large_iris, lazy = TRUE)

bench::mark(
  starts_with = test |>
    select(starts_with(c("Sep", "Pet"))) |>
    mutate(
      petal_type = ifelse((Petal.Length / Petal.Width) > 3, "long", "large")
    ) |> 
    filter(between(Sepal.Length, 4.5, 5.5)) |> 
    collect(),
  iterations = 20,
  check = FALSE
) |> 
  dplyr::select(expression, 3:5)

Before:

# A tibble: 1 × 4
  expression    median `itr/sec` mem_alloc
  <bch:expr>  <bch:tm>     <dbl> <bch:byt>
1 starts_with    495ms      1.59    1.05MB

After:

# A tibble: 1 × 4
  expression    median `itr/sec` mem_alloc
  <bch:expr>  <bch:tm>     <dbl> <bch:byt>
1 starts_with    171ms      5.68     954KB

etiennebacher / tidypolars

Make `tidyselect` faster #61