etiennebacher / tidypolars

Get the power of polars with the syntax of the tidyverse
https://tidypolars.etiennebacher.com

Can `bind_rows_polars()` take a named list like `bind_rows()` does for the `.id` argument? #116

Closed ginolhac closed 1 month ago

ginolhac commented 1 month ago

What functionality are you missing?

I often use bind_rows() with a named list (or named arguments) so that the .id column contains meaningful names instead of integer ids.

Example:

t1 <- tibble(
  x = c("a", "b"),
  y = 1:2
)
t2 <- tibble(
  x = c("c", "d"),
  y = 3:4
)
bind_rows(t1 = t1, tib2 = t2, .id = "id")

gives the chosen names for ids:

# A tibble: 4 × 3
  id    x         y
  <chr> <chr> <int>
1 t1    a         1
2 t1    b         2
3 tib2  c         3
4 tib2  d         4

Is this functionality present in the tidyverse or in polars (or both)?

Present in the tidyverse

With {tidypolars}:

p1 <- pl$DataFrame(
  x = c("a", "b"),
  y = 1:2
)
p2 <- pl$DataFrame(
  x = c("c", "d"),
  y = 3:4
)

# create an id column
bind_rows_polars(p1, p2, .id = "id")

gives integers as ids:

shape: (4, 3)
┌─────┬─────┬─────┐
│ id  ┆ x   ┆ y   │
│ --- ┆ --- ┆ --- │
│ i32 ┆ str ┆ i32 │
╞═════╪═════╪═════╡
│ 1   ┆ a   ┆ 1   │
│ 1   ┆ b   ┆ 2   │
│ 2   ┆ c   ┆ 3   │
│ 2   ┆ d   ┆ 4   │
└─────┴─────┴─────┘

ginolhac commented 1 month ago

This is really a feature request: I currently use the following workaround to get meaningful names:

bind_rows_polars(select(human, type) |> mutate(animal = "human"),
                 select(salmon, type) |> mutate(animal = "salmon")) |> 
  count(type, animal) |> 
  collect()

etiennebacher commented 1 month ago

Thanks, I didn't know about this case. It is now possible with the development version:

library(dplyr, warn.conflicts = FALSE)
library(tidypolars)

t1 <- tibble(
  x = c("a", "b"),
  y = 1:2
)
t2 <- tibble(
  x = c("c", "d"),
  y = 3:4
)
bind_rows(t1 = t1, tib2 = t2, .id = "id")
#> # A tibble: 4 × 3
#>   id    x         y
#>   <chr> <chr> <int>
#> 1 t1    a         1
#> 2 t1    b         2
#> 3 tib2  c         3
#> 4 tib2  d         4

t1 <- as_polars_df(t1)
t2 <- as_polars_df(t2)
bind_rows_polars(t1 = t1, tib2 = t2, .id = "id")
#> shape: (4, 3)
#> ┌──────┬─────┬─────┐
#> │ id   ┆ x   ┆ y   │
#> │ ---  ┆ --- ┆ --- │
#> │ str  ┆ str ┆ i32 │
#> ╞══════╪═════╪═════╡
#> │ t1   ┆ a   ┆ 1   │
#> │ t1   ┆ b   ┆ 2   │
#> │ tib2 ┆ c   ┆ 3   │
#> │ tib2 ┆ d   ┆ 4   │
#> └──────┴─────┴─────┘

ginolhac commented 1 month ago

Fast and sharp! Wonderful! I caught another little trick you don't support; let me know if this is asking too much or if I should open a new issue. With dplyr::select(), one can select AND rename at the same time, for example:

select(t1, x, why = y)
# A tibble: 2 × 2
  x       why
  <chr> <int>
1 a         1
2 b         2

With tidypolars, we need to write select(t1, x, y) |> rename(why = y) instead.
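
For reference, the equivalence between the two forms described above can be checked on a plain tibble with dplyr alone. This is only a minimal sketch: it assumes dplyr is installed and R >= 4.1 (for the native pipe), it does not involve polars objects, and the names `combined` and `two_step` are illustrative.

```r
library(dplyr, warn.conflicts = FALSE)

t1 <- tibble(
  x = c("a", "b"),
  y = 1:2
)

# select and rename in a single call (supported by dplyr)
combined <- select(t1, x, why = y)

# the two-step form: select first, then rename
two_step <- select(t1, x, y) |> rename(why = y)

# compare the two results
identical(combined, two_step)
```

The two-step form is the one to use on polars objects until select-and-rename is handled in a single call.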