duckdb / duckdb-r

The duckdb R package
https://r.duckdb.org/

Duckdb equivalent to dplyr's separate() or separate_wider_delim()? #581

Open adamschwing opened 2 weeks ago

adamschwing commented 2 weeks ago

Hello!

I would like to take a comma-separated string and put each element in its own row. This is easy to do in dplyr using the separate() or separate_wider_delim() functions. However, my dataset is very large: each string has thousands of elements, and the dataset contains thousands of these strings across many columns and rows. Doing this separation purely in dplyr is therefore impractical.

Is there an equivalent function in duckdb-r or duckplyr for this?

nbc commented 4 days ago

Something like this?

library(duckdb)
#> Loading required package: DBI

con <- dbConnect(duckdb())

cat(readr::read_file('/tmp/split.csv'))
#> str1;str2
#> string;a1,a2,a3
#> string;a4,a5,a6

dbGetQuery(con, "SELECT str1, str_split(str2, ',').UNNEST() FROM read_csv('/tmp/split.csv', delim=';')")
#>     str1 unnest(str_split(str2, ','))
#> 1 string                           a1
#> 2 string                           a2
#> 3 string                           a3
#> 4 string                           a4
#> 5 string                           a5
#> 6 string                           a6

Created on 2024-11-21 with reprex v2.1.0
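
If the data is already in an R data frame rather than a CSV file, the same split-and-unnest query can probably be run against it directly. The following is only a minimal sketch: the data frame `df`, the table name `split_demo`, and the column alias `element` are made up for illustration and mirror the contents of /tmp/split.csv above.

```r
library(DBI)
library(duckdb)

# In-memory DuckDB connection
con <- dbConnect(duckdb())

# Hypothetical example data mirroring /tmp/split.csv from the reprex above
df <- data.frame(
  str1 = c("string", "string"),
  str2 = c("a1,a2,a3", "a4,a5,a6"),
  stringsAsFactors = FALSE
)

# Expose the data frame to DuckDB as a virtual table (assumed name: split_demo)
duckdb_register(con, "split_demo", df)

# Split str2 on commas, then emit one row per list element
res <- dbGetQuery(con, "
  SELECT str1, unnest(str_split(str2, ',')) AS element
  FROM split_demo
")
print(res)

# Close the connection and shut down the in-memory database
dbDisconnect(con, shutdown = TRUE)
```

With this input, `res` should have six rows, one per comma-separated element, with `str1` repeated alongside each. Aliasing the unnested column (here `element`) avoids the long auto-generated column name shown in the output above.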