Open asfimport opened 2 years ago
Dragoș Moldovan-Grünfeld / @dragosmg: Hi [~hicks.daniel.j@gmail.com],
Thanks for submitting this issue. You are correct, list columns of varying lengths are not yet supported in arrow. For the time being, there are a couple of possible workarounds.
```r
library(tibble)
library(arrow, warn.conflicts = FALSE)

df1 <- data.frame(x = c(1, 2, 3),
                  y = c("a", "b", "c"))
df2 <- data.frame(x = c(4),
                  y = c("d"),
                  z = c("foo"))
comb_df <- tibble(id = c(1, 2),
                  df = c(list(df1), list(df2)))

# make all the nested data frames have the same columns
all_ptypes <- lapply(comb_df$df, vctrs::vec_ptype)
common_ptype <- vctrs::vec_ptype_common(!!!all_ptypes)
comb_df$df <- lapply(comb_df$df, vctrs::vec_cast, common_ptype)

Table$create(comb_df)
#> Table
#> 2 rows x 2 columns
#> $id <double>
#> $df: list<item: struct<x: double, y: string, z: string>>
```
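To see why this works: when `vctrs::vec_cast()` casts a data frame to a wider common prototype, any column missing from that data frame is created and filled with `NA`, so every element of the list column ends up with the same set of columns. A minimal sketch of just that step (only `vctrs` needed, no arrow):

```r
library(vctrs)

df1 <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))
df2 <- data.frame(x = 4, y = "d", z = "foo")

# the common prototype has the union of columns: x, y, z
common <- vec_ptype_common(vec_ptype(df1), vec_ptype(df2))

# casting df1 to the common prototype adds z, filled with NA
df1_cast <- vec_cast(df1, common)
names(df1_cast)
#> [1] "x" "y" "z"
```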
```r
# alternatively, serialize the nested data frames to JSON
comb_df <- tibble(id = c(1, 2),
                  df = c(list(df1), list(df2)))
comb_df$df <- vapply(comb_df$df, jsonlite::toJSON, character(1))

Table$create(comb_df)
#> Table
#> 2 rows x 2 columns
#> $id <double>
#> $df <string>
```
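The trade-off with the JSON workaround is that the nested data frames are stored as plain strings, so after reading the table back you have to deserialize them yourself. A sketch of the round trip for a single element, assuming `jsonlite` is available:

```r
library(jsonlite)

df1 <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))

# serialize to a single JSON string, as in the workaround above
json <- as.character(toJSON(df1))

# after reading the table back, recover the data frame
df1_back <- fromJSON(json)
names(df1_back)
#> [1] "x" "y"
```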
Original issue description (Dan Hicks):

I'm brand new to arrow, but I didn't seem to find anything like this in the bug tracker; apologies if this is a known issue.

Arrow gives me an error when I try to write Parquet or Feather files for a data frame that contains a list column (`df` in the MWE) whose elements are data frames with varying numbers of columns; writing it gives me an error.
Session info:
Environment: R 4.1.0, arrow 6.0.1, macOS Big Sur 11.6
Reporter: Dan Hicks
Note: This issue was originally created as ARROW-14909. Please see the migration documentation for further details.