apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[R] update metadata when casting a record batch column #29727

Open asfimport opened 3 years ago

asfimport commented 3 years ago

library(arrow, warn.conflicts = FALSE)

> See arrow_info() for available features

raws <- structure(
  list(as.raw(c(0x70, 0x65, 0x72, 0x73, 0x6f, 0x6e))),
  class = c("arrow_binary", "vctrs_vctr", "list")
)
batch <- record_batch(b = raws)
batch$metadata$r

> 'arrow_r_metadata' chr "A\n3\n262147\n197888\n5\nUTF-8\n531\n1\n531\n1\n531\n2\n531\n1\n16\n3\n262153\n12\narrow_binary\n262153\n10\nvc"| truncated

> List of 1

> $ columns:List of 1

> ..$ b:List of 2

> .. ..$ attributes:List of 1

> .. .. ..$ class: chr [1:3] "arrow_binary" "vctrs_vctr" "list"

> .. ..$ columns : NULL

  1. when casting b to a string column, the metadata is kept

    batch$b <- batch$b$cast(utf8())
    batch$metadata$r

    > 'arrow_r_metadata' chr "A\n3\n262147\n197888\n5\nUTF-8\n531\n1\n531\n1\n531\n2\n531\n1\n16\n3\n262153\n12\narrow_binary\n262153\n10\nvc"| truncated

    > List of 1

    > $ columns:List of 1

    > ..$ b:List of 2

    > .. ..$ attributes:List of 1

    > .. .. ..$ class: chr [1:3] "arrow_binary" "vctrs_vctr" "list"

    > .. ..$ columns : NULL

  2. but it should not have been kept: a batch built directly from a string column carries no R metadata

    batch2 <- record_batch(b = "string")
    batch2$metadata$r

    > NULL

Reporter: Romain Francois / @romainfrancois

Note: This issue was originally created as ARROW-14138. Please see the migration documentation for further details.

asfimport commented 3 years ago

Romain Francois / @romainfrancois: This probably should be dealt with in the R Schema class. For RecordBatch/Table this needs to touch RemoveColumn, AddColumn and SetColumn because all 3 might alter the R metadata.
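To make the suggestion above concrete, here is a minimal base-R sketch of the pruning step that RemoveColumn, AddColumn and SetColumn would each need. It works on a plain list shaped like the parsed `batch$metadata$r` output shown in the report. `drop_column_r_metadata()` is a hypothetical helper, not an arrow function, and the top-level `attributes` field it checks is an assumption about the metadata layout.

```r
# Hypothetical helper (not part of arrow): drop the stored R attributes for
# one column from a parsed `r` metadata list.
drop_column_r_metadata <- function(r_meta, name) {
  if (is.null(r_meta) || is.null(r_meta$columns)) {
    return(r_meta)
  }
  # Subset with single brackets so the remaining columns keep their entries
  keep <- setdiff(names(r_meta$columns), name)
  r_meta$columns <- r_meta$columns[keep]
  # Assumption: if no per-column entries and no top-level attributes remain,
  # there is nothing worth storing, so return NULL
  if (length(r_meta$columns) == 0 && is.null(r_meta$attributes)) {
    return(NULL)
  }
  r_meta
}

# Same shape as the str() output in the original report
r_meta <- list(
  columns = list(
    b = list(
      attributes = list(class = c("arrow_binary", "vctrs_vctr", "list")),
      columns = NULL
    )
  )
)

# After replacing column b with a cast result, its old attributes are stale
after_cast <- drop_column_r_metadata(r_meta, "b")
is.null(after_cast)
```

In this sketch, replacing a column first drops its old entry; the caller would then add a fresh entry only if the new column carries R attributes of its own.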

asfimport commented 2 years ago

Dewey Dunnington / @paleolimbot: I'm not seeing any metadata here anymore...should there be!?


library(arrow, warn.conflicts = FALSE)

raws <- as.vector(
  Array$create(
    list(as.raw(c(0x70, 0x65, 0x72, 0x73, 0x6f, 0x6e))),
    type = binary()
  )
)

batch <- record_batch(b = raws)
batch$metadata$r
#> NULL

batch$b <- batch$b$cast(utf8())
batch$metadata$r
#> NULL

batch2 <- record_batch(b = "string")
batch2$metadata$r
#> NULL

asfimport commented 2 years ago

Todd Farmer / @toddfarmer: This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned per project policy. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon.