apache / arrow-nanoarrow

Helpers for Arrow C Data & Arrow C Stream interfaces
https://arrow.apache.org/nanoarrow
Apache License 2.0

feat(r): Implement string view support in R bindings #636

Closed: paleolimbot closed this pull request 2 days ago

paleolimbot commented 6 days ago

This PR adds support for the string view and binary view types to the R bindings. As a side effect, converting character vectors to Arrow types is now simpler (it goes directly through nanoarrow C's array builder) and supports more target types: the arrow package is no longer required to create large_string, large_binary, or fixed_size_binary arrays.
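For context on what the new types store: in Arrow's string_view (and binary_view) layout, every element is a fixed 16-byte "view". The base-R sketch below encodes a single view following the Arrow columnar format specification. `encode_string_view()` is a hypothetical helper for illustration only, not part of nanoarrow, and it assumes little-endian byte order.

```r
# Hypothetical helper (not part of nanoarrow): encode one string as the
# 16-byte view used by Arrow's string_view layout (little-endian assumed).
encode_string_view <- function(x, buffer_index = 0L, offset = 0L) {
  bytes <- charToRaw(enc2utf8(x))
  len <- length(bytes)
  # Write one integer as 4 little-endian bytes
  int32 <- function(i) writeBin(as.integer(i), raw(), size = 4, endian = "little")
  if (len <= 12) {
    # Short strings: int32 length, then the bytes inline, zero-padded to 12
    c(int32(len), bytes, raw(12 - len))
  } else {
    # Long strings: int32 length, 4-byte prefix, then the (0-based) index of
    # the data buffer holding the string and its offset within that buffer
    c(int32(len), bytes[1:4], int32(buffer_index), int32(offset))
  }
}

encode_string_view("hello")
# A view pointing at the start of the second data buffer (buffer_index = 1)
encode_string_view(strrep("p", 100), buffer_index = 1L, offset = 0L)
```

Inlining strings of up to 12 bytes is what lets short values avoid the data buffers entirely; only longer strings cost a lookup into a separate buffer.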

```r
library(nanoarrow)

long_strings <- rep(strrep(letters, 100), 100)

(array <- as_nanoarrow_array(long_strings, schema = na_string_view()))
#> <nanoarrow_array string_view[2600]>
#>  $ length    : int 2600
#>  $ null_count: int 0
#>  $ offset    : int 0
#>  $ buffers   :List of 11
#>   ..$ :<nanoarrow_buffer validity<bool>[0][0 b]> `` 
#>   ..$ :<nanoarrow_buffer unknown<string_view>[2600][41600 b]>`
#>   ..$ :<nanoarrow_buffer data<string>[32700 b]> `aaaaaaaaaaaaaaaaaaaaaaaaaaa...`
#>   ..$ :<nanoarrow_buffer data<string>[32700 b]> `ppppppppppppppppppppppppppp...`
#>   ..$ :<nanoarrow_buffer data<string>[32700 b]> `eeeeeeeeeeeeeeeeeeeeeeeeeee...`
#>   ..$ :<nanoarrow_buffer data<string>[32700 b]> `ttttttttttttttttttttttttttt...`
#>   ..$ :<nanoarrow_buffer data<string>[32700 b]> `iiiiiiiiiiiiiiiiiiiiiiiiiii...`
#>   ..$ :<nanoarrow_buffer data<string>[32700 b]> `xxxxxxxxxxxxxxxxxxxxxxxxxxx...`
#>   ..$ :<nanoarrow_buffer data<string>[32700 b]> `mmmmmmmmmmmmmmmmmmmmmmmmmmm...`
#>   ..$ :<nanoarrow_buffer data<string>[31100 b]> `bbbbbbbbbbbbbbbbbbbbbbbbbbb...`
#>   ..$ :<nanoarrow_buffer data<int64>[8][64 b]> `32700 32700 32700 32700 3270...`
#>  $ dictionary: NULL
#>  $ children  : list()

identical(convert_array(array), long_strings)
#> [1] TRUE
```

Created on 2024-09-27 with reprex v2.1.1
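The buffer list in the reprex output can be reconstructed by hand. The sketch below uses a hypothetical helper, `string_view_layout()` (not a nanoarrow function), and assumes, based only on the sizes printed above, that strings longer than 12 bytes are packed into data buffers of at most 32,768 bytes without splitting any string across buffers. Under that assumption it reproduces the 41,600-byte view buffer, the seven 32,700-byte data buffers plus one 31,100-byte buffer, and the total of 11 buffers.

```r
# Hypothetical helper (not part of nanoarrow). Assumes, from the printed
# output only, that long strings are packed into data buffers of at most
# `max_buffer_size` bytes and that no string is split across buffers.
string_view_layout <- function(x, max_buffer_size = 32768) {
  n_bytes <- nchar(x, type = "bytes")
  # Every element gets one 16-byte view, regardless of its length
  view_bytes <- 16 * length(x)
  # Strings of 12 bytes or fewer are inlined in the view; only longer
  # non-missing strings need space in a data buffer
  long <- n_bytes[!is.na(x) & n_bytes > 12]
  sizes <- numeric(0)
  current <- 0
  for (b in long) {
    # Close the current data buffer when the next string would not fit
    if (current > 0 && current + b > max_buffer_size) {
      sizes <- c(sizes, current)
      current <- 0
    }
    current <- current + b
  }
  if (current > 0) sizes <- c(sizes, current)
  list(
    view_bytes = view_bytes,
    data_buffer_sizes = sizes,
    # validity + views + one per data buffer + variadic sizes (int64)
    n_buffers = 2 + length(sizes) + 1
  )
}

long_strings <- rep(strrep(letters, 100), 100)
layout <- string_view_layout(long_strings)
layout$view_bytes         # 41600
layout$data_buffer_sizes  # seven 32700s, then 31100
layout$n_buffers          # 11
```

The total data size, 260,000 bytes, is simply 2,600 strings of 100 bytes each; only the split into individual buffers depends on the assumed block size.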