amices / mice

Multivariate Imputation by Chained Equations
https://amices.org/mice/
GNU General Public License v2.0

Apache Arrow support #564

Closed: vkhodygo closed this issue 1 year ago

vkhodygo commented 1 year ago

Is your feature request related to a problem? Please describe.
Certain datasets with a large number of indicator columns would surely benefit from 1-bit boolean values.

Describe the solution you'd like
Apache Arrow integrates with R (via the arrow package) and supports 1-bit booleans.
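To make the "1-bit boolean" idea concrete, here is a minimal pure-Python sketch of bit-packing an indicator column. It is neither Arrow's nor mice's code, and the names `pack_bools`, `unpack_bools` and `footprint_bytes` are hypothetical. Each value occupies one bit, numbered least-significant bit first within each byte, which is the bit order the Arrow columnar format uses for its bitmaps. The size comparison assumes that R stores logicals as 32-bit integers and numeric columns as 64-bit doubles.

```python
def pack_bools(values):
    """Pack booleans into bytes at 1 bit per value.

    Value i goes to byte i // 8, bit i % 8, least-significant bit first
    (the bit numbering the Arrow columnar format uses for bitmaps).
    """
    packed = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def unpack_bools(packed, n):
    """Recover the first n booleans from a packed buffer."""
    return [bool((packed[i // 8] >> (i % 8)) & 1) for i in range(n)]


def footprint_bytes(n):
    """Approximate storage for n indicator values under three layouts."""
    return {
        "R logical (32-bit int)": 4 * n,
        "double (64-bit)": 8 * n,
        "bit-packed": (n + 7) // 8,
    }


flags = [True, False, True, True, False, False, False, False, True]
buf = pack_bools(flags)
print(buf.hex())  # 0d01
assert unpack_bools(buf, len(flags)) == flags

for layout, size in footprint_bytes(1_000_000).items():
    print(f"{layout:24s} {size:>9,d} bytes")
```

For a million indicator values, this works out to about 125 kB packed, compared with 4 MB as R logicals. This covers raw storage only. It ignores Arrow's own buffer padding and any per-column metadata.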

Describe alternatives you've considered
Use the data as is; it has worked for years.

stefvanbuuren commented 1 year ago

Not quite sure what we would gain from 1-bit booleans, and whether that would work across platforms.

vkhodygo commented 1 year ago

@stefvanbuuren

> 1bit boolean

It doesn't have to be booleans only; any short integer type should do. Code that uses SIMD instructions such as AVX2 or AVX-512 should run faster, because narrower types fit more values into each vector register than 64-bit numbers do.
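This sketch shows the arithmetic behind the SIMD argument. A single vector instruction operates on a whole register, so the number of elements it processes at once is the register width divided by the element width. The table and the `lanes` helper are hypothetical illustrations. They give an upper bound on data parallelism, not a measured speedup, and they assume that the relevant routines actually get vectorized. Base R's numeric vectors use only 32-bit integers and 64-bit doubles, so narrower types would have to live outside R's native vectors, for example in Arrow buffers.

```python
# Register widths in bits for two x86 SIMD extensions.
REGISTER_BITS = {"AVX2": 256, "AVX-512": 512}

# Element widths in bits; "bool (bit-packed)" assumes 1 bit per value.
TYPE_BITS = {
    "bool (bit-packed)": 1,
    "int8": 8,
    "int16": 16,
    "int32": 32,
    "float64": 64,
}


def lanes(register, dtype):
    """Number of dtype values that fit in one register of the given extension."""
    return REGISTER_BITS[register] // TYPE_BITS[dtype]


for reg in REGISTER_BITS:
    row = ", ".join(f"{t}: {lanes(reg, t)}" for t in TYPE_BITS)
    print(f"{reg}: {row}")
```

On AVX2, this gives 4 doubles per register, compared with 32 `int8` values or 256 bit-packed booleans. For the packed case, one bitwise AND or OR handles 256 indicators at once.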

> whether that would work across platforms

Arrow is meant to be a language- and platform-independent format.

Still, if you think that's too much, we can leave it as it is.