eto-ai / rikai

Parquet-based ML data format optimized for working with unstructured data
https://rikai.readthedocs.io/en/latest/
Apache License 2.0
138 stars, 19 forks

Allow setting Parquet block size via option #596

Closed: eddyxu closed this 2 years ago

eddyxu commented 2 years ago

Use

df.write.format("rikai").option("rikai.block.size", 12345).save("dest")

to customize the block size of the resulting Parquet files.
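A minimal sketch wrapping the writer call above in a reusable helper. The name `write_rikai_with_block_size` is hypothetical, `df` is assumed to be a `pyspark.sql.DataFrame` on a Spark session that already has rikai registered, and treating the value as a size in bytes (like Parquet's `parquet.block.size`) is an assumption this PR does not state.

```python
def write_rikai_with_block_size(df, dest, block_size):
    """Write `df` in the rikai format with a custom Parquet block size.

    Hypothetical helper: `df` is assumed to be a pyspark.sql.DataFrame and
    `block_size` is assumed to be in bytes (as with parquet.block.size).
    """
    # Reject obviously invalid sizes before handing them to Spark.
    # bool is excluded explicitly because it is a subclass of int.
    if isinstance(block_size, bool) or not isinstance(block_size, int) or block_size <= 0:
        raise ValueError(f"block_size must be a positive integer, got {block_size!r}")
    # Same writer chain as in the PR description; only the option value varies.
    df.write.format("rikai").option("rikai.block.size", block_size).save(dest)
```

For example, `write_rikai_with_block_size(df, "dest", 64 * 1024 * 1024)` would request 64 MiB blocks, provided the option is indeed interpreted in bytes.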

eddyxu commented 2 years ago

Closes #589