brad-richardson / osm-pbf-parquet

MIT License

Allow direct streaming to/from S3 #2

Open brad-richardson opened 1 month ago

brad-richardson commented 1 month ago

Read from and write to S3 directly, to avoid the need for large local disks.

osmpbf accepts any input that implements Read + Send: https://github.com/b-r-u/osmpbf/blob/main/src/reader.rs#L17
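As a rough illustration of why that bound matters, here is a minimal std-only sketch of a consumer that is generic over `R: Read + Send`. The names `count_bytes` and `fake_remote_body` are hypothetical and not part of osmpbf or this repo; the `Cursor` only stands in for a streamed S3 object body. Assuming the real stream is wrapped in something that implements `Read + Send`, it could presumably be handed to osmpbf's reader in the same way.

```rust
use std::io::{Cursor, ErrorKind, Read};

/// Hypothetical stand-in for a PBF consumer. It accepts any source that is
/// `Read + Send` (the same bound osmpbf's reader uses) and drains it in
/// fixed-size chunks, so the whole input is never held in memory or on disk.
pub fn count_bytes<R: Read + Send>(mut input: R) -> std::io::Result<u64> {
    let mut buf = [0u8; 8192];
    let mut total = 0u64;
    loop {
        match input.read(&mut buf) {
            Ok(0) => break, // end of stream
            Ok(n) => total += n as u64,
            // Retry if the read was interrupted by a signal.
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(total)
}

/// Hypothetical stand-in for a streamed remote object body (assumption:
/// a real S3 body would be adapted to `Read + Send` before being passed in).
pub fn fake_remote_body(len: usize) -> Cursor<Vec<u8>> {
    Cursor::new(vec![0u8; len])
}
```

Because the function only depends on the trait bound, a local file, an in-memory buffer, and a network stream adapter are interchangeable as inputs.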

The Rust parquet library has an object reader (https://arrow.apache.org/rust/parquet/arrow/async_reader/struct.ParquetObjectReader.html) but doesn't appear to have an object writer. It should be possible to implement an S3AsyncWriter that conforms to the AsyncFileWriter trait: https://arrow.apache.org/rust/parquet/arrow/async_writer/trait.AsyncFileWriter.html

https://docs.rs/object_store/latest/object_store/aws/struct.AmazonS3.html
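The writer idea can be sketched without any S3 dependency. The block below is a synchronous, std-only stand-in, not the parquet `AsyncFileWriter` trait itself (which is async) and not the implementation that ended up in this repo. `PartSink`, `MemorySink`, and `PartBufferedWriter` are hypothetical names: the writer buffers bytes up to a fixed part size and hands each full part to a sink, which is where a multipart upload call would go. It shows only the buffering pattern that lets output stream out without a large local file.

```rust
use std::io::{self, Write};

/// Hypothetical destination for completed parts
/// (stand-in for one "upload part" call of a multipart upload).
pub trait PartSink {
    fn put_part(&mut self, part: Vec<u8>) -> io::Result<()>;
}

/// In-memory sink used here in place of a real object store.
#[derive(Default)]
pub struct MemorySink {
    pub parts: Vec<Vec<u8>>,
}

impl PartSink for MemorySink {
    fn put_part(&mut self, part: Vec<u8>) -> io::Result<()> {
        self.parts.push(part);
        Ok(())
    }
}

/// Buffers writes and emits a part whenever `part_size` bytes are ready.
pub struct PartBufferedWriter<S: PartSink> {
    sink: S,
    buf: Vec<u8>,
    part_size: usize,
}

impl<S: PartSink> PartBufferedWriter<S> {
    pub fn new(sink: S, part_size: usize) -> Self {
        assert!(part_size > 0, "part_size must be non-zero");
        Self { sink, buf: Vec::with_capacity(part_size), part_size }
    }

    /// Sends any remaining bytes as the final (possibly short) part
    /// and returns the sink.
    pub fn complete(mut self) -> io::Result<S> {
        if !self.buf.is_empty() {
            let last = std::mem::take(&mut self.buf);
            self.sink.put_part(last)?;
        }
        Ok(self.sink)
    }
}

impl<S: PartSink> Write for PartBufferedWriter<S> {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        // Accept only as many bytes as fit in the current part;
        // callers such as `write_all` will retry with the rest.
        let room = self.part_size - self.buf.len();
        let n = room.min(data.len());
        self.buf.extend_from_slice(&data[..n]);
        if self.buf.len() == self.part_size {
            let full = std::mem::replace(&mut self.buf, Vec::with_capacity(self.part_size));
            self.sink.put_part(full)?;
        }
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        // Partial parts are only sent by `complete`.
        Ok(())
    }
}
```

With real S3 multipart uploads, every part except the last must be at least 5 MiB, so `part_size` would need to respect that limit; the tiny sizes that suit a test are for illustration only.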

brad-richardson commented 1 month ago

This has been implemented for writing out to S3.