b-r-u / osmpbf

A Rust library for reading the OpenStreetMap PBF file format (*.osm.pbf).
Apache License 2.0

Streamed PBF reads #47

Open brad-richardson opened 3 months ago

brad-richardson commented 3 months ago

I'm looking at adding support for streamed PBF reads from network sources (in my case, from S3). My current plan is to wrap a bytestream produced by object_store into something like an AsyncBlobReader. Are streamed reads something you'd be interested in for this library? If so, do you have any suggestions for implementation?

I did consider using something like mountpoint-s3 instead, but unfortunately I'm working in a managed container environment that doesn't support FUSE mountpoints, so I'll need to manage the reads myself.

P.S. Thanks for the library, I've been using it with good results in a little PBF transcoder.

b-r-u commented 3 months ago

A streaming reader sounds great. Do you think it would be possible to implement the needed functionality in this crate without adding a big dependency? Maybe it can also be done as an optional dependency or there are just some missing low-level pieces that can be added to the public interface, so that the AsyncBlobReader can be implemented in another crate.

I am not very familiar with the async ecosystem, but it looks like you would need to collect the bytes of a stream until you have a full Blob (at most 32 MiB in size). Here is the relevant function for that: https://github.com/b-r-u/osmpbf/blob/fd55e640c274f3fdec81e4ff94fca92578ee3922/src/blob.rs#L265 It reads a 4-byte big-endian u32 header size and then reads the BlobHeader, which includes the size of the following Blob. I hope this helps!
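
To make the framing described above concrete, here is a rough, dependency-free sketch of the "collect bytes until a full Blob is available" step. It is not code from this crate: `BlobFramer`, `FrameError`, `blob_datasize` and `read_varint` are hypothetical names invented for illustration. The sketch assumes the layout from the OSM PBF format description (a 4-byte big-endian length, then a protobuf-encoded BlobHeader whose field 3, `datasize`, gives the Blob length) and its size limits of 64 KiB for the header and 32 MiB for the Blob. The hand-rolled protobuf parsing only exists to keep the example self-contained; a real implementation would more likely reuse the crate's generated message types.

```rust
/// Size limits taken from the OSM PBF format description (assumed here).
const MAX_HEADER_SIZE: usize = 64 * 1024;
const MAX_BLOB_SIZE: usize = 32 * 1024 * 1024;

/// Reads one protobuf varint from `buf` at `*pos` and advances `*pos`.
fn read_varint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut value = 0u64;
    for shift in (0..64).step_by(7) {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
    }
    None // longer than 10 bytes: malformed varint
}

/// Extracts `datasize` (field 3, varint) from an encoded BlobHeader.
fn blob_datasize(header: &[u8]) -> Option<usize> {
    let mut pos = 0;
    let mut datasize = None;
    while pos < header.len() {
        let key = read_varint(header, &mut pos)?;
        match (key >> 3, key & 7) {
            (3, 0) => datasize = Some(read_varint(header, &mut pos)? as usize),
            (_, 0) => {
                read_varint(header, &mut pos)?; // skip other varint fields
            }
            (_, 2) => {
                // Skip length-delimited fields such as `type` and `indexdata`.
                let len = read_varint(header, &mut pos)? as usize;
                pos = pos.checked_add(len).filter(|&p| p <= header.len())?;
            }
            _ => return None, // wire types that BlobHeader does not use
        }
    }
    datasize
}

#[derive(Debug, PartialEq)]
enum FrameError {
    HeaderTooLarge,
    BlobTooLarge,
    BadHeader,
}

/// Hypothetical helper: buffers stream chunks and cuts them into frames.
#[derive(Default)]
struct BlobFramer {
    buf: Vec<u8>,
}

impl BlobFramer {
    /// Appends one chunk received from the byte stream.
    fn push(&mut self, chunk: &[u8]) {
        self.buf.extend_from_slice(chunk);
    }

    /// Returns the next (BlobHeader bytes, Blob bytes) pair,
    /// or `Ok(None)` if more input is needed.
    fn next_frame(&mut self) -> Result<Option<(Vec<u8>, Vec<u8>)>, FrameError> {
        if self.buf.len() < 4 {
            return Ok(None);
        }
        let prefix = [self.buf[0], self.buf[1], self.buf[2], self.buf[3]];
        let header_len = u32::from_be_bytes(prefix) as usize;
        if header_len > MAX_HEADER_SIZE {
            return Err(FrameError::HeaderTooLarge);
        }
        let header_end = 4 + header_len;
        if self.buf.len() < header_end {
            return Ok(None);
        }
        let blob_len =
            blob_datasize(&self.buf[4..header_end]).ok_or(FrameError::BadHeader)?;
        if blob_len > MAX_BLOB_SIZE {
            return Err(FrameError::BlobTooLarge);
        }
        let frame_end = header_end + blob_len;
        if self.buf.len() < frame_end {
            return Ok(None);
        }
        let header = self.buf[4..header_end].to_vec();
        let blob = self.buf[header_end..frame_end].to_vec();
        self.buf.drain(..frame_end); // keep any bytes of the following frame
        Ok(Some((header, blob)))
    }
}
```

An async wrapper could call `push` for every chunk the stream yields and then loop on `next_frame` until it returns `Ok(None)`. Because the sketch has no async or I/O dependencies, the same logic would work with any runtime or byte source.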