apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0

[IPC] Expose `schema` on `StreamDecoder` #6420

Open wjones127 opened 2 months ago

wjones127 commented 2 months ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

There is a nice example in the StreamDecoder::decode docs of reading from a stream:

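use arrow_buffer::Buffer;
use arrow_ipc::reader::StreamDecoder;
use arrow_schema::ArrowError;

fn print_stream(src: impl Iterator<Item = Buffer>) -> Result<(), ArrowError> {
    let mut decoder = StreamDecoder::new();
    for mut x in src {
        // One buffer may hold several messages, so decode until it is drained.
        while !x.is_empty() {
            if let Some(batch) = decoder.decode(&mut x)? {
                println!("{batch:?}");
            }
        }
    }
    // Returns an error if the stream ended partway through a message.
    decoder.finish()?;
    Ok(())
}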

https://docs.rs/arrow-ipc/latest/arrow_ipc/reader/struct.StreamDecoder.html#method.decode

However, the example doesn't show how to get the schema. Access to the schema would be useful for anyone who wants to construct a RecordBatchReader or SendableRecordBatchStream from a stream of Bytes / Buffer. It would be particularly helpful when the stream contains zero batches but we still want the schema.

Describe the solution you'd like

Minimally, it would be nice to do this:

impl StreamDecoder {
    /// Return the schema, if decoded yet. Returns `None` if the schema message
    /// has yet to be decoded.
    fn schema(&self) -> Option<SchemaRef> { ... }
}

But it might also be nice to parallel the decode method:

impl StreamDecoder {
    /// Return the schema, if decoded yet. Returns `None` if the schema message
    /// has yet to be decoded.
    fn decode_schema(&mut self, buffer: &mut Buffer) -> Option<SchemaRef> { ... }
}
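
To make the intended behaviour of the first option concrete, here is a minimal toy model. It is not arrow-rs code: `ToyDecoder`, `Message`, `read_all`, and the `SchemaRef` alias below are all hypothetical stand-ins that use only the standard library, and the real `StreamDecoder` works on `Buffer`s rather than pre-split messages. The sketch only illustrates the requested contract: `schema()` returns `None` until the schema message has been decoded, and returns the schema afterwards even if the stream carries zero batches.

```rust
// Toy model of the proposed accessor. NOT arrow-rs code: every type here is a
// hypothetical stand-in built on the standard library only.
use std::sync::Arc;

/// Stand-in for arrow's `SchemaRef` (here just a shared list of column names).
type SchemaRef = Arc<Vec<String>>;

/// Stand-in for an IPC message: a stream starts with one schema message,
/// followed by zero or more batches.
#[derive(Debug, Clone, PartialEq)]
enum Message {
    Schema(Vec<String>),
    Batch(Vec<i64>),
}

#[derive(Default)]
struct ToyDecoder {
    schema: Option<SchemaRef>,
}

impl ToyDecoder {
    /// Consume one message; return a batch if the message carried one.
    fn decode(&mut self, msg: Message) -> Result<Option<Vec<i64>>, String> {
        match msg {
            Message::Schema(fields) => {
                self.schema = Some(Arc::new(fields));
                Ok(None)
            }
            Message::Batch(rows) => {
                if self.schema.is_none() {
                    return Err("batch received before schema".to_string());
                }
                Ok(Some(rows))
            }
        }
    }

    /// The proposed accessor: `None` until the schema message has been decoded.
    fn schema(&self) -> Option<SchemaRef> {
        self.schema.clone()
    }
}

/// Drain a stream, returning the schema (if one was seen) and all batches.
fn read_all(
    src: impl IntoIterator<Item = Message>,
) -> Result<(Option<SchemaRef>, Vec<Vec<i64>>), String> {
    let mut decoder = ToyDecoder::default();
    let mut batches = Vec::new();
    for msg in src {
        if let Some(batch) = decoder.decode(msg)? {
            batches.push(batch);
        }
    }
    Ok((decoder.schema(), batches))
}
```

With this shape, a caller that receives only a schema message still ends up with `Some(schema)` and an empty batch list, which is the zero-batch case described above.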

Describe alternatives you've considered

I asked in Slack whether there was another straightforward way to do this, but none of the suggestions looked easy. https://the-asf.slack.com/archives/C01QUFS30TD/p1726636373388569

Additional context

fsdvh commented 2 weeks ago

Also interested in this one