Closed nathanielc closed 1 week ago
How about creating a FlightDataEncoder to encode an empty stream and then reading the schema off the stream?
let empty_stream = FlightDataEncoderBuilder::new()
    .with_schema(pre_encoded_schema)
    .build(futures::stream::iter(vec![]));
let schema = empty_stream.schema();
If that works, perhaps we can add an example to the documentation.
I would be hesitant to just make prepare_schema_for_flight public, as it seems somewhat brittle: the arguments need to remain in sync with however the FlightDataEncoder is constructed, but it uses different types.
I have seen but not followed closely work to create logical types separate from physical types. Possibly there is room for flight_info requests to report logical schemas and for the server to use any valid physical encoding of the data. This however requires much more coordination between clients and servers. Additionally it's not clear that flight_info requests should actually deal in logical instead of physical schemas.
FWIW the logical type idea will likely remain in DataFusion as there is no concept of LogicalType in the Arrow type system (for better / worse)
@alamb Agreed, exposing the API is a fragile solution.
I like your proposed approach; however, the FlightDataEncoder type does not expose a method to access the schema. That would be a small addition to its API, though. Should we add the following function?
pub fn schema(&self) -> Option<SchemaRef> {
    self.schema.clone()
}
In cases where the schema is known upfront it will have been hydrated, and in cases where it's not known upfront a None is returned. Thoughts? Maybe we call the function known_schema to make it clear it's only available when the schema is known upfront?
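For illustration, the accessor proposed above could behave roughly as in the following sketch. The Schema and Encoder types here are simplified, hypothetical stand-ins written only for this example; they are not the arrow-flight types, and the real FlightDataEncoder may store its schema differently.

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-in for an Arrow schema (assumption for this sketch).
#[derive(Debug, PartialEq)]
struct Schema {
    field_names: Vec<String>,
}

type SchemaRef = Arc<Schema>;

// Hypothetical stand-in for the encoder: the schema is only set when the
// caller supplied one upfront (for example via a builder).
struct Encoder {
    schema: Option<SchemaRef>,
}

impl Encoder {
    fn new(schema: Option<SchemaRef>) -> Self {
        Self { schema }
    }

    // Returns the schema only if it was known at construction time;
    // otherwise returns None instead of guessing.
    fn known_schema(&self) -> Option<SchemaRef> {
        self.schema.clone()
    }
}
```

Returning an Option keeps the "not known yet" case explicit, so a caller cannot mistake a missing schema for an empty one.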
Makes sense to me!
label_issue.py automatically added labels {'arrow'} from #6688
label_issue.py automatically added labels {'arrow-flight'} from #6688
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I am implementing a Flight SQL server using DataFusion. See this logic, which simply reports the schema of the query result as the flight_info schema.
The FlightDataEncoder has two modes for dictionary handling. In one mode it hydrates dictionaries, thus changing the schema of the data during transport. The Flight SQL server needs to report the hydrated schema; otherwise clients will be confused, as the data received will not match the reported schema.
Describe the solution you'd like
A simple solution would be to make this function public API so it can be reused.
Describe alternatives you've considered
I have seen but not followed closely work to create logical types separate from physical types. Possibly there is room for flight_info requests to report logical schemas and for the server to use any valid physical encoding of the data. This however requires much more coordination between clients and servers. Additionally it's not clear that flight_info requests should actually deal in logical instead of physical schemas.
Additional context
My solution for now is to copy the logic into the server implementation. I'd be happy to submit a PR to make the function public if that is what we think is a good solution.
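To make the schema mismatch described above concrete, here is a minimal sketch of what hydrating a schema means. The DataType and Field types are simplified, hypothetical stand-ins rather than the arrow-rs types, and the recursion into nested types is an assumption about how such a rewrite would reasonably work, not a copy of the library's logic.

```rust
// Hypothetical, simplified stand-ins for Arrow's DataType and Field.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    // Dictionary(key type, value type)
    Dictionary(Box<DataType>, Box<DataType>),
    List(Box<DataType>),
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

// Replace every dictionary type with its value type, recursing into
// nested list and struct types so no dictionary survives.
fn hydrate_type(data_type: &DataType) -> DataType {
    match data_type {
        DataType::Dictionary(_key, value) => hydrate_type(value),
        DataType::List(item) => DataType::List(Box::new(hydrate_type(item))),
        DataType::Struct(fields) => DataType::Struct(hydrate_fields(fields)),
        other => other.clone(),
    }
}

// Apply the same rewrite to every field; names are left unchanged.
fn hydrate_fields(fields: &[Field]) -> Vec<Field> {
    fields
        .iter()
        .map(|field| Field {
            name: field.name.clone(),
            data_type: hydrate_type(&field.data_type),
        })
        .collect()
}
```

With a query schema containing a Dictionary(Int32, Utf8) column, the hydrated schema reports that column as plain Utf8. A server that reports the unhydrated query schema in flight_info would therefore disagree with the data it actually sends.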