Open thinkharderdev opened 1 year ago
@thinkharderdev this seems related: https://github.com/apache/arrow-datafusion/pull/3311
Thanks! I think that could potentially support this use case. Either `TableProviderFactory` or `TableProvider` would need to expose a way to get an `AsyncFileReader`.
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
DataFusion recently added a way to provide a user-defined `AsyncFileReader` to `ParquetExec`. Currently there is no way to leverage this in Ballista, since the deserialization logic will construct a `ParquetExec` with the default implementation (which essentially just uses the registered `ObjectStore`).
Describe the solution you'd like
We should be able to leverage this feature in Ballista without overriding the entire serialization logic for physical plans.
I see one of two approaches here:
`ParquetFileReaderFactory` in the `SessionContext` somewhere, in which case it should be trivial to support in Ballista.
For option 2, we might consider using the `PhysicalExtensionCodec` for this. We could add methods:
Describe alternatives you've considered
Additional context
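As a rough illustration of the codec-hook idea behind option 2, the sketch below shows how a pair of optional encode/decode methods could let deserialization rebuild a custom reader factory and fall back to the default otherwise. It does not use the DataFusion or Ballista crates: `ReaderFactory`, `ReaderFactoryCodec`, `DefaultFactory`, `CachingFactory`, `CachingCodec`, and `rebuild_factory` are all hypothetical stand-ins, and the method names are assumptions rather than the methods proposed above.

```rust
use std::any::Any;
use std::sync::Arc;

// Hypothetical stand-in for DataFusion's `ParquetFileReaderFactory`.
// The real trait creates an `AsyncFileReader`; this one only describes itself.
trait ReaderFactory: Send + Sync {
    fn as_any(&self) -> &dyn Any;
    fn describe(&self) -> String;
}

// Models the default, `ObjectStore`-backed behaviour.
struct DefaultFactory;

impl ReaderFactory for DefaultFactory {
    fn as_any(&self) -> &dyn Any { self }
    fn describe(&self) -> String { "default".to_string() }
}

// Models a user-defined factory that carries some configuration.
struct CachingFactory { capacity: u32 }

impl ReaderFactory for CachingFactory {
    fn as_any(&self) -> &dyn Any { self }
    fn describe(&self) -> String { format!("caching(capacity={})", self.capacity) }
}

// Hypothetical codec hook (assumed method names). The default methods encode
// nothing and decode to `None`, so existing codecs keep the default reader.
trait ReaderFactoryCodec {
    fn try_encode_reader_factory(&self, _factory: &dyn ReaderFactory) -> Result<Vec<u8>, String> {
        Ok(Vec::new())
    }

    fn try_decode_reader_factory(&self, _buf: &[u8]) -> Result<Option<Arc<dyn ReaderFactory>>, String> {
        Ok(None)
    }
}

// A codec that relies entirely on the default methods.
struct DefaultCodec;
impl ReaderFactoryCodec for DefaultCodec {}

// A codec that knows how to round-trip `CachingFactory`.
struct CachingCodec;

impl ReaderFactoryCodec for CachingCodec {
    fn try_encode_reader_factory(&self, factory: &dyn ReaderFactory) -> Result<Vec<u8>, String> {
        match factory.as_any().downcast_ref::<CachingFactory>() {
            Some(f) => Ok(f.capacity.to_le_bytes().to_vec()),
            None => Err("unsupported reader factory".to_string()),
        }
    }

    fn try_decode_reader_factory(&self, buf: &[u8]) -> Result<Option<Arc<dyn ReaderFactory>>, String> {
        if buf.is_empty() {
            // Nothing was encoded: let the caller use the default factory.
            return Ok(None);
        }
        if buf.len() != 4 {
            return Err(format!("expected 4 bytes, got {}", buf.len()));
        }
        let capacity = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]);
        let factory: Arc<dyn ReaderFactory> = Arc::new(CachingFactory { capacity });
        Ok(Some(factory))
    }
}

// What plan deserialization could do: ask the codec first, then fall back.
fn rebuild_factory(codec: &dyn ReaderFactoryCodec, buf: &[u8]) -> Result<Arc<dyn ReaderFactory>, String> {
    match codec.try_decode_reader_factory(buf)? {
        Some(factory) => Ok(factory),
        None => {
            let default: Arc<dyn ReaderFactory> = Arc::new(DefaultFactory);
            Ok(default)
        }
    }
}
```

Giving both methods default bodies is a deliberate choice in this sketch: codecs that do not care about custom readers would need no changes, and only deployments with a custom factory would override the two methods.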