Open metesynnada opened 6 months ago
This one is actually quite interesting. I hadnt gotten around to it yet but I was going to propose having both sync and async apis for SchemaProvider
(I think theres valid use cases for both). I just recently ran into an issue where I wanted the ctx.table_provider
and therefore table
method on SchemaProvider
to be sync (as part of logical plan serialization which is sync). How would you feel about having both sync and async APIs?
+1 for this proposal. We ran into similar problems when implementing datafusion/iceberg integrations: https://github.com/apache/iceberg-rust/pull/324
It's more nature to allow catalog/schema operations to be async.
Yes, I agree allowing catalog APIs to be async makes sense to me
One rationale against async, as I recall from @tustvold , was to encourage people to implement efficient catalog implementations (e.g. for many cases calling a network call for table_exists
might be slow)
However, my personal opinion is that such encouragement can be done via documentation and if people want to implement RPC network calls during planning then the APIs shouldn't stop them
I think the biggest challenge is, as @metesynnada hints at above, the viral nature of async
-- if we make such APIs async then everywhere they are called must also be be async
-- I haven't looked at how far down the stack that is but it could be substantail.
An alternate approach might be to implement, via some hackery and tokio channels, an struct that implements the SchemaProvider
without changes (sync) but can call async methods (though that would block the runtime thread 🤔 )
I seem to remember omitting the methods other than table due to the sheer size of the resulting change, I have no objection to making them async
Is your feature request related to a problem or challenge?
Currently, only
table
API is async, butregister_table
,deregister_table
,table_names
, andtable_exist
can be also async. Also, most of the usage is under async methods, thus there shouldn't be a big issue implementing it.However, this can be a huge change for the downstream usage, this is why I need some insides before me or someone else implements it.
cc @Dandandan @alamb
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response