apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Support registering object stores via SessionStateBuilder #12553

Open alamb opened 3 hours ago

alamb commented 3 hours ago

Is your feature request related to a problem or challenge?

Part of https://github.com/apache/datafusion/issues/12550

While working on https://github.com/datafusion-contrib/datafusion-dft, I wanted to register various types of extensions while configuring the SessionContext, ideally by adding each extension to the SessionStateBuilder in turn.

However, I found that a few APIs were missing on SessionStateBuilder, so I had to implement my own workaround builder here: https://github.com/datafusion-contrib/datafusion-dft/blob/8247555f9464058c1ac3370196739ac2b19343ee/src/extensions/builder.rs#L102-L1078

And then call it here: https://github.com/datafusion-contrib/datafusion-dft/blob/8247555f9464058c1ac3370196739ac2b19343ee/src/extensions/s3.rs#L59-L62

SessionStateBuilder has no way to register an object store.

Describe the solution you'd like

I would like a way to register object stores directly on SessionStateBuilder.

It should also have

  1. Documentation
  2. Tests (ideally a doc test with an example of how to use it)

Describe alternatives you've considered

I recommend adding two new functions

  1. SessionStateBuilder::with_object_store that calls through to RuntimeEnv::register_object_store
  2. SessionStateBuilder::runtime_env() that returns the current RuntimeEnv (follow model here), which would give callers access to the underlying RuntimeEnv for other, more advanced features

This would be used like

let state = SessionStateBuilder::new()
  .with_object_store(url, object_store)
  .build();

Or

let mut builder = SessionStateBuilder::new();
builder.runtime_env().register_object_store(url, object_store);
let state = builder.build();
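
To make the call-through concrete, here is a minimal, std-only sketch of how the two proposed methods could fit together. Every type below is a simplified stand-in that I made up for illustration, not the real DataFusion or object_store API: the ObjectStore trait, InMemoryStore, the string-keyed registry, and the object_store lookup helper are all assumptions, and the real RuntimeEnv::register_object_store takes a URL type rather than a &str. The only idea taken from the proposal is that with_object_store delegates to RuntimeEnv::register_object_store, and that runtime_env() exposes the same RuntimeEnv for more advanced use.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the real object store trait.
pub trait ObjectStore: Send + Sync {
    fn name(&self) -> String;
}

/// Toy store that exists only for this illustration.
pub struct InMemoryStore;

impl ObjectStore for InMemoryStore {
    fn name(&self) -> String {
        "in-memory".to_string()
    }
}

/// Simplified stand-in for RuntimeEnv: a registry keyed by URL string.
#[derive(Default)]
pub struct RuntimeEnv {
    stores: RwLock<HashMap<String, Arc<dyn ObjectStore>>>,
}

impl RuntimeEnv {
    /// Registers `store` for `url`, returning any store it replaced.
    pub fn register_object_store(
        &self,
        url: &str,
        store: Arc<dyn ObjectStore>,
    ) -> Option<Arc<dyn ObjectStore>> {
        self.stores.write().unwrap().insert(url.to_string(), store)
    }

    /// Looks up the store registered for `url` (assumed helper).
    pub fn object_store(&self, url: &str) -> Option<Arc<dyn ObjectStore>> {
        self.stores.read().unwrap().get(url).cloned()
    }
}

/// Simplified stand-in for SessionState.
pub struct SessionState {
    pub runtime_env: Arc<RuntimeEnv>,
}

/// Simplified stand-in for SessionStateBuilder.
#[derive(Default)]
pub struct SessionStateBuilder {
    runtime_env: Option<Arc<RuntimeEnv>>,
}

impl SessionStateBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Proposed accessor: returns the current RuntimeEnv,
    /// creating a default one on first use.
    pub fn runtime_env(&mut self) -> &Arc<RuntimeEnv> {
        self.runtime_env
            .get_or_insert_with(|| Arc::new(RuntimeEnv::default()))
    }

    /// Proposed convenience method: calls through to
    /// RuntimeEnv::register_object_store and returns the builder.
    pub fn with_object_store(mut self, url: &str, store: Arc<dyn ObjectStore>) -> Self {
        self.runtime_env().register_object_store(url, store);
        self
    }

    /// Builds the state, falling back to a default RuntimeEnv if none was set.
    pub fn build(self) -> SessionState {
        SessionState {
            runtime_env: self.runtime_env.unwrap_or_default(),
        }
    }
}
```

In this sketch both usage styles above end up in the same registry, so a store registered through with_object_store is visible via state.runtime_env afterwards. Creating the RuntimeEnv lazily in runtime_env() is one possible design choice; the real implementation may prefer to follow the existing accessor model instead.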

Additional context

No response

alamb commented 3 hours ago

I think this is a good first issue, as it is clearly described and straightforward to implement. It would also be a good introduction to DataFusion.