apache / datafusion-ballista

Apache DataFusion Ballista Distributed Query Engine
https://datafusion.apache.org/ballista
Apache License 2.0

Ergonomic way to setup/configure `SessionContextExt` #1096

Closed milenkovicm closed 3 weeks ago

milenkovicm commented 1 month ago

Which issue does this PR close?

Closes #1092.

Rationale for this change

This change provides an ergonomic way to configure `SessionContextExt`, similar to setting up a DataFusion context from a provided `SessionState`.

What changes are included in this PR?

This change adds two new methods to `SessionContextExt`, `async fn standalone_with_state(state: SessionState)` and `async fn remote_with_state(url: &str, state: SessionState)`, which accept a pre-configured `SessionState` as a parameter. The `SessionState` should be configured the same way as when a `SessionContext` is configured in DataFusion.

let state = SessionStateBuilder::new().with_default_features().build();
let ctx: SessionContext = SessionContext::remote_with_state(&url, state).await?;

This change also exposes `BallistaSessionConfigExt`, which provides methods to configure Ballista-specific settings such as BallistaConfiguration, codecs, or even the QueryPlanner.

use ballista_client::extension::BallistaSessionConfigExt;

let session_config = SessionConfig::new_with_ballista()
    .with_information_schema(true)
    .set_str(BALLISTA_JOB_NAME, "Super Cool Ballista App");

let state = SessionStateBuilder::new()
    .with_default_features()
    .with_config(session_config)
    .build();

let ctx: SessionContext = SessionContext::remote_with_state(&url, state).await?;
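For readers unfamiliar with the pattern, here is a minimal, standard-library-only sketch of how an extension trait can add a constructor and builder methods to an existing config type. `Config`, `ConfigExt`, `new_with_defaults`, `with_job_name` and both keys are hypothetical stand-ins chosen for illustration; they are not the DataFusion or Ballista API, and the real `BallistaSessionConfigExt` may be implemented differently.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a session config type (not DataFusion's `SessionConfig`).
#[derive(Debug, Default, Clone)]
pub struct Config {
    options: HashMap<String, String>,
}

impl Config {
    /// Builder-style setter: stores a string option and returns the config.
    pub fn set_str(mut self, key: &str, value: &str) -> Self {
        self.options.insert(key.to_string(), value.to_string());
        self
    }

    /// Looks up a previously stored option.
    pub fn get(&self, key: &str) -> Option<&str> {
        self.options.get(key).map(String::as_str)
    }
}

/// Hypothetical option keys, made up for this sketch.
pub const JOB_NAME: &str = "example.job.name";
pub const PLANNER_OVERRIDE: &str = "example.planner.override";

/// Extension trait: adds engine-specific helpers without touching `Config` itself.
pub trait ConfigExt {
    /// Creates a config pre-populated with engine-specific defaults.
    fn new_with_defaults() -> Self;
    /// Convenience wrapper around `set_str` for one specific key.
    fn with_job_name(self, name: &str) -> Self;
}

impl ConfigExt for Config {
    fn new_with_defaults() -> Self {
        Config::default().set_str(PLANNER_OVERRIDE, "true")
    }

    fn with_job_name(self, name: &str) -> Self {
        self.set_str(JOB_NAME, name)
    }
}
```

With the trait in scope, `Config::new_with_defaults().with_job_name("demo")` reads like a native method chain, which is the ergonomics this PR aims for. In a real setup the trait would live in a different crate than the type it extends, which is why the `use ...::BallistaSessionConfigExt;` import is needed in the example above.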

LogicalExtensionCodec and PhysicalExtensionCodec can be changed as well:

use ballista_client::extension::BallistaSessionConfigExt;
let logical_codec = Arc::new(BadLogicalCodec::default());
let physical_codec = Arc::new(MockPhysicalCodec::default());
let session_config = SessionConfig::new_with_ballista()
    .with_information_schema(true)
    .with_ballista_physical_extension_codec(physical_codec.clone())
    .with_ballista_logical_extension_codec(logical_codec.clone());
let state = SessionStateBuilder::new()
    .with_default_features()
    .with_config(session_config)
    .build();

let ctx: SessionContext = SessionContext::standalone_with_state(state).await?;

In this case the logical and physical codecs will also be propagated to the standalone instance.
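One way custom codecs can travel together with a config is a type-keyed extension slot, similar in spirit to DataFusion's `SessionConfig::with_extension` / `get_extension`. The sketch below uses only the standard library; `Config`, `Codec`, `NamedCodec`, `LogicalCodecSlot` and `PhysicalCodecSlot` are hypothetical names, and whether this PR stores codecs exactly this way is an assumption, not something stated in the description.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical codec trait (stand-in for a logical or physical extension codec).
pub trait Codec: Send + Sync {
    fn name(&self) -> String;
}

/// Simple codec used only for demonstration.
pub struct NamedCodec(pub &'static str);

impl Codec for NamedCodec {
    fn name(&self) -> String {
        self.0.to_string()
    }
}

/// Concrete wrapper types give each trait object its own `TypeId` key.
pub struct LogicalCodecSlot(pub Arc<dyn Codec>);
pub struct PhysicalCodecSlot(pub Arc<dyn Codec>);

/// Hypothetical config holding at most one extension per concrete type.
#[derive(Default, Clone)]
pub struct Config {
    extensions: HashMap<TypeId, Arc<dyn Any + Send + Sync>>,
}

impl Config {
    /// Stores an extension, replacing any previous value of the same type.
    pub fn with_extension<T: Any + Send + Sync>(mut self, ext: Arc<T>) -> Self {
        self.extensions.insert(TypeId::of::<T>(), ext);
        self
    }

    /// Retrieves the extension of type `T`, if one was stored.
    pub fn get_extension<T: Any + Send + Sync>(&self) -> Option<Arc<T>> {
        self.extensions
            .get(&TypeId::of::<T>())
            .cloned()
            .and_then(|ext| ext.downcast::<T>().ok())
    }
}
```

Whoever receives the config (for example, code that starts a standalone scheduler and executor) can call `get_extension::<LogicalCodecSlot>()` and fall back to a default codec when it returns `None`.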

Lastly, BallistaQueryPlanner can be replaced:

let session_config = SessionConfig::new_with_ballista()
    .with_information_schema(true)
    .set_str(BALLISTA_PLANNER_OVERRIDE, "false");

let state = SessionStateBuilder::new()
    .with_default_features()
    .with_config(session_config)
    .with_query_planner(Arc::new(BadPlanner::default()))
    .build();

let ctx: SessionContext = SessionContext::standalone_with_state(state).await?;

At the moment this relies on a hacky workaround: `.set_str(BALLISTA_PLANNER_OVERRIDE, "false")` tells Ballista not to override the provided planner, because it is not possible to detect whether the planner has been changed.
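To illustrate why an explicit flag is needed, here is a small standard-library-only sketch. A state always carries some planner, so the code that prepares it for distributed execution cannot tell a user-supplied planner from the default one and has to consult a config option instead. All names (`State`, `Planner`, `prepare_for_cluster`, the `example.planner.override` key) are hypothetical and do not mirror Ballista's actual implementation.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical planner trait (not DataFusion's `QueryPlanner`).
pub trait Planner {
    fn name(&self) -> &'static str;
}

pub struct DefaultPlanner; // installed when the user sets nothing
pub struct DistributedPlanner; // stand-in for the engine's own planner
pub struct CustomPlanner; // stand-in for a user-supplied planner

impl Planner for DefaultPlanner {
    fn name(&self) -> &'static str { "default" }
}
impl Planner for DistributedPlanner {
    fn name(&self) -> &'static str { "distributed" }
}
impl Planner for CustomPlanner {
    fn name(&self) -> &'static str { "custom" }
}

/// Hypothetical option key, made up for this sketch.
pub const PLANNER_OVERRIDE: &str = "example.planner.override";

/// Hypothetical session state: options plus a planner that is always present.
pub struct State {
    options: HashMap<String, String>,
    planner: Arc<dyn Planner>,
}

impl State {
    pub fn new(planner: Arc<dyn Planner>) -> Self {
        State { options: HashMap::new(), planner }
    }

    pub fn set_str(mut self, key: &str, value: &str) -> Self {
        self.options.insert(key.to_string(), value.to_string());
        self
    }

    pub fn planner_name(&self) -> &'static str {
        self.planner.name()
    }
}

/// Installs the distributed planner unless the user opted out via the flag.
pub fn prepare_for_cluster(mut state: State) -> State {
    let keep_user_planner =
        state.options.get(PLANNER_OVERRIDE).map(String::as_str) == Some("false");
    if !keep_user_planner {
        state.planner = Arc::new(DistributedPlanner);
    }
    state
}
```

The trade-off is that forgetting the flag silently replaces a custom planner, which is why the description calls the approach hacky.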

Are there any user-facing changes?

Two new methods and one new extension are introduced; there are no breaking changes.

Notes:

milenkovicm commented 4 weeks ago

Could this change get merged please, @andygrove? I have a follow-up, #1099, on top of this one.

milenkovicm commented 3 weeks ago

Thanks @andygrove