Is your feature request related to a problem or challenge? Please describe what you are trying to do.
As we aim to reduce the code footprint, I would like to propose replacing `BallistaContext` with `SessionContext`.
It would improve usability, as we would get most of the methods available in `SessionContext`. Also, some DataFusion applications would be deployable to Ballista with a single-line change:
```rust
use ballista::{extension::SessionContextExt, prelude::*};
use datafusion::prelude::SessionContext;

let ctx: SessionContext = SessionContext::ballista_standalone().await?;
```
With write sinks now in place, we will get write support as well, a feature Ballista did not have before.
IMHO it would make a lot of sense to have a single API across DataFusion and Ballista.
If the replacement is successful, it would enable us to re-use the DataFusion Python crate, eliminating the need to maintain Ballista Python. We would need to provide `SessionContext::ballista_standalone` and equivalent methods:
```python
import datafusion
import ballista.standalone
from datafusion import col

# create a context (datafusion context with ballista standalone enabled)
ctx = ballista.standalone.SessionContext()
```
There are clear benefits to deprecating `BallistaContext`; however, the decision may be problematic, as we could not hide `SessionContext` methods which do not work with Ballista. `SessionContext` may bring usability issues with UDF support, configuration, and basically all functionality which needs to be propagated across the cluster to work, and which may not be trivial to address. We may try to address this by "turning off" those methods in Ballista or just by documenting it; still, some effort is needed. Or maybe it's not an issue at all?
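One way to picture the "turning off" option is a guard that rejects an operation with an explicit error instead of letting it fail later on the cluster. The sketch below is only an illustration under assumptions: `Operation`, `NotSupported` and `check_supported` are hypothetical names, not part of DataFusion or Ballista, and `RegisterUdf` is picked as the blocked case only because UDF support is named above as a possible problem.

```rust
use std::fmt;

// Hypothetical list of context operations; this is NOT DataFusion's API surface.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Operation {
    Sql,
    ReadParquet,
    RegisterUdf,
}

// Hypothetical error returned when an operation is "turned off".
#[derive(Debug, PartialEq)]
pub struct NotSupported(pub Operation);

impl fmt::Display for NotSupported {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?} is not supported when running on Ballista", self.0)
    }
}

// Returns Ok for operations assumed to run unchanged, and an explicit error
// for operations assumed to need cluster-wide propagation.
pub fn check_supported(op: Operation, distributed: bool) -> Result<(), NotSupported> {
    match op {
        // Assumption for illustration: UDF registration is blocked when distributed.
        Operation::RegisterUdf if distributed => Err(NotSupported(op)),
        _ => Ok(()),
    }
}
```

With a guard like this, a user gets a clear message at the call site; the documentation-only alternative would leave the method callable and rely on the user having read the caveat.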
Describe the solution you'd like
Rough action plan:
- Create `SessionContextExt`, which would expose methods for creating standalone and remote contexts, re-using `BallistaQueryPlanner`.
- Verify basic SQL and DataFrame support.
- Verify/fix write support (plans with a write sink are generated, but the write operation does not create valid files).
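The first step of the plan relies on the extension-trait pattern: a trait defined in Ballista adds constructors to a type owned by DataFusion. The following is a minimal std-only sketch of that pattern, not the actual implementation. `SessionContext` and `Planner` here are simplified stand-ins for the real types, `ballista_remote` and the `df://localhost:50050` address are assumed for illustration, and the methods are synchronous while the proposed ones are `async`.

```rust
// Stand-in for the planner a context uses (hypothetical, simplified).
#[derive(Debug, PartialEq)]
pub enum Planner {
    Local,
    // Stand-in for a context wired to BallistaQueryPlanner.
    Ballista { scheduler_url: String },
}

// Stand-in for datafusion's SessionContext; the real type lives in the datafusion crate.
#[derive(Debug)]
pub struct SessionContext {
    pub planner: Planner,
}

// Extension trait: adds Ballista constructors to a type defined elsewhere.
pub trait SessionContextExt: Sized {
    fn ballista_standalone() -> Result<Self, String>;
    fn ballista_remote(url: &str) -> Result<Self, String>;
}

impl SessionContextExt for SessionContext {
    fn ballista_standalone() -> Result<Self, String> {
        // Assumption: standalone mode talks to an in-process scheduler;
        // the address below is made up for the sketch.
        Self::ballista_remote("df://localhost:50050")
    }

    fn ballista_remote(url: &str) -> Result<Self, String> {
        // Assumption: scheduler URLs use a "df://" scheme.
        if !url.starts_with("df://") {
            return Err(format!("unsupported scheduler url: {url}"));
        }
        Ok(SessionContext {
            planner: Planner::Ballista {
                scheduler_url: url.to_string(),
            },
        })
    }
}
```

Because the constructors return a plain `SessionContext`, existing code written against that type keeps compiling; only the line that creates the context changes.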
Describe alternatives you've considered
Additional context
relates to #1068