apache / datafusion-ballista

Apache DataFusion Ballista Distributed Query Engine
https://datafusion.apache.org/ballista
Apache License 2.0

Allow `ballista_scheduler` to be embedded in the process of other applications #568

Open r4ntix opened 1 year ago

r4ntix commented 1 year ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

I have recently been working with Ballista as a distributed query engine inside our existing scheduling service system.

I'm trying to embed ballista_scheduler into our existing scheduling system for the following reasons:

Although ballista_scheduler exposes the scheduler server struct through pub mod scheduler_server, it still cannot be integrated and embedded in another application.

This is because state and submit_job in scheduler_server are only pub(crate):

https://github.com/apache/arrow-ballista/blob/20891ae0a740c03b5a3a909ca033f45d59fcfc83/ballista/scheduler/src/scheduler_server/mod.rs#L59-L66

https://github.com/apache/arrow-ballista/blob/20891ae0a740c03b5a3a909ca033f45d59fcfc83/ballista/scheduler/src/scheduler_server/mod.rs#L150-L167
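To make the visibility problem concrete, here is a minimal sketch. It is not the real Ballista source: the `SchedulerState` contents, the `queued_jobs` field, the simplified `submit_job` signature, and the `submit_from_inside_crate` helper are all hypothetical stand-ins. Only the `pub(crate)` modifiers mirror what the linked lines show.

```rust
// Simplified stand-in for ballista/scheduler/src/scheduler_server/mod.rs.
// All types and bodies are hypothetical; only the visibility modifiers
// reflect the situation described in this issue.
pub mod scheduler_server {
    #[derive(Default)]
    pub struct SchedulerState {
        // Hypothetical field, used only to make the example observable.
        pub(crate) queued_jobs: Vec<String>,
    }

    #[derive(Default)]
    pub struct SchedulerServer {
        // Reachable only from inside the defining crate.
        pub(crate) state: SchedulerState,
    }

    impl SchedulerServer {
        // Also reachable only from inside the defining crate.
        pub(crate) fn submit_job(&mut self, job_id: &str) {
            self.state.queued_jobs.push(job_id.to_string());
        }
    }
}

// Inside the same crate, both the method and the field are accessible.
pub fn submit_from_inside_crate() -> usize {
    let mut server = scheduler_server::SchedulerServer::default();
    server.submit_job("job-1");
    server.state.queued_jobs.len()
}

// The same two accesses written in a downstream crate would be rejected by
// the compiler, because `pub(crate)` items are private outside the crate
// that defines them. That is why the struct is nameable but not usable
// when embedding the scheduler.
```

So an embedding application can construct or name the struct through the public module, but it cannot call `submit_job` or read `state` itself.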

Describe the solution you'd like

Make state and submit_job in scheduler_server pub.

Describe alternatives you've considered

Additional context

r4ntix commented 1 year ago

@andygrove @yahoNanJing Could you please give some feedback and suggestions?

thinkharderdev commented 1 year ago

@r4ntix What would you need from state? I'm not sure exposing it as a public interface is a great idea. For the most part, the gRPC interface has evolved to the point where it primarily just pushes events into the scheduler's event loop, so it wouldn't be a major issue to expose those operations as public methods on SchedulerServer itself.
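A rough sketch of that idea, assuming a plain `std::sync::mpsc` channel stands in for the scheduler's event loop. The `SchedulerEvent` variants, the method signatures, and `drain_events` are hypothetical and do not match Ballista's actual types. The point is only that the public methods enqueue events while the internal state stays private.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

// Hypothetical event type; Ballista's real scheduler events differ.
#[derive(Debug, PartialEq)]
pub enum SchedulerEvent {
    JobQueued { job_id: String },
    JobCancelled { job_id: String },
}

pub struct SchedulerServer {
    // Private: embedders never touch the channel or any internal state.
    events: Sender<SchedulerEvent>,
}

impl SchedulerServer {
    /// Builds the server together with the receiving end that an event
    /// loop would drain.
    pub fn new() -> (Self, Receiver<SchedulerEvent>) {
        let (tx, rx) = channel();
        (Self { events: tx }, rx)
    }

    /// Public entry point: pushes a "job queued" event into the loop.
    pub fn submit_job(&self, job_id: &str) -> Result<(), String> {
        self.events
            .send(SchedulerEvent::JobQueued { job_id: job_id.to_string() })
            .map_err(|e| e.to_string())
    }

    /// Public entry point: pushes a "job cancelled" event into the loop.
    pub fn cancel_job(&self, job_id: &str) -> Result<(), String> {
        self.events
            .send(SchedulerEvent::JobCancelled { job_id: job_id.to_string() })
            .map_err(|e| e.to_string())
    }
}

/// Stand-in for one pass of the event loop: collects every event that is
/// currently queued, in the order it was sent, without blocking.
pub fn drain_events(rx: &Receiver<SchedulerEvent>) -> Vec<SchedulerEvent> {
    rx.try_iter().collect()
}
```

With this shape, an embedding application would call `submit_job` or `cancel_job` directly instead of going through gRPC, and would never need access to state.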