Is your feature request related to a problem or challenge? Please describe what you are trying to do.
After printing long-running events (https://github.com/apache/arrow-ballista/issues/749), a load test showed the following.
Cluster: 60 executors (200 slots each) and 1 scheduler.
Workload: 200 clients, each submitting queries that cost 0.5 s, in sequence.
Found:
events cost 159 ms!
events cost 279 ms!
events cost 289 ms!
Modifying the in-memory status should cost on the order of a hundred microseconds, so this looks like a lock conflict issue.
After reading the code, I found that QueryStageSchedulerEvent::JobQueued uses tokio::spawn
to update the in-memory status in parallel: https://github.com/apache/arrow-ballista/blob/a9ecd3a065077bec6c5d271e890d091c594746fa/ballista/scheduler/src/state/mod.rs#L377-L379
Describe the solution you'd like
Move the self.task_manager.submit_job call to the JobSubmitted stage, keep it on a single thread, and use Arc to keep the plan on the heap and avoid a deep clone.
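A minimal sketch of the proposed shape, using only the standard library. It is not Ballista's actual code: Plan, Event, SchedulerState, and run_event_loop are hypothetical stand-ins, and sharing one plan across several jobs is done here only to make the reference counting visible. The idea it illustrates is that state owned by a single event-loop thread needs no lock, and that passing the plan as an Arc copies a pointer rather than the plan itself.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::Arc;

/// Hypothetical stand-in for a large execution plan.
struct Plan {
    stages: Vec<String>,
}

/// Hypothetical events, loosely modeled on the JobQueued and JobSubmitted stages.
enum Event {
    JobQueued { job_id: String, plan: Arc<Plan> },
    JobSubmitted { job_id: String, plan: Arc<Plan> },
}

/// State owned by the event loop alone, so updates need no lock.
#[derive(Default)]
struct SchedulerState {
    active_jobs: HashMap<String, Arc<Plan>>,
}

impl SchedulerState {
    /// Stand-in for task_manager.submit_job; only ever called from the event loop.
    fn submit_job(&mut self, job_id: String, plan: Arc<Plan>) {
        self.active_jobs.insert(job_id, plan);
    }
}

/// Processes events one at a time on the calling thread and returns the final state.
fn run_event_loop(mut queue: VecDeque<Event>) -> SchedulerState {
    let mut state = SchedulerState::default();
    while let Some(event) = queue.pop_front() {
        match event {
            // Nothing touches shared state here; the job is only forwarded.
            // Moving the Arc moves a pointer, not the plan.
            Event::JobQueued { job_id, plan } => {
                queue.push_back(Event::JobSubmitted { job_id, plan });
            }
            // The state update happens here, sequentially.
            Event::JobSubmitted { job_id, plan } => {
                state.submit_job(job_id, plan);
            }
        }
    }
    state
}

fn main() {
    let plan = Arc::new(Plan {
        stages: vec!["scan".to_string(), "aggregate".to_string()],
    });

    let mut queue = VecDeque::new();
    for i in 0..3 {
        queue.push_back(Event::JobQueued {
            job_id: format!("job-{i}"),
            // Arc::clone bumps a reference count; the plan is not deep-copied.
            plan: Arc::clone(&plan),
        });
    }

    let state = run_event_loop(queue);
    println!(
        "{} jobs submitted, plan has {} stages, {} references to one allocation",
        state.active_jobs.len(),
        state.active_jobs["job-0"].stages.len(),
        Arc::strong_count(&plan)
    );
}
```

Whether a single-threaded submit path is fast enough under the load described above would still need to be measured; the sketch only shows that the lock and the deep clone can both be avoided in this arrangement.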