apache / datafusion-ballista

Apache DataFusion Ballista Distributed Query Engine
https://datafusion.apache.org/ballista
Apache License 2.0

Scheduler infinite loop after failed/canceled job #341

Open andygrove opened 1 year ago

andygrove commented 1 year ago

Describe the bug: See PR description at https://github.com/apache/arrow-ballista/pull/340

To Reproduce: See PR description at https://github.com/apache/arrow-ballista/pull/340


mingmwang commented 1 year ago

Ah, this is a known issue. I actually added the failed-job check in the pop_next_task() loop. Without that check, the scheduler loop would try to schedule the pending tasks from a failed job, which would be worse!
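
For readers following along, here is a minimal sketch of the kind of failed-job check described above. It is not Ballista's actual code: the `Task` struct, the `pending` queue, the `failed_jobs` set, and this signature for `pop_next_task` are all hypothetical and chosen for illustration. The root cause of the reported loop is described in the linked PR, not here, so the sketch only shows the check itself and one way to keep such a loop bounded, namely discarding skipped tasks rather than putting them back.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical task record; not Ballista's actual type.
#[derive(Debug, Clone, PartialEq)]
pub struct Task {
    pub job_id: String,
    pub task_id: u32,
}

/// Pops the next runnable task, skipping tasks that belong to failed jobs.
///
/// Each iteration removes one element from `pending`, so the loop
/// finishes after at most `pending.len()` iterations.
pub fn pop_next_task(
    pending: &mut VecDeque<Task>,
    failed_jobs: &HashSet<String>,
) -> Option<Task> {
    while let Some(task) = pending.pop_front() {
        if failed_jobs.contains(&task.job_id) {
            // Assumption for this sketch: tasks of a failed job are
            // discarded, not re-queued, so they are never seen again.
            continue;
        }
        return Some(task);
    }
    // Nothing schedulable is left.
    None
}
```

With a queue holding one task from a failed job followed by one from a healthy job, a single call returns the healthy task and leaves the queue empty, and the next call returns `None`.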

@yahoNanJing Please take a look and put together a fix.

yahoNanJing commented 1 year ago

Thanks @andygrove and @tfeda for reporting this issue. I'll try to fix it.

smallzhongfeng commented 10 months ago

Any update? This directly causes the UI to become unavailable.