apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0

[VL] Race when IO thread is spilling Velox task thread #7161

Closed zhztheplayer closed 2 weeks ago

zhztheplayer commented 2 months ago

Description

A background IO thread may first acquire a lock and then initiate spilling on a Velox task thread. While this is happening, the Velox task thread is asked to pause. However, before the pause takes effect, the Velox task thread may still allocate memory and request the same lock that the IO thread already holds. This can cause a deadlock.
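To make the lock cycle easier to see, here is a minimal Python model of the scenario described above. It is only a sketch, not Gluten or Velox code: `mem_lock`, `pause_acked`, and `simulate_race` are hypothetical names, and timeouts stand in for what would be an indefinite hang in the real system.

```python
import threading


def simulate_race(timeout_s: float = 0.2) -> dict:
    """Model the IO-thread / task-thread lock cycle, using timeouts so it terminates."""
    mem_lock = threading.Lock()        # hypothetical lock needed by both threads
    io_holds_lock = threading.Event()  # set once the IO thread owns mem_lock
    pause_acked = threading.Event()    # set when the task thread has paused
    task_done = threading.Event()      # set when the task thread's attempt has ended
    result = {"task_blocked_on_lock": False, "io_timed_out_waiting_for_pause": False}

    def io_thread() -> None:
        with mem_lock:                 # 1. the IO thread takes the lock first
            io_holds_lock.set()
            # 2. it then asks the task thread to pause and waits for the
            #    acknowledgement while still holding the lock
            if not pause_acked.wait(timeout=timeout_s):
                result["io_timed_out_waiting_for_pause"] = True
            # Keep holding the lock until the task thread gives up, so the
            # model behaves the same way on every run.
            task_done.wait()

    def task_thread() -> None:
        io_holds_lock.wait()
        # 3. before reaching its pause point, the task thread allocates
        #    memory, which needs the same lock
        if mem_lock.acquire(timeout=timeout_s):
            mem_lock.release()
            pause_acked.set()          # it could only pause after allocating
        else:
            # In the real system this wait would never return.
            result["task_blocked_on_lock"] = True
        task_done.set()

    threads = [threading.Thread(target=io_thread), threading.Thread(target=task_thread)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


if __name__ == "__main__":
    print(simulate_race())
```

Both flags come back `True`: the task thread cannot get the lock, so it never acknowledges the pause, and the IO thread never sees the pause it is waiting for.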

This issue is causing the recent timeout failures in the GHA OOM test, Q97. For example: https://github.com/apache/incubator-gluten/actions/runs/10736481853/job/29776061484.

FelixYBW commented 2 months ago

What's the fundamental solution? I'm thinking we should always allocate memory in task threads. The IO thread should only be able to allocate memory from global memory.
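One way to picture this proposal is a per-thread routing rule for allocations. The sketch below is an illustration only, under the assumption that the task pool and the global pool each have their own lock; `Pool`, `mark_io_thread`, and `allocate` are hypothetical names and do not correspond to any Gluten or Velox API.

```python
import threading


class Pool:
    """Toy memory pool with its own lock (hypothetical, for illustration only)."""

    def __init__(self, name: str, capacity: int):
        self.name = name
        self.capacity = capacity
        self.used = 0
        self._lock = threading.Lock()

    def allocate(self, size: int) -> None:
        with self._lock:
            if self.used + size > self.capacity:
                raise MemoryError(f"{self.name} pool exhausted")
            self.used += size


task_pool = Pool("task", capacity=1 << 20)      # used by task threads only
global_pool = Pool("global", capacity=1 << 20)  # used by IO threads only
_thread_role = threading.local()                # per-thread marker


def mark_io_thread() -> None:
    """Call once at the start of an IO thread."""
    _thread_role.is_io = True


def allocate(size: int) -> Pool:
    """Route the allocation by thread role and return the pool that served it."""
    pool = global_pool if getattr(_thread_role, "is_io", False) else task_pool
    pool.allocate(size)
    return pool
```

With this rule an IO thread never takes the task pool's lock, so it cannot end up holding a lock that a not-yet-paused task thread is waiting for.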

zhztheplayer commented 2 weeks ago

Closing with the fix for https://github.com/apache/incubator-gluten/issues/7243. Please reopen if you find that the issue still exists.