Open asfimport opened 2 years ago
Weston Pace / @westonpace: I'm able to reproduce this by compiling with ASAN, using stress to make the CPU busy, and running the following command:
while taskset -c 0,1 ./debug/arrow-dataset-scanner-test --gtest_filter=TestScannerThreading/TestScanner.FromReader/3Threaded2d16b1024r; do sleep 0.1; done
I'm primarily creating this so we remember to add a test for it. The problem should be solved as part of ARROW-16072. When the scanner fails, it simply discards its references to the various scanner AsyncGenerators. However, some I/O tasks may still hold references to these generators, so parts of the scanner survive after the plan itself is marked complete. If an immediate shutdown follows, these parts are never properly disposed of, even though the plan is marked complete, and they show up as a memory leak.
Example:
https://pipelines.actions.githubusercontent.com/serviceHosts/8bb0d999-3387-4c48-9fa6-c66c718a46e2/_apis/pipelines/1/runs/359690/signedlogcontent/4?urlExpires=2022-07-25T14%3A43%3A01.2797488Z&urlSigningMethod=HMACV1&urlSignature=GS3lS09Q9sTRweN%2B8UEu2GwUGc%2FbO9eyH27FRKumbrg%3D
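The lifetime problem described above can be illustrated with a minimal, self-contained sketch. This is not Arrow's actual scanner code: GeneratorState, Plan, TaskQueue, and LiveStatesAfterFailure are hypothetical stand-ins, and the "I/O executor" is just a queue of callbacks that have not run yet. It only shows how a pending task that captured a shared_ptr keeps the state alive after the plan has dropped its own reference.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>

// Hypothetical stand-in for the shared state behind a scanner AsyncGenerator.
// It counts live instances so we can observe whether it was destroyed.
struct GeneratorState {
  explicit GeneratorState(int* live) : live_count(live) { ++*live_count; }
  ~GeneratorState() { --*live_count; }
  int* live_count;
};

// Hypothetical stand-in for an I/O executor: tasks that are queued but not yet run.
using TaskQueue = std::deque<std::function<void()>>;

// Hypothetical stand-in for the plan/scanner that owns one reference.
struct Plan {
  std::shared_ptr<GeneratorState> generator;
  bool finished = false;

  // On failure the plan only drops its own reference and marks itself done.
  void Fail() {
    generator.reset();
    finished = true;
  }
};

// Returns how many GeneratorState objects are still alive right after the
// plan has been marked finished. If drain_io_tasks is true, pending I/O tasks
// are discarded first, which releases the references they captured.
int LiveStatesAfterFailure(bool drain_io_tasks) {
  int live = 0;
  TaskQueue io_tasks;
  Plan plan;
  plan.generator = std::make_shared<GeneratorState>(&live);

  // A pending I/O task captures its own reference to the generator state.
  io_tasks.push_back([state = plan.generator] { (void)state; });

  plan.Fail();
  assert(plan.finished);

  if (drain_io_tasks) {
    io_tasks.clear();  // releases the captured shared_ptr
  }
  return live;  // snapshot taken before the locals are destroyed
}
```

In this sketch, LiveStatesAfterFailure(false) reports one surviving state even though the plan is finished, which is the situation a leak checker would flag if the process exited at that point. LiveStatesAfterFailure(true) reports zero, because the pending tasks were released before the snapshot.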
Reporter: Weston Pace / @westonpace
Note: This issue was originally created as ARROW-17198. Please see the migration documentation for further details.