AydinChavez opened this issue 1 year ago (status: Open)
Thanks for reporting this. VACUUM was actually successful based on the following output:
Deleted 0 files and directories in a total of 1 directories.
+--------------------+
| path|
+--------------------+
|file:/tmp/delta/t...|
+--------------------+
This looks like an issue that occurred while the program was stopping: Spark tasks were still being submitted while Spark was shutting down. Could you remove spark.sparkContext.setLogLevel("WARN")
so that we can get more log output and see why tasks were still being submitted after VACUUM?
You're welcome. Sure, I removed the mentioned setLogLevel statement and attached the resulting log output.
Looks like it's a bug in Spark AQE. The issue goes away if spark.sql.adaptive.enabled
is turned off. My hunch is that AQE launches a Spark job asynchronously but doesn't cancel it properly when a query doesn't need all of its partitions. It would be great if you could open an issue with the Spark community.
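Until that is fixed upstream, a possible session-level workaround based on the diagnosis above is to disable AQE when building the session. This is a sketch, not a confirmed fix; the app name is illustrative:

```python
# Workaround sketch: run the job with Adaptive Query Execution disabled,
# so AQE cannot leave an asynchronous job running at shutdown.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("vacuum-without-aqe")  # illustrative name
    .config("spark.sql.adaptive.enabled", "false")      # turn off Spark AQE
    .getOrCreate()
)
```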
Bug
Describe the problem
I was playing around a bit to introduce myself to Delta OSS and Python. When running the VACUUM command, I get an exception related to threading.
Steps to reproduce
test.py
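The test.py attachment itself is not reproduced in this thread. A minimal sketch consistent with the details mentioned above (Delta SQL extension, setLogLevel("WARN"), a small Delta table, then VACUUM) might look like the following; the table path /tmp/delta/table and the sample data are assumptions for illustration, not the reporter's original script:

```python
# test.py -- hedged reconstruction of the missing attachment; the table path
# and sample data below are assumptions for illustration.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("delta-vacuum-test")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)
spark.sparkContext.setLogLevel("WARN")  # the statement discussed in the comments

# Write a small Delta table, then run VACUUM against it.
spark.range(5).write.format("delta").mode("overwrite").save("/tmp/delta/table")
spark.sql("VACUUM delta.`/tmp/delta/table`").show()

spark.stop()
```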
spark-submit --packages "io.delta:delta-core_2.12:2.2.0" test.py
Observed results
VACUUM itself completes its cleanup, but the application raises a threading-related exception while shutting down (see the log output attached in the comments).
Expected results
The VACUUM command should complete without an exception.
Environment information
OS: macOS Ventura
Delta Lake version: 2.2.0 (io.delta:delta-core_2.12:2.2.0, per the spark-submit command above)
Willingness to contribute
The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?