Closed by jdries 1 month ago
It looks like .coalesce(1) always forces a single partition at the end, regardless of other settings such as tunables in the Spark conf. A potential solution is therefore to allow more output files, preferably in a better format.
It actually works better now; nodata filtering did it.
A user is running an aggregate_spatial over time, but with only one feature. At some point the RDD size appears to be only 13 MB, so Spark decides that one partition should be sufficient. That does not seem to hold: even with 6 GB of executor memory there is a lot of GC, and the task takes forever.
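To illustrate the sizing problem, here is a minimal sketch of a partition-count heuristic that looks at the record count as well as the byte size, so that a small but record-heavy RDD is not squeezed into one partition. This is not what the driver currently does: the function name, the thresholds, and the record count in the example are all hypothetical and would need tuning against real jobs.

```python
import math


def choose_num_partitions(estimated_bytes, num_records,
                          target_bytes_per_partition=4 * 1024 * 1024,
                          max_records_per_partition=10_000,
                          min_partitions=1):
    """Hypothetical heuristic: pick a partition count from size AND record count.

    All thresholds are illustrative assumptions, not values taken from Spark
    or from the driver.
    """
    # Partitions needed so that no partition exceeds the byte target.
    by_size = math.ceil(estimated_bytes / target_bytes_per_partition)
    # Partitions needed so that no partition exceeds the record target.
    by_count = math.ceil(num_records / max_records_per_partition)
    # Take the most demanding of the three constraints.
    return max(min_partitions, by_size, by_count)


# The 13 MB comes from the issue; the record count is made up for illustration.
n = choose_num_partitions(13 * 1024 * 1024, num_records=250_000)
print(n)

# With PySpark, the result could then replace the fixed single partition,
# e.g. rdd.repartition(n) instead of rdd.coalesce(1).
```

With these assumed thresholds the byte size alone would ask for only a handful of partitions, while the record count pushes the number much higher, which is the case a size-only decision misses.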