anhnongdan / Spark1.6_Problems

All problems and errors encountered when working with Spark 1.6

Coalesce problem when filtering big Parquet data. #38

Open anhnongdan opened 6 years ago

anhnongdan commented 6 years ago

Update on 2019-01-03

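Never use coalesce after a complex calculation (join, aggregate), especially with a small number of partitions (<30). => coalesce does not add a shuffle, so the upstream calculation runs with the same small number of tasks. A small num_part limits the parallelism of the tasks, and each task then holds more data than fits in memory and spills.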
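A rough back-of-the-envelope sketch of why this hurts, in plain Python rather than Spark. It only models task counts and average data per task. The helper names (`per_task_gb`, `stage_tasks`) and all the numbers are hypothetical, chosen for illustration, and real stages are rarely this evenly balanced. It assumes the usual Spark behavior that `coalesce(n)` is a narrow dependency while `repartition(n)` inserts a shuffle.

```python
def per_task_gb(total_gb, num_tasks):
    """Average data volume one task must process, in GB."""
    if num_tasks <= 0:
        raise ValueError("num_tasks must be positive")
    return total_gb / num_tasks


def stage_tasks(upstream_partitions, target_partitions, shuffle):
    """Number of tasks that run the work feeding the final partitions.

    shuffle=False models coalesce(n): the narrow dependency pulls the
    upstream computation into the same stage, so it runs with at most
    n tasks.
    shuffle=True models repartition(n): a shuffle boundary is inserted,
    so the upstream computation keeps its own parallelism.
    """
    if shuffle:
        return upstream_partitions
    return min(upstream_partitions, target_partitions)


# Hypothetical numbers, not taken from the job in this issue.
total_gb = 200.0   # data flowing through the join/aggregate
upstream = 400     # partitions produced by the join/aggregate
target = 10        # number of output partitions wanted

for label, shuffle in [("coalesce", False), ("repartition", True)]:
    tasks = stage_tasks(upstream, target, shuffle)
    print(label, tasks, "tasks,", per_task_gb(total_gb, tasks), "GB per task")
```

Under these assumed numbers the coalesce case leaves 10 tasks doing all the heavy work, while the repartition case keeps 400 tasks for the calculation and only pays one extra shuffle to shrink the output.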

anhnongdan commented 5 years ago

thangnguyen_backtest_oneclick_20190629 14_48 - Details for Stage 8104 (Attempt 0).pdf

This is an example of bad coalescing: a small file is calculated from some giant files, and the cluster then stuffs all the tasks onto a single node!