@arunb2w can you please attach the Spark plans (logical & physical) for this query?
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.
Apache Iceberg version
0.14.0
Query engine
EMR
Please describe the bug
Spark version: 3.2.1, Iceberg version: 0.14.0. I am running the code below on EMR with the Glue catalog.

Target: an Iceberg table (EPAYMENT) that contains 457M records and is about 89.4 GB in size. It is partitioned on the column _CONTEXTID and has 18K partitions. The table resides in S3 and uses the Glue catalog.

Input: the incoming batch to apply to this Iceberg table contains 335K records, and updating them requires touching 5K partitions of the target table. The batch arrives as a Spark DataFrame, so I create a view from it and use that view in a MERGE to upsert the records. The merge query I am using follows the standard MERGE INTO pattern; see the illustrative sketch below.
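A minimal sketch of that pattern (not the actual job): the catalog name, warehouse path, join keys, and non-partition columns below are placeholders, and only EPAYMENT and _CONTEXTID come from the issue text.

```python
from pyspark.sql import SparkSession

# Sketch of the setup described above; catalog/warehouse settings here are
# assumptions based on typical Iceberg-on-Glue configuration, not the real job.
spark = (
    SparkSession.builder
    .appName("epayment_merge_sketch")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.io-impl",
            "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.catalog.glue_catalog.warehouse", "s3://my-bucket/warehouse")
    .getOrCreate()
)

# Stand-in for the incoming 335K-record batch; in the real job this DataFrame
# comes from upstream processing.
incoming_df = spark.createDataFrame(
    [("ctx-001", 101, 25.0)],
    ["_CONTEXTID", "ID", "AMOUNT"],  # ID and AMOUNT are illustrative columns
)
incoming_df.createOrReplaceTempView("incoming_batch")

# Upsert the batch into the partitioned Iceberg table via MERGE INTO.
spark.sql("""
    MERGE INTO glue_catalog.db.EPAYMENT AS t
    USING incoming_batch AS s
    ON t._CONTEXTID = s._CONTEXTID AND t.ID = s.ID
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```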
Spark command used to run:
spark-submit --deploy-mode cluster --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.14.0,software.amazon.awssdk:bundle:2.17.257,software.amazon.awssdk:url-connection-client:2.17.257 --conf spark.yarn.submit.waitAppCompletion=true --conf "spark.executor.extraJavaOptions=-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=\"/opt/spark\"" --conf spark.dynamicAllocation.enabled=true --conf spark.executor.maxMemory=32g --conf spark.dynamicAllocation.executorIdleTimeout=300 --conf spark.shuffle.service.enabled=true --driver-memory 8g --num-executors 1 --executor-memory 8g --executor-cores 5 iceberg_main.py
The problem is that when I view the job in the Spark UI, the shuffle write size and the number of records to upsert are far higher than expected: based on the incoming batch, only about 335K records should be written. So it looks like partition pruning is not happening as expected. Because of this huge shuffle write, my EMR cluster runs out of memory and cannot complete the job. Please see the attached image.
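One way to confirm whether partition pruning is applied, and to capture the logical and physical plans requested above, is to run EXPLAIN on the same merge statement. A rough sketch, reusing the session and view from the earlier snippet, and assuming EXPLAIN accepts MERGE statements on this Spark/Iceberg combination:

```python
# Print Spark's parsed/analyzed/optimized/physical plans for the merge so the
# presence (or absence) of a _CONTEXTID filter on the Iceberg scan can be checked.
spark.sql("""
    EXPLAIN EXTENDED
    MERGE INTO glue_catalog.db.EPAYMENT AS t
    USING incoming_batch AS s
    ON t._CONTEXTID = s._CONTEXTID AND t.ID = s.ID
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""").show(truncate=False)
```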
Please provide some insight into what went wrong here.