amitbans commented 1 year ago

Describe the problem you faced

We are upgrading Hudi from 0.7 to 0.10.1 (as part of moving from EMR 5.33.1 to EMR 5.36.0) and are seeing failures in the stage "Doing partition and writing data isEmpty at HoodieSparkSqlWriter.scala:627". Increasing executor memory from 30g to 50g did not help. We have set spark.default.parallelism, spark.sql.shuffle.partitions and hoodie.upsert.shuffle.parallelism to 200, but this particular stage still runs with fewer tasks than that, which leads to an OOM.

Environment Description

Hudi version : 0.10.1
Spark version : Spark 2.4.8
Hive version : Hive 2.3.9
Hadoop version :
Storage (HDFS/S3/GCS..) : S3
Running on Docker? (yes/no) : no

amitbans commented 1 year ago

nsivabalan commented 1 year ago

@jonvex: could you follow up on this?

nsivabalan commented 1 year ago

@amitbans: can you paste the write configs you are using? Please also share screenshots of the Jobs and Stages pages from the Spark UI. For the particular job and stage that is failing, it would help if you could click on "+details" and show us the stack trace (sometimes the job/stage description does not match the exact code being run).

amitbans commented 1 year ago

@nsivabalan We are using the following config:

table type : COW
HOODIE_UPSERT_SHUFFLE_PARALLELISM=200
OPERATION_OPT_KEY=UPSERT
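For reference, the write config above can be expressed as Hudi Spark datasource options. This is a minimal sketch, not the job's actual code: it assumes HOODIE_UPSERT_SHUFFLE_PARALLELISM and OPERATION_OPT_KEY correspond to the `hoodie.upsert.shuffle.parallelism` and `hoodie.datasource.write.operation` keys, and the helper name, table name, record key and precombine field are hypothetical placeholders.

```python
def build_hudi_upsert_options(table_name, record_key, precombine_field, parallelism=200):
    """Return Hudi datasource write options for a copy-on-write upsert.

    build_hudi_upsert_options is a hypothetical helper; table_name, record_key
    and precombine_field are placeholders for the job's real values.
    """
    return {
        "hoodie.table.name": table_name,
        # "table type : COW"
        "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
        # OPERATION_OPT_KEY=UPSERT
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        # HOODIE_UPSERT_SHUFFLE_PARALLELISM=200 (assumed to map to this key)
        "hoodie.upsert.shuffle.parallelism": str(parallelism),
    }


if __name__ == "__main__":
    opts = build_hudi_upsert_options("my_table", "id", "updated_at")
    for key, value in sorted(opts.items()):
        print(f"{key}={value}")
```

With an existing DataFrame `df` and a table base path, these options would be passed as `df.write.format("hudi").options(**opts).mode("append").save(base_path)`.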

Spark SQL config

SET spark.sql.shuffle.partitions = 200;
SET spark.default.parallelism = 200;
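A note on the second setting, offered as our understanding rather than a confirmed diagnosis: `spark.sql.shuffle.partitions` is a SQL session config and can be changed with `SET`, but `spark.default.parallelism` is read from the SparkContext configuration, so on Spark 2.4 a `SET` statement likely has no effect on it. The sketch below builds equivalent `--conf` arguments so both values are fixed when the application is launched (for example in an EMR step); the helper name is hypothetical.

```python
def spark_submit_parallelism_args(parallelism=200):
    """Build spark-submit --conf arguments that fix parallelism at launch time.

    spark_submit_parallelism_args is a hypothetical helper for illustration.
    """
    confs = {
        # Core setting: read from the SparkContext conf, so set it before startup.
        "spark.default.parallelism": parallelism,
        # SQL setting: can also be changed later with SET in a session.
        "spark.sql.shuffle.partitions": parallelism,
    }
    args = []
    for key, value in confs.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Prints: --conf spark.default.parallelism=200 --conf spark.sql.shuffle.partitions=200
    print(" ".join(spark_submit_parallelism_args()))
```

These arguments go before the application JAR or script on the `spark-submit` command line.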

I don't have the stack trace now because the EMR cluster has been terminated. As a workaround, we switched the Hudi library on EMR 5.36 back to 0.7, and the jobs are working fine. Therefore, we are going ahead with Hudi 0.7 on EMR 5.36.

[SUPPORT] Upsert job failing while upgrading from 0.7 to 0.10.1 #7574
