apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] HoodieException: Commit 20220509105215 failed and rolled-back ! at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:492) #5540

Open nleena123 opened 2 years ago

nleena123 commented 2 years ago

Hi Team, our Hudi jobs are failing with the exception below. Can you please help us with this?

HoodieException: Commit 20220509105215 failed and rolled-back ! at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:492)

Environment Description

We are running Apache Hudi on Spark in Azure Databricks, using the dependencies below:

org.apache.spark:spark-avro_2.12:3.1.1
org.apache.hudi:hudi-utilities-bundle_2.12:0.8.0
org.apache.hudi:hudi-spark3-bundle_2.12:0.8.0

pratyakshsharma commented 2 years ago

Can you share the entire stacktrace?

nleena123 commented 2 years ago

@pratyakshsharma Hi, what is a stacktrace? I am not aware of it, can you please help with more details?

pratyakshsharma commented 2 years ago

@nleena123 I meant to say please share the detailed error log, not just the exception message. :)

nleena123 commented 2 years ago

Hi @pratyakshsharma

Please find attached the complete error log.

metrics_job_logs.txt

nleena123 commented 2 years ago

@nsivabalan Can you help us? This is happening on our production server.

nsivabalan commented 2 years ago

Can you share the write configs you use to write to Hudi?

nleena123 commented 2 years ago

hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator

hoodie.datasource.write.hive_styling_partioning=true
hoodie.datasource.write.partitionpath.urlencode=false

hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator

hoodie.datasource.write.recordkey.field=interaction_id
hoodie.datasource.write.partitionpath.field=create_datetime:TIMESTAMP
hoodie.deltastreamer.keygen.timebased.timestamp.type=DATE_STRING
hoodie.deltastreamer.keygen.timebased.input.dateformat=yyyyMMdd HH:mm
hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM
hoodie.deltastreamer.keygen.timebased.timezone=UTC-6:00
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator

# 20211119 13:39

hoodie.index.type=GLOBAL_BLOOM
hoodie.bloom.index.update.partition.path=true

# schema provider configs
hoodie.deltastreamer.schemaprovider.registry.url=https://azure-prod-schema-registry.extscloud.com/subjects/async.messaging.interaction.metrics-value/versions/3
hoodie.deltastreamer.schemaprovider.registry.url=https://azure-prod-schema-registry.extscloud.com/subjects/async.messaging.interaction.metrics-value/versions/latest

# Kafka Source
hoodie.deltastreamer.source.kafka.topic=async.messaging.interaction.metrics

# Kafka props
hoodie.auto.commit=true
enable.auto.commit=true
hoodie.deltastreamer.kafka.source.maxEvents=1000000
group.id=async.messaging.metrics
bootstrap.servers=prod-hcc-kaas-central-13.centralus.cloudapp.azure.com:443
metadata.broker.list=kaas-prod-elr-a.optum.com:443
auto.offset.reset=latest
auto.reset.offsets=latest
schema.registry.url=https://azure-test-schema-registry.extscloud.com
schema.registry.url=https://azure-prod-schema-registry.extscloud.com:443
schema.registry.ssl.keystore.location=/dbfs/mnt/ulp/artifacts/topics/certs/keystore.jks
schema.registry.ssl.keystore.password=prmcert
schema.registry.ssl.truststore.location=/dbfs/mnt/ulp/artifacts/topics/certs/truststore.jks
schema.registry.ssl.truststore.password=prmcert
schema.registry.ssl.key.password=prmcert
security.protocol=SSL
sasl.mechanism=PLAIN
ssl.keystore.location=/dbfs/mnt/ulp/artifacts/topics/certs/keystore.jks
ssl.keystore.password=***
ssl.truststore.location=/dbfs/mnt/ulp/artifacts/topics/certs/truststore.jks
ssl.truststore.password=**
ssl.key.password=prmcert
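
A side note for readers: several keys appear more than once in this dump — hoodie.datasource.write.keygenerator.class is set three times, hoodie.deltastreamer.schemaprovider.registry.url twice, and schema.registry.url twice. Since the last occurrence of a key normally wins when a properties file is loaded, the effective key-generation and registry section likely reduces to the sketch below (an assumption on my part that CustomKeyGenerator and the prod-registry values are the intended ones):

    # Effective values, assuming last-one-wins and that the prod endpoints are intended
    hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator
    hoodie.datasource.write.recordkey.field=interaction_id
    # With CustomKeyGenerator, the :TIMESTAMP suffix tells Hudi to apply the
    # timestamp-based key generator settings below to create_datetime
    hoodie.datasource.write.partitionpath.field=create_datetime:TIMESTAMP
    hoodie.deltastreamer.keygen.timebased.timestamp.type=DATE_STRING
    hoodie.deltastreamer.keygen.timebased.input.dateformat=yyyyMMdd HH:mm
    hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM
    hoodie.deltastreamer.keygen.timebased.timezone=UTC-6:00
    hoodie.deltastreamer.schemaprovider.registry.url=https://azure-prod-schema-registry.extscloud.com/subjects/async.messaging.interaction.metrics-value/versions/latest
    schema.registry.url=https://azure-prod-schema-registry.extscloud.com:443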

pratyakshsharma commented 2 years ago

Sorry for being late here. I see the class DFAvroKafkaSource in the logs. Do you have your own custom class for reading the data?

nleena123 commented 2 years ago

Yes, we are using the DFAvroKafkaSource.java file (which extends AvroSource) to read the data, along with DF3HoodieDeltaStreamer.java, which extends HoodieDeltaStreamer. I have attached the pom.xml and code in code.zip; please have a look and let me know if any changes are required in my code.

nsivabalan commented 2 years ago

And can you share your DeltaStreamer configs, please?

nsivabalan commented 2 years ago

Are you trying this out for the first time and running into issues, or was this running OK for some time and you hit this all of a sudden after upgrading or changing some configs?

nleena123 commented 2 years ago

Hi @nsivabalan The attached property file below contains all the configs we used for this job. We passed the following arguments to the Databricks job (we run the Hudi job through Azure Databricks):

metrics.properties.txt

["--table-type","COPY_ON_WRITE","--source-ordering-field","CDC_TS","--source-class","com.optum.df.hudi.sources.DFAvroKafkaSource","--target-base-path","/mnt/ulp/dataassets-lake/metrics/","--target-table","metrics","--schemaprovider-class","org.apache.hudi.utilities.schema.SchemaRegistryProvider","--props","/mnt/ulp/artifacts/properties/metrics.properties"]

Note: No, we didn't change any configs.
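
For anyone trying to reproduce this outside Databricks, those job arguments correspond roughly to the spark-submit invocation sketched below (the bundle jar path is a placeholder, not taken from the thread):

    spark-submit \
      --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
      hudi-utilities-bundle_2.12-0.8.0.jar \
      --table-type COPY_ON_WRITE \
      --source-ordering-field CDC_TS \
      --source-class com.optum.df.hudi.sources.DFAvroKafkaSource \
      --target-base-path /mnt/ulp/dataassets-lake/metrics/ \
      --target-table metrics \
      --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
      --props /mnt/ulp/artifacts/properties/metrics.properties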

pratyakshsharma commented 2 years ago

I see hoodie.datasource.write.hive_styling_partioning should be changed to hoodie.datasource.write.hive_style_partitioning. Looking at your timeline, 20211211183554__commit__COMPLETED is the last time the job ran successfully; after that you have been running into issues. So we want to understand: did you change anything since this last successful commit?

nleena123 commented 2 years ago

@pratyakshsharma

I have changed it to hoodie.datasource.write.hive_style_partitioning=true and ran the job; I still see the same issue. I have not made any changes after 20211211183554__commit__COMPLETED.

pratyakshsharma commented 2 years ago

Yes, changing to hoodie.datasource.write.hive_style_partitioning will not fix your issue. That was just a typo in your configs (unrelated to this error), so I just highlighted it.

pratyakshsharma commented 2 years ago

Also, I guess there is some confusion between the code you shared and the logs you shared. In the code, I see multiple log statements that do not appear in the log file you shared. Also, the schema being registered in the logs does not have the CDC_TS field. Can you share the latest logs? That might help in debugging the issue better.

nleena123 commented 2 years ago

We are adding the new field CDC_TS to the Kafka data through code in the DF3HoodieDeltaStreamer.java program:

    static GenericRecord getDataWithCDC_TS(GenericRecord record, String ts) {
        Schema base = record.getSchema();
        // Copy the existing fields (Schema.Field instances cannot be reused across schemas)
        List<Schema.Field> baseFields = base.getFields().stream()
                .map(field -> {
                    Schema.Field f = new Schema.Field(field.name(), field.schema(), field.doc(), field.defaultVal());
                    field.aliases().forEach(f::addAlias);
                    return f;
                })
                .collect(Collectors.toList());
        List<Schema.Field> newFields = new ArrayList<>(baseFields);

        // CDC_TS is an optional string: a union of null and string
        ArrayList<Schema> optionalString = new ArrayList<>();
        optionalString.add(Schema.create(Schema.Type.NULL));
        optionalString.add(Schema.create(Schema.Type.STRING));
        //JsonNode nullDefault = JsonNodeFactory.instance.nullNode();

        //Schema.Field f = new Schema.Field(CDC_TS, Schema.createUnion(optionalString), CDC_TS, nullDefault);
        Schema.Field f = new Schema.Field(CDC_TS, Schema.createUnion(optionalString), CDC_TS, null);
        // Only append the field if an identical one is not already present
        if (!baseFields.contains(f))
            newFields.add(f);

        // Rebuild the record schema with the extra field appended
        Schema newSchema = Schema.createRecord(
                base.getName(),
                "New schema by adding a new field, CDC_TS",
                base.getNamespace(), base.isError(),
                newFields);

        LOG.info("newSchema is: " + newSchema);

        // Copy all original values into a record of the new schema, then set CDC_TS
        GenericRecord newRecord = new GenericData.Record(newSchema);
        base.getFields().stream().forEach(field -> newRecord.put(field.name(), record.get(field.name())));
        newRecord.put(CDC_TS, ts);

        return newRecord;
    }

I have attached the latest log file; please have a look at it. log4j-active_latest.txt
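
One detail worth flagging in the method above (an observation, not a confirmed cause of the failure): in Avro 1.9 and later, passing a bare null as the default-value argument of Schema.Field means the field has no default at all, rather than a default of null. A nullable field without a declared default can trip up schema evolution when records written with the old schema need the new field filled in. A minimal sketch of the variant with an explicit null default, assuming Avro 1.9+:

    // Hypothetical variant of the field construction in getDataWithCDC_TS:
    // declare an explicit null default so the new CDC_TS field can be
    // filled in for records that were written without it.
    Schema optionalString = Schema.createUnion(
            Schema.create(Schema.Type.NULL),
            Schema.create(Schema.Type.STRING));
    Schema.Field cdcTsField = new Schema.Field(
            CDC_TS,                           // field-name constant from the existing code
            optionalString,
            "CDC timestamp",
            Schema.Field.NULL_DEFAULT_VALUE); // Avro 1.9+ marker for an explicit null default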

qianchutao commented 2 years ago

I have the same problem, and I haven't solved it yet

nleena123 commented 2 years ago

Hi @xushiyan, @pratyakshsharma I have been seeing this issue for a very long time. Can you please give an update on this, or suggest resolution steps?

nleena123 commented 2 years ago

Hi Team, can you please update me on the above issue?

nleena123 commented 2 years ago

Hi @qianchutao, were you able to fix the issue? And can you please let me know which version of Hudi you are using?

ROOBALJINDAL commented 2 years ago

I am also facing a similar exception, using EMR on AWS with Kafka. The exception is not self-explanatory about whether any config is missing or what exactly is wrong.

Exception:

22/08/09 07:30:39 INFO TaskSetManager: Finished task 0.0 in stage 66.0 (TID 2033) in 113 ms on ip-10-151-46-136.us-west-2.compute.internal (executor 1) (1/2)
22/08/09 07:30:39 INFO TaskSetManager: Finished task 1.0 in stage 66.0 (TID 2034) in 140 ms on ip-10-151-46-136.us-west-2.compute.internal (executor 1) (2/2)
22/08/09 07:30:39 INFO YarnScheduler: Removed TaskSet 66.0, whose tasks have all completed, from pool
22/08/09 07:30:39 INFO DAGScheduler: ResultStage 66 (collectAsMap at HoodieSparkEngineContext.java:151) finished in 0.151 s
22/08/09 07:30:39 INFO DAGScheduler: Job 32 is finished. Cancelling potential speculative or zombie tasks for this job
22/08/09 07:30:39 INFO YarnScheduler: Killing all running tasks in stage 66: Stage finished
22/08/09 07:30:39 INFO DAGScheduler: Job 32 finished: collectAsMap at HoodieSparkEngineContext.java:151, took 0.153794 s
22/08/09 07:30:39 ERROR HoodieDeltaStreamer: Shutting down delta-sync due to exception
org.apache.hudi.exception.HoodieException: Commit 20220809073005189 failed and rolled-back !
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:649)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:331)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:675)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
22/08/09 07:30:39 ERROR HoodieAsyncService: Service shutdown with error
java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Commit 20220809073005189 failed and rolled-back !
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
    at org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:103)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:189)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:186)
    at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.sync(HoodieMultiTableDeltaStreamer.java:416)
    at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.main(HoodieMultiTableDeltaStreamer.java:247)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1000)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1089)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1098)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hudi.exception.HoodieException: Commit 20220809073005189 failed and rolled-back !
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:709)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.exception.HoodieException: Commit 20220809073005189 failed and rolled-back !
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:649)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:331)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:675)

ROOBALJINDAL commented 2 years ago

> Hi @qianchutao, were you able to fix the issue? And can you please let me know which version of Hudi you are using?

Were you able to fix this issue?

nsivabalan commented 2 years ago

@nleena123 @ROOBALJINDAL: Looks like the stacktrace does not have sufficient info to triage. So, can you folks enable debug logs and share what you see? We should probably get some insight into what's happening. Also, if we can somehow get reproducible code, that would really help us triage it faster, since timeline events are very tough to grok by manually looking at the logs.

ROOBALJINDAL commented 2 years ago

@nsivabalan @pratyakshsharma I tried enabling debug logs by changing the root logger from log4j.rootLogger=INFO to log4j.rootLogger=DEBUG in the Hudi jar, for the following files:

log4j.properties
log4j-surefire.properties
log4j-surefire-quiet.properties

Also updated some other properties in these files:

log4j.rootLogger=DEBUG, CONSOLE
log4j.logger.org.apache=DEBUG
log4j.logger.org.apache.hudi=DEBUG

I am using this custom Hudi jar with the following Spark command:

spark-submit  \
  --class org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer s3://slava-redshift-test/hudi/roobal-hudi-Debug-logger/hudi-utilities-bundle_2.12-0.11.0_edfx_debug.jar \

I have also tried this:

spark-submit  \
  --conf spark.executor.extraJavaOptions=-Dlog4j.debug=true \

Can you help me figure out what I am missing?

Note: I am using EMR cluster on AWS.

nsivabalan commented 2 years ago

Here is a sample log4j file:

log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=WARN

# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR

# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR

# Example
log4j.logger.org.apache.hudi=DEBUG

Pass it to both the driver and the executors:

--conf spark.driver.extraJavaOptions="-Dlog4j.configuration=file:/home/hadoop/log4j.properties" --conf spark.executor.extraJavaOptions="-Dlog4j.configuration=file:/home/hadoop/log4j.properties"

nsivabalan commented 2 years ago

@ROOBALJINDAL: can we have some updates?

ROOBALJINDAL commented 2 years ago

@nsivabalan yeah, the issue is resolved for me. I logged a separate issue and was able to get help and resolve it. Thank you team 👍 And FYI, this original issue was logged by someone else, so I can't close it.

nsivabalan commented 2 years ago

Oh OK, got it, thanks. May I know which issue you are referring to?

nsivabalan commented 2 years ago

@nleena123: are you still facing issues, or are we good to close it out?

ROOBALJINDAL commented 2 years ago

@nsivabalan this was my issue which I have already closed.

https://github.com/apache/hudi/issues/6348

nleena123 commented 2 years ago

I can still see the same issue. Do I need to follow the step below to fix it?

--conf spark.driver.extraJavaOptions="-Dlog4j.configuration=file:/home/hadoop/log4j.properties" --conf spark.executor.extraJavaOptions="-Dlog4j.configuration=file:/home/hadoop/log4j.properties"

But I am getting this issue while running a Databricks job.

nsivabalan commented 2 years ago

The above command is just to enable debug logs; that's not the fix as such. We might need proper driver logs, and the contents of ".hoodie", to investigate the issue.
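
For reference, a rough sketch of how the ".hoodie" contents can be inspected (the table path is taken from the job arguments earlier in the thread; hudi-cli needs to be available in your environment):

    # List the timeline files under the table's metadata directory
    ls /mnt/ulp/dataassets-lake/metrics/.hoodie/

    # Or browse the timeline with hudi-cli
    hudi-cli
    connect --path /mnt/ulp/dataassets-lake/metrics
    commits show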

nsivabalan commented 1 year ago

@nleena123: are you still looking for assistance?

nsivabalan commented 1 year ago

@nleena123 : gentle ping.

pratyakshsharma commented 1 year ago

@nleena123 can we close this issue if you are fine with that?