apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Error while setting OCC in spark structured streaming #8213

Closed: haripriyarhp closed this issue 1 year ago

haripriyarhp commented 1 year ago


Describe the problem you faced

I have a Spark structured streaming job reading from Kafka and writing to a MoR table on S3. I am using hudi-spark3.1-bundle_2.12-0.13.0.jar along with aws-java-sdk-bundle_1.11.271.jar and hadoop-aws_3.1.2.jar. Based on the concurrency control docs at https://hudi.apache.org/docs/next/concurrency_control, I added the configs below to my job:

```scala
"hoodie.write.concurrency.mode" -> "optimistic_concurrency_control",
"hoodie.cleaner.policy.failed.writes" -> "LAZY",
"hoodie.write.lock.provider" -> "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider",
"hoodie.write.lock.dynamodb.table" -> "hudi_partitions"
```

but then the job failed with:

```
ERROR SensorStreaming$: No enum constant org.apache.hudi.common.model.WriteConcurrencyMode.optimistic_concurrency_control
```

The full configs are:

```scala
"hoodie.table.name" -> tableName,
"path" -> "s3a://path/Hudi/".concat(tableName),
"hoodie.datasource.write.table.name" -> tableName,
"hoodie.datasource.write.table.type" -> MERGE_ON_READ,
"hoodie.datasource.write.operation" -> "upsert",
"hoodie.datasource.write.recordkey.field" -> "col5,col6,col7",
"hoodie.datasource.write.partitionpath.field" -> "col1,col2,col3,col4",
"hoodie.datasource.write.keygenerator.class" -> "org.apache.hudi.keygen.ComplexKeyGenerator",
"hoodie.datasource.write.hive_style_partitioning" -> "true",
//Cleaning options
"hoodie.clean.automatic" -> "true",
"hoodie.clean.max.commits" -> "3",
//"hoodie.clean.async" -> "true",
//hive_sync_options
"hoodie.datasource.hive_sync.partition_fields" -> "col1,col2,col3,col4",
"hoodie.datasource.hive_sync.database" -> dbName,
"hoodie.datasource.hive_sync.table" -> tableName,
"hoodie.datasource.hive_sync.enable" -> "true",
"hoodie.datasource.hive_sync.mode" -> "hms",
"hoodie.datasource.hive_sync.partition_extractor_class" -> "org.apache.hudi.hive.MultiPartKeysValueExtractor",
"hoodie.upsert.shuffle.parallelism" -> "200",
"hoodie.insert.shuffle.parallelism" -> "200",
"hoodie.datasource.compaction.async.enable" -> true,
"hoodie.compact.inline.max.delta.commits" -> "10",
"hoodie.index.type" -> "BLOOM",
//"hoodie.metadata.index.bloom.filter.enable" -> "true",
//"hoodie.metadata.index.column.stats.enable" -> "true",
//"hoodie.enable.data.skipping" -> "true"
"hoodie.write.concurrency.mode" -> "optimistic_concurrency_control",
"hoodie.cleaner.policy.failed.writes" -> "LAZY",
"hoodie.write.lock.provider" -> "org.apache.hudi.client.transaction.lock.DynamoDBBasedLockProvider",
"hoodie.write.lock.dynamodb.table" -> "hudi_partitions"
```
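For context, a minimal sketch of how options like these are typically passed to a structured streaming write. The names `df` (the Kafka source DataFrame), `hudiOptions` (a map holding the configs above), and the checkpoint path are assumptions for illustration, not taken from the report:

```scala
import org.apache.spark.sql.streaming.Trigger

// hudiOptions: Map[String, String] holding the Hudi configs listed above
val query = df.writeStream
  .format("hudi")
  .options(hudiOptions)
  // checkpoint path is hypothetical; any durable location works
  .option("checkpointLocation", "s3a://path/Hudi/checkpoints/" + tableName)
  .trigger(Trigger.ProcessingTime("60 seconds"))
  .outputMode("append")
  .start("s3a://path/Hudi/" + tableName)

query.awaitTermination()
```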


Stacktrace

```
ERROR SensorStreaming$: No enum constant org.apache.hudi.common.model.WriteConcurrencyMode.optimistic_concurrency_control
```
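The error appears to come from a case-sensitive enum lookup: `WriteConcurrencyMode` is a Java enum (as the message itself shows), and `Enum.valueOf` only matches constant names exactly. A minimal illustration of the failure mode, not necessarily Hudi's exact code path:

```scala
import org.apache.hudi.common.model.WriteConcurrencyMode

// Enum.valueOf is case-sensitive, so the lower-case config value throws:
// java.lang.IllegalArgumentException: No enum constant
//   org.apache.hudi.common.model.WriteConcurrencyMode.optimistic_concurrency_control
WriteConcurrencyMode.valueOf("optimistic_concurrency_control")

// The exact constant name resolves fine:
WriteConcurrencyMode.valueOf("OPTIMISTIC_CONCURRENCY_CONTROL")
```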

xicm commented 1 year ago

You can change `optimistic_concurrency_control` to upper case; we can improve this on the Hudi side so the lower-case value is accepted as well.
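For reference, a corrected version of the concurrency-related options from the report above. Note the report quotes two different package names for the lock provider; in Hudi 0.13 the DynamoDB lock provider ships in the hudi-aws module under `org.apache.hudi.aws.transaction.lock`:

```scala
// Value must match the enum constant name, i.e. upper case
"hoodie.write.concurrency.mode" -> "OPTIMISTIC_CONCURRENCY_CONTROL",
"hoodie.cleaner.policy.failed.writes" -> "LAZY",
"hoodie.write.lock.provider" -> "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider",
"hoodie.write.lock.dynamodb.table" -> "hudi_partitions"
```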

codope commented 1 year ago

@haripriyarhp the suggestion above should work for you. Please reopen in case you're still facing an issue.