apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] What Class Name to use for hoodie.errortable.write.class #11252

Open soumilshah1995 opened 5 months ago

soumilshah1995 commented 5 months ago

I'm trying out Hudi error tables, but I'm having trouble finding the documentation for the hoodie.errortable.write.class value. Could you please assist me?

Sample config:

hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.SimpleKeyGenerator
hoodie.datasource.write.recordkey.field=invoiceid
hoodie.datasource.write.partitionpath.field=destinationstate
hoodie.streamer.source.dfs.root=file:///Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E1/sampledata/
hoodie.datasource.write.precombine.field=replicadmstimestamp
hoodie.streamer.transformer.sql=SELECT * FROM <SRC> a where sas
hoodie.errortable.base.path=file:///Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E1/error/
hoodie.errortable.target.table.name=error_invoice
hoodie.errortable.enable=true
hoodie.errortable.write.class=

Job:

spark-submit \
  --class org.apache.hudi.utilities.streamer.HoodieStreamer \
  --packages org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0 \
  --properties-file spark-config.properties \
  --master 'local[*]' \
  --executor-memory 1g \
  /Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E1/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar \
  --table-type COPY_ON_WRITE \
  --op UPSERT \
  --transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \
  --source-ordering-field replicadmstimestamp \
  --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
  --target-base-path file:///Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E1/silver/ \
  --target-table invoice \
  --props hudi_tbl.props

I want to fail the job on purpose (hence the invalid SQL in hoodie.streamer.transformer.sql) so that I can see the error table being created.

singhaniaayush commented 2 months ago

Hi @soumilshah1995, this might help: hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/BaseErrorTableWriter.java.

As I understand it, we have to extend this class and provide a suitable implementation.

soumilshah1995 commented 2 months ago

I assumed there was a default class or implementation provided, but it seems there isn't. Is there a default implementation available for this? If not, could we kindly request one be added?

ad1happy2go commented 2 months ago

@soumilshah1995 There is no implementation of this class available out of the box. You need to implement it yourself; you can refer to this test class: https://github.com/apache/hudi/blob/master/hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestJsonKafkaSource.java
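
To make the wiring concrete, here is a minimal sketch of how the error table properties might look once a custom writer exists. The class name `com.example.hudi.MyErrorTableWriter` is hypothetical (an assumption for illustration, not something shipped with Hudi); it stands for your own class that extends BaseErrorTableWriter. The other values are copied from the sample config in the question.

```properties
# Error table settings taken from the sample config in this issue
hoodie.errortable.enable=true
hoodie.errortable.base.path=file:///Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E1/error/
hoodie.errortable.target.table.name=error_invoice
# Hypothetical: fully qualified name of your own class extending BaseErrorTableWriter
hoodie.errortable.write.class=com.example.hudi.MyErrorTableWriter
```

The jar containing that class would also need to be on the job's classpath, for example via the `--jars` option of spark-submit.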

soumilshah1995 commented 2 months ago

Got it, that makes sense.

soumilshah1995 commented 2 months ago

Is there a plan to provide a generic default implementation for users? I think a standardized default option would be beneficial. This is just an inquiry; I'm curious whether a default version is planned for a future release.

singhaniaayush commented 2 months ago

@ad1happy2go I'm trying to implement this, but I'm not able to test it. I've created record-level errors, but the values are not going into error_table. I also checked the HTTP callback, and the error counts are not coming in. I did check the test Kafka implementation as well, but I still can't reproduce it. If there are any misconfigurations in the error_table configs, I do get an error from Hudi. Any help is highly appreciated.

soumilshah1995 commented 2 months ago

Maybe Aditya or someone else can help us get a simple POC working for this. That way a lot of people and engineers will benefit from it. @singhaniaayush

ad1happy2go commented 2 months ago

Thanks @singhaniaayush. Let's have a working session to understand this issue.

ad1happy2go commented 2 months ago

I was able to connect with @singhaniaayush and provided feedback. Let us know if you are able to implement it. Thanks.