@shubhamn21 Please provide the writer configurations.
Here they are:
```python
options = {
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
    "hoodie.datasource.write.operation": "insert",
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    "hoodie.update.partial.fields": "true",
    "hoodie.upsert.shuffle.parallelism": "2",
    "hoodie.insert.shuffle.parallelism": "2",
    "hoodie.index.bloom.num_entries": "60000",
    "hoodie.index.bloom.fpp": "0.000000001",
    "hoodie.compaction.lazy.block.read": "false",
    "hoodie.enable.data.skipping": "true",
    "hoodie.logfile.max.size": "1073741824",
    "hoodie.parquet.small.file.limit": "104857600",
    "hoodie.parquet.max.file.size": "125829120",
    "hoodie.parquet.block.size": "125829120",
    "hoodie.clean.automatic": "false",
    "hoodie.clean.async": "true",
    "hoodie.datasource.write.precombine.field": "kafka_offset",
    "hoodie.datasource.write.recordkey.field": "id,cid",
    "hoodie.datasource.write.partitionpath.field": "kafka_topic,event_dt",
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.use_jdbc": "false",
    "hoodie.datasource.hive_sync.mode": "hms",
    "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
    "hoodie.datasource.write.hive_style_partitioning": "true",
    "hoodie.table.name": "snimbalkar_test_table_ro",
}
```
I think it may have something to do with AWS Glue compatibility; the documentation says Glue only supports Hudi up to 0.12.1.
As a workaround, I am using `.save` instead of `.saveAsTable`. I am not able to sync with Glue/Hive, but I am able to ingest data and query it with spark-sql.
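For reference, a minimal sketch of that workaround (the S3 base path and view name below are hypothetical, and Hive sync is disabled since it is what fails in this environment):

```python
# Write to a base path with .save() instead of registering the table in the
# catalog via .saveAsTable(), which is where the Glue/Hive failure occurs.
options["hoodie.datasource.hive_sync.enable"] = "false"  # skip the failing sync
base_path = "s3://my-bucket/hudi/snimbalkar_test_table"  # hypothetical path

df.write.format("hudi") \
    .mode("append") \
    .options(**options) \
    .partitionBy("kafka_topic", "event_dt") \
    .save(base_path)

# Without catalog sync, load the table from the path and expose it as a
# temp view so it can still be queried with Spark SQL.
spark.read.format("hudi").load(base_path) \
    .createOrReplaceTempView("snimbalkar_test_table")
spark.sql("SELECT * FROM snimbalkar_test_table LIMIT 10").show()
```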
Closing this as it is no longer an issue.
Describe the problem you faced
Unable to write a Hudi table on an AWS Hadoop EMR setup. From the error, it seems to be failing while creating a metadata table (with suffix `_ro`) with Hive/Glue. Am I missing a Hive setting that would allow it to create Null type tables? Are there alternative solutions?
To Reproduce
Steps to reproduce the behavior:
df.write.format("hudi")\ .mode('append') \ .options(**options)\ .partitionBy("kafka_topic", "event_dt") \ .saveAsTable('db_name.snimbalkar_test_table')
Expected behavior
Creates and stores table.
Environment Description
Hudi version : 0.13.1
Spark version : 3.3.0
Hive version :
Hadoop version : 3.2.1
Storage (HDFS/S3/GCS..) : EMRFS
Running on Docker? (yes/no) : no
Additional context
Stacktrace