Madan16 opened this issue 1 year ago (status: Open)
@Madan16 Looks like an avro library version mismatch. Since you say this was running fine for 2 months, do you know if any AWS library (or anything else) was updated recently?
@ad1happy2go: Sorry, but I could not understand your question. Can you please be more specific so that I can provide more details? Note: I am using this (PySpark) code in AWS Glue.
@Madan16 I wanted to ask whether you have been using AWS Glue 3.0 from the start (i.e., while the job was still succeeding).
My guess is that a version mismatch is somehow happening, resulting in the ClassNotFound for SchemaConverters, which is not present in older avro versions.
@ad1happy2go: yes, Glue 3.0 since the beginning.
@Madan16 Were you able to resolve it? Can you try using our bundle jar instead of --datalake-formats?
Hello all and @ad1happy2go: a strange thing is happening...
This is really strange. Why would this code stop working on Glue 3.0 when it had been running fine until this issue appeared?
@Madan16 This is strange, as the code just stopped working with Glue 3.0; it should be something AWS-specific. I am not able to reproduce this issue either. Can you use Glue 4.0 for your use case for now? If you still need Glue 3.0, I suggest you use a compiled Hudi jar instead of relying on the AWS-provided --datalake-formats, e.g. as sketched below.
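For reference, a sketch of pinning the jars for a Glue job via boto3 (the job name, role, S3 paths, and jar versions below are placeholders, not from this thread):

```python
import boto3

glue = boto3.client("glue")

# All names and S3 paths below are placeholders -- substitute your own.
glue.create_job(
    Name="hudi-upsert-job",
    Role="arn:aws:iam::123456789012:role/GlueJobRole",
    GlueVersion="3.0",
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-bucket/scripts/hudi_upsert.py",
        "PythonVersion": "3",
    },
    DefaultArguments={
        # Pin an explicit Hudi bundle plus spark-avro (which provides
        # SchemaConverters) instead of --datalake-formats, so the library
        # versions cannot drift between runs.
        "--extra-jars": (
            "s3://my-bucket/jars/hudi-spark3-bundle_2.12-0.9.0.jar,"
            "s3://my-bucket/jars/spark-avro_2.12-3.1.1.jar"
        ),
    },
)
```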
@ad1happy2go I used Glue 4.0 and it ran fine, as I said in my previous comments.
Tips before filing an issue
Have you gone through our FAQs? - Yes
Describe the problem you faced: We are upserting into a non-partitioned table. The code had been working fine (running once a day for almost 2 months), but every upsert suddenly started failing with the errors below:
1) An error occurred while calling o167.save. org/apache/spark/sql/avro/SchemaConverters$.
2) An error occurred while calling o168.save. Failed to upsert for commit time 20230410133751.
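One way to check from PySpark whether the class named in the first error is resolvable on the driver (a diagnostic sketch; only the class name comes from the error above):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The o167.save error points at this class, which ships in spark-avro.
try:
    spark.sparkContext._jvm.java.lang.Class.forName(
        "org.apache.spark.sql.avro.SchemaConverters")
    print("spark-avro is on the driver classpath")
except Exception as err:  # py4j surfaces the ClassNotFoundException here
    print("spark-avro is missing or version-mismatched:", err)
```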
To Reproduce
Steps to reproduce the behavior:
Using the code below to perform the upsert:

```python
print('Writing to unpartitioned Hudi table.')
# Merge the three config dicts with '**' unpacking; '{a, b, c}' would build
# a set of dicts, which raises TypeError because dicts are unhashable.
combinedConf = {**commonConfig, **unpartitionDataConfig, **incrementalConfig}
outputDf.write.format('org.apache.hudi').options(**combinedConf).mode('Append').save(targetPath)
```
Configuration details:

```python
commonConfig = {
    'className': 'org.apache.hudi',
    'hoodie.datasource.hive_sync.use_jdbc': 'false',
    'hoodie.datasource.write.precombine.field': 'ingest_dt',
    'hoodie.datasource.write.recordkey.field': primaryKey,
    'hoodie.table.name': tableName,
    'hoodie.consistency.check.enabled': 'true',
    'hoodie.datasource.hive_sync.database': dbName,
    'hoodie.datasource.hive_sync.table': tableName,
    'hoodie.datasource.hive_sync.enable': 'true',
}
```
targetPath: s3 bucket
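The other two dicts are not shown in the issue; for a non-partitioned Hudi upsert they typically look something like this (values are illustrative, not taken from the original report):

```python
# Illustrative only -- the issue does not include these two dicts.
unpartitionDataConfig = {
    # Non-partitioned tables use the Hive extractor and key generator below.
    'hoodie.datasource.hive_sync.partition_extractor_class':
        'org.apache.hudi.hive.NonPartitionedExtractor',
    'hoodie.datasource.write.keygenerator.class':
        'org.apache.hudi.keygen.NonpartitionedKeyGenerator',
}
incrementalConfig = {
    'hoodie.datasource.write.operation': 'upsert',
    'hoodie.upsert.shuffle.parallelism': 20,
    'hoodie.cleaner.policy': 'KEEP_LATEST_COMMITS',
    'hoodie.cleaner.commits.retained': 10,
}
```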
Expected behavior
The upsert should have succeeded, as it had been running fine until the above error started showing up.
Environment Description
Hudi version : Apache Hudi Connector 3.0_hudi_0.9.0_glue_3.0
AWS Glue version : Glue 3.0
Spark version : 3.1
Python version : 3
Hive version :
Hadoop version :
Storage (HDFS/S3/GCS..) : S3 (both source and target).
File format: parquet using snappy compression.
Running on Docker? (yes/no) : No
Additional context
Column and data types of the source:
```
 |-- pk_ABC_ky: string
 |-- A: int
 |-- B: long
 |-- C: date
 |-- D: date
 |-- E: string
 |-- op: string
 |-- source_name: string
 |-- source_schema: string
 |-- source_table: string
 |-- ingest_dt: string
```
Column and data types of the target:

```
pk_ABC_ky: string
A: int
B: bigint
C: date
D: date
E: string
op: varchar(1)
source_name: varchar(24)
source_schema: varchar(24)
source_table: varchar(13)
ingest_dt: string
```
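Given the string/varchar and long/bigint differences above, a quick diagnostic is to read the existing Hudi table back and diff the two schemas before the upsert (a sketch using the names from this issue; the read pattern may vary by Hudi version):

```python
# Read the existing Hudi table and compare its schema with the incoming frame.
existing = spark.read.format("org.apache.hudi").load(targetPath)
print("incoming:", outputDf.schema.simpleString())
print("existing:", existing.schema.simpleString())
```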
Stacktrace