Closed soumilshah1995 closed 1 year ago
Closing the issue; the root cause was Lake Formation.
Hi @soumilshah1995, just here to ask what the issue was. I am having a similar issue with Lake Formation that I can't figure out when trying to read a Hudi table from the Data Catalog. Could this be related? If not, do you have any suggestions?
Glue config: Glue 4.0
Job Parameters:
--datalake-formats hudi
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.sql.hive.convertMetastoreParquet=false
Code:
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.sql import SparkSession

args = getResolvedOptions(sys.argv, ["JOB_NAME"])

spark = SparkSession.builder \
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
    .config("spark.sql.hive.convertMetastoreParquet", "false") \
    .config("spark.sql.parquet.datetimeRebaseModeInRead", "CORRECTED") \
    .config("spark.sql.avro.datetimeRebaseModeInWrite", "CORRECTED") \
    .getOrCreate()

glueContext = GlueContext(spark.sparkContext)
job = Job(glueContext)
job.init(args["JOB_NAME"], args)
logger = glueContext.get_logger()

dataFrame = glueContext.create_dynamic_frame_from_catalog(
    database="my_db",
    table_name="my_table",
)
Error:
2023-05-03 21:18:58,045 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(77)): Error from Python:Traceback (most recent call last):
File "/tmp/read hudi without connector.py", line 37, in
Hey buddy @juanAmayaRamirez,
just use Glue 4.0 and pass these parameters; it should be fixed:
"""
--additional-python-modules | faker==11.3.0
--conf | spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.sql.hive.convertMetastoreParquet=false --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog --conf spark.sql.legacy.pathOptionBehavior.enabled=true --conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension
--datalake-formats | hudi
"""
Thanks for the quick response! (Love your videos, BTW.) But sorry to say I am still getting the same error:
An error occurred while calling o110.getDynamicFrame. Reads and writes using Lake Formation permissions are not supported for hudi tables.
I was able to read the table using spark directly like:
dataFrame = spark.read.format("hudi").load("s3://bucket/path/to/my_table/")
BUT NOT with glueContext, using a table already registered in the Glue Data Catalog:
dataFrame = glueContext.create_dynamic_frame_from_catalog(
    database="my_db",
    table_name="my_table",
)
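(Editor's note, a sketch worth trying: the AWS Glue Hudi docs describe a second GlueContext entry point for natively supported formats, `create_data_frame.from_catalog`, which returns a Spark DataFrame rather than a DynamicFrame. Database and table names below are the ones from this thread; whether this path avoids the Lake Formation restriction is exactly what is unverified here.)

```python
# Sketch: read the catalog-registered Hudi table as a Spark DataFrame
# via create_data_frame.from_catalog (Glue 4.0 native Hudi support).
# Whether this honors or bypasses Lake Formation permissions is the
# open question in this thread -- treat as an experiment, not a fix.
df = glueContext.create_data_frame.from_catalog(
    database="my_db",
    table_name="my_table",
)
df.printSchema()
```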
According to the AWS docs (https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-hudi.html), both should work fine.
Hello, we were using the AWS Marketplace connector. This morning I was preparing some Hudi labs; that's when this error started to show up.
Code
Error Message
Connector Version
Note: I have tried these labs before and everything was fine until this morning, when they started to throw a Hive sync error.