@RameshkumarChikoti123 Clustering needs some lock provider to be configured. S3 is currently not supported with the file-system-based lock provider because it doesn't allow atomic creation of objects. If you are sure there is no concurrent ingestion job running, one workaround is to set hoodie.fs.atomic_creation.support to s3a; otherwise, configure another lock provider class instead of the file-system-based one.
Although I do believe that if you are setting hoodie.write.concurrency.mode to SINGLE_WRITER explicitly, it should work, as it shouldn't need any lock provider.
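For reference, configuring "another lock provider" would look roughly like this on the write path (a sketch, assuming the DynamoDB-based provider from the hudi-aws bundle; the table name, partition key and region below are placeholders, not values from this issue):

# Placeholder lock-provider settings; swap in your own DynamoDB table/region.
lock_options = {
    "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
    "hoodie.cleaner.policy.failed.writes": "LAZY",
    "hoodie.write.lock.provider": "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider",
    "hoodie.write.lock.dynamodb.table": "hudi_locks",
    "hoodie.write.lock.dynamodb.partition_key": "my_hudi_table",
    "hoodie.write.lock.dynamodb.region": "us-east-1",
}

# df is any DataFrame being written to the same Hudi table on S3.
(df.write.format("hudi")
    .options(**lock_options)   # merged with the usual table/write options
    .mode("append")
    .save(hudi_table_path))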
@ad1happy2go I am not running any other jobs and have configured SINGLE_WRITER, but I am still seeing the same issue. Below is my SQL query. Please let me know if any additional configuration is needed.
spark.sql(f""" CALL run_clustering(path => '{hudi_table_path}', options => ' hoodie.write.concurrency.mode=SINGLE_WRITER) """).show()
@RameshkumarChikoti123 Can you try this -
spark.sql(f""" CALL run_clustering(path => '{hudi_table_path}', options => ' hoodie.fs.atomic_creation.support=s3a') """).show()
@ad1happy2go The configuration you provided is working. Thank you!
I am using the path parameter with run_clustering, but I'm encountering an error.
Expected behaviour
Clustering should execute successfully.
Environment Description
Hudi version : 0.15.0
Spark version : 3.3.0
Storage : S3
Hive version : NA
Running on Docker : Yes
Hadoop version : 3.3.4
Steps to reproduce the behaviour:
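A minimal sketch of the call that triggers the error (hudi_table_path points at an existing Hudi table on S3; table contents are omitted here):

spark.sql(f"""CALL run_clustering(path => '{hudi_table_path}')""").show()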
Stacktrace: