Closed masthanmca closed 7 months ago
@masthanmca Is this the first time you are facing this issue, or did it start after an upgrade?
Your configurations also look wrong. Where did you get them, or which doc did you refer to? Can you refer to https://hudi.apache.org/docs/gcp_bigquery/
Facing the same issue; it does not work with org.apache.hudi:hudi-spark3.3-bundle_2.12:0.14.1.
The Hudi write to the path works and Hive sync works, but BQ sync does not.
For now I have taken this route: based on a flag, manually perform the BQ sync with BigQuerySyncTool after the dataframe.write.
https://github.com/apache/hudi/issues/9355#issuecomment-1696764242
@abhishekshenoy @masthanmca That approach (https://github.com/apache/hudi/issues/9355#issuecomment-1696764242), i.e. calling BigQuerySyncTool, is the correct way of doing BQ sync with batch jobs.
The other way is to do this with HudiStreamer.
@ad1happy2go @the-other-tim-brown
But shouldn't that be called internally when we provide the Hudi BQ configs and enable META_SYNC_ENABLED?
In my case we use df.write.options(hudiAndHiveAndBQConfigs).save(), and hudiAndHiveAndBQConfigs has both Hive and BQ related configs.
*But still only Hive sync happens implicitly*.
Is it by design that, as part of our write function, we need to perform both of the following?
df.write.options(hudiAndHiveAndBQConfigs).save()
new BigQuerySyncTool(getBigQueryProps).syncHoodieTable()
@masthanmca @abhishekshenoy I went through the code and found that we need to set both class names to run both meta syncs together. The default value of the property below is Hive sync only. I tried this with Hudi 0.14.1: after the write and the Hive sync completed, it attempted the BigQuery sync as well.
"hoodie.meta.sync.client.tool.class" : "org.apache.hudi.hive.HiveSyncTool,org.apache.hudi.gcp.bigquery.BigQuerySyncTool"
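For anyone wiring this into a batch job, here is a minimal PySpark-style sketch of the setting above. It assumes the write options are kept in a plain dict; the helper names `with_hive_and_bq_sync` and `write_with_sync` are hypothetical and not part of Hudi, and the Hive and BigQuery specific configs (project, dataset, table and so on) are assumed to already be present in `base_options`. The `"append"` save mode is also just a placeholder choice.

```python
# Sketch only: helper names are illustrative, not Hudi APIs.
HIVE_SYNC_TOOL = "org.apache.hudi.hive.HiveSyncTool"
BIGQUERY_SYNC_TOOL = "org.apache.hudi.gcp.bigquery.BigQuerySyncTool"


def with_hive_and_bq_sync(base_options):
    """Return a copy of base_options with meta sync enabled for both tools."""
    options = dict(base_options)  # copy so the caller's dict is not mutated
    options["hoodie.datasource.meta.sync.enable"] = "true"
    # Comma-separated list of sync tool classes; per the comment above,
    # the default value covers Hive sync only.
    options["hoodie.meta.sync.client.tool.class"] = ",".join(
        [HIVE_SYNC_TOOL, BIGQUERY_SYNC_TOOL]
    )
    return options


def write_with_sync(df, base_path, base_options):
    """Write a Spark DataFrame as a Hudi table using the combined options."""
    (
        df.write.format("hudi")
        .options(**with_hive_and_bq_sync(base_options))
        .mode("append")  # assumption: pick the save mode your job needs
        .save(base_path)
    )
```

With this in place the separate `BigQuerySyncTool` call after the write should not be needed, assuming the behavior described above holds for your Hudi version.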
@masthanmca Closing out this issue as I confirmed it works. Please reopen in case you still see this issue.
Describe the problem you faced
BQ sync is not working with the Hudi bundle jar. I wanted to enable BQ sync, using the manifest file, while ingesting data into a Hudi table.
To Reproduce
Steps to reproduce the behavior:
Environment Description
Hudi version : 0.14.0
Spark version : 3.3.2
Hive version :
Hadoop version :
Storage (HDFS/S3/GCS..) : GCS
Running on Docker? (yes/no) : no
Stacktrace
No error, but the external table is not created in BigQuery.