Closed MCaviezel closed 1 year ago
Hi Marco,
Sorry for the delayed reply. An Airflow job runs that Dataproc process, which requires the JAR files to be passed in.
The file is downloaded during the deployment: https://github.com/GoogleCloudPlatform/data-analytics-golden-demo/blob/5c10931f58516827cfe2ee84e0cc550965e8d2a5/terraform-modules/deploy-files/tf-deploy-files.tf#L608
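As a rough illustration of "passing the JAR in", here is a minimal sketch of a Dataproc PySpark job definition in the dict form that the Dataproc Jobs API accepts. The project, cluster, bucket, and JAR names are hypothetical placeholders, not values from the demo repository.

```python
# Hypothetical values for illustration only; none come from the demo repository.
PROJECT_ID = "my-project"
CLUSTER_NAME = "my-cluster"
MAIN_PY_URI = "gs://my-bucket/pyspark-code/convert_taxi_to_iceberg_create_tables.py"
ICEBERG_JAR_URI = "gs://my-bucket/jars/iceberg-spark-runtime.jar"


def build_pyspark_job(project_id, cluster_name, main_py_uri, jar_uris):
    """Return a Dataproc Job resource (as a dict) for a PySpark job.

    The URIs in jar_file_uris are added to the classpath of the Spark
    driver and executors, which is how extra JARs reach the job.
    """
    return {
        "reference": {"project_id": project_id},
        "placement": {"cluster_name": cluster_name},
        "pyspark_job": {
            "main_python_file_uri": main_py_uri,
            "jar_file_uris": list(jar_uris),
        },
    }


job = build_pyspark_job(PROJECT_ID, CLUSTER_NAME, MAIN_PY_URI, [ICEBERG_JAR_URI])
```

In an Airflow DAG, a dict like `job` can be handed to `DataprocSubmitJobOperator` (from the Google provider package) through its `job` argument, together with `project_id` and `region`.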
Let me know if that helps.
Dear Adam,
Thanks a lot for sharing all your knowledge. Our team is in the process of using Iceberg with Dataproc and then connecting the Iceberg tables to BigLake.
For us, it seems that "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions" is not being loaded properly. We also cannot find a JAR that contains it. Can you tell us how you got it to run?
https://github.com/GoogleCloudPlatform/data-analytics-golden-demo/blob/5c10931f58516827cfe2ee84e0cc550965e8d2a5/dataproc/convert_taxi_to_iceberg_create_tables.py#L39C9-L39C111
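For context, the kind of session configuration in question looks roughly like the sketch below. It assumes the extensions class is provided by the Iceberg Spark runtime JAR (the `iceberg-spark-runtime` artifact matching the cluster's Spark and Scala versions) and that a Hadoop-type catalog is used. The JAR path, catalog name, and warehouse location are hypothetical.

```python
ICEBERG_EXTENSIONS = "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"


def iceberg_spark_conf(runtime_jar_uri, catalog_name, warehouse_uri):
    """Return Spark properties that enable Iceberg for one named catalog."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # The runtime JAR must be on the classpath, otherwise the
        # extensions class below cannot be found at session start.
        "spark.jars": runtime_jar_uri,
        "spark.sql.extensions": ICEBERG_EXTENSIONS,
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hadoop",
        f"{prefix}.warehouse": warehouse_uri,
    }


def build_session(conf):
    """Create a SparkSession with the given properties (requires pyspark)."""
    from pyspark.sql import SparkSession  # imported lazily; only needed on the cluster

    builder = SparkSession.builder.appName("iceberg-example")
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Hypothetical values for illustration only.
conf = iceberg_spark_conf(
    "gs://my-bucket/jars/iceberg-spark-runtime.jar",
    "local",
    "gs://my-bucket/iceberg-warehouse",
)
```

If the JAR is already supplied through the job's `jar_file_uris`, the `spark.jars` entry is presumably redundant and could be dropped.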
Thanks and best regards, Marco