Closed mccheah closed 5 years ago
So this would be similar to the behavior of `spark.hadoop.*` properties?
I think that would make sense. We want to configure writers primarily through Iceberg table configuration, and it would be good to have a way to do that other than just a small list of known and supported properties.
Originally I was thinking we could put it in `sparkSession.write.option("iceberg.spark.hadoop.<opt>", ...)`, but now that you mention it, this could make more sense as part of the Table's inherent properties. I can put up a patch for this.
We want configuration to come from multiple places. A write may want to use a different format, so that would be passed like your example. But most things we want to apply at a table level and not at a job level. Parquet tuning parameters are a good example of that -- we want to set them once and know that all writers know what to do.
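The precedence described above (a per-write option overrides a table-level property, which overrides a default) can be sketched as below. This is an illustrative resolution helper, not Iceberg's actual implementation, and the property keys used in `main` are examples rather than a definitive list.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: resolve a writer setting by checking job-level
// options first, then table properties, then a built-in default.
public class WriteConfResolution {
  static String resolve(Map<String, String> jobOptions,
                        Map<String, String> tableProps,
                        String key, String defaultValue) {
    if (jobOptions.containsKey(key)) {
      return jobOptions.get(key);                       // per-write override
    }
    return tableProps.getOrDefault(key, defaultValue);  // table-level setting
  }

  public static void main(String[] args) {
    Map<String, String> tableProps = new HashMap<>();
    // Parquet tuning set once on the table, seen by all writers.
    tableProps.put("write.parquet.compression-codec", "zstd");

    Map<String, String> jobOptions = new HashMap<>();
    jobOptions.put("write-format", "avro");  // applies to this write only

    System.out.println(resolve(jobOptions, tableProps, "write-format", "parquet"));
    System.out.println(resolve(jobOptions, tableProps,
        "write.parquet.compression-codec", "gzip"));
  }
}
```

Keeping tuning parameters in table properties means a new writer job picks them up with no extra configuration, while still allowing a one-off override at write time.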
A PR would be great. Thanks!
The Iceberg data source just uses the Spark session's global Hadoop configuration when constructing FileSystem objects in `HadoopTableOperations`, `Reader`, and `Writer`. We propose support for specifying additional reader- and writer-specific options in the Hadoop configuration. The data source can parse out options with the prefix `iceberg.spark.hadoop.*` and apply them to the Hadoop configuration that is passed to all uses of the Hadoop FileSystem API throughout the Spark data source.
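A minimal sketch of that prefix handling might look like the following. A plain `Map` stands in for Hadoop's `Configuration` object, and the method name is hypothetical; the point is only the parse-and-strip behavior for `iceberg.spark.hadoop.*` keys.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed option handling: copy any data source option
// whose key starts with "iceberg.spark.hadoop." onto the Hadoop
// configuration, stripping the prefix so the remainder is a regular
// Hadoop key. Other options are ignored.
public class HadoopOptionParsing {
  static final String PREFIX = "iceberg.spark.hadoop.";

  static Map<String, String> applyHadoopOptions(Map<String, String> sourceOptions,
                                                Map<String, String> hadoopConf) {
    Map<String, String> result = new HashMap<>(hadoopConf);
    for (Map.Entry<String, String> entry : sourceOptions.entrySet()) {
      if (entry.getKey().startsWith(PREFIX)) {
        result.put(entry.getKey().substring(PREFIX.length()), entry.getValue());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<>();
    options.put("iceberg.spark.hadoop.fs.s3a.connection.maximum", "100");
    options.put("write-format", "parquet");  // non-prefixed options are left alone

    Map<String, String> conf = applyHadoopOptions(options, new HashMap<>());
    System.out.println(conf);  // prints {fs.s3a.connection.maximum=100}
  }
}
```

In the real data source the stripped keys would be set on the `Configuration` that is handed to `HadoopTableOperations`, `Reader`, and `Writer` before any FileSystem is constructed.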