Closed devinrsmith closed 6 days ago
Changeset is fine, modulo incredibly minor quibbles. I guess the real question is, what do we give up? It seems to me like having access to the full URI is potentially more powerful, and that we're preventing "hybrid" Iceberg catalogs with more than one URI scheme in use (which is (1) something I kind of want to build, and (2) not something that would totally surprise me in the wild).
Good catch regarding URIs with different schemes, we can either keep a set of channel providers in the Base Layout class or, for now, add an assertion that all the location keys have same URI scheme as the table location.
I've added an explicit URI check; it's a good point. That said, I'm almost certain that we don't support multiple URI schemes today given the pain points of configuration (not necessarily speaking of DH).
This greatly improves the efficiency of Iceberg reading. Previously, it was creating a
SeekableChannelsProvider
per URI, and now only one is created once per layout (/ Table).To aid in this update, objects that were previously created on-demand in
IcebergBaseLayout
are now created once upon construction. To enable this, it was noted that only the URI scheme is relevant for discrimination, and not actually the full URI to the data files. Thus, we can use the URI scheme as provided viaorg.apache.iceberg.Table#location
to do any up-front loading.The various interfaces that take a URI have been update to take a URI scheme instead. While this change could technically have been made in a non-breaking fashion by delegating existing URI methods to URI scheme methods, the existence of the URI methods encourages the wrong mental model and is easy to misuse, so they have been removed.
One of the
ParquetTableLocationKey
constructors has been deprecated, marked for removal. A more appropriate constructor has been added.BREAKING CHANGE:
SeekableChannelsProviderLoader.fromServiceLoader
has been removed, replaced withSeekableChannelsProviderLoader.load
.DataInstructionsProviderLoader.fromServiceLoader
has been removed, replaced withDataInstructionsProviderLoader.load
.SeekableChannelsProviderPlugin
methods have been changed, now use aString
for the URI scheme instead of aURI
.DataInstructionsProviderPlugin.createInstructions
method has been changed, now uses aString
for the URI scheme instead of aURI
.IcebergTableParquetLocationKey
has added a newSeekableChannelsProvider
parameter.