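ADLSFileIO holds an AzureProperties object. When the shared-key properties ADLS_SHARED_KEY_ACCOUNT_NAME and ADLS_SHARED_KEY_ACCOUNT_KEY are set, the AzureProperties constructor eagerly creates a StorageSharedKeyCredential and keeps it in a field. StorageSharedKeyCredential does not implement java.io.Serializable, so serialization fails during job startup, when Flink serializes the user function (serializedUDF) while building the job graph.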
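The failure mode can be reproduced without Azure or Flink. The sketch below is a minimal, hypothetical model: `SharedKeyCredential` stands in for StorageSharedKeyCredential (not Serializable) and `EagerProperties` stands in for AzureProperties (Serializable, but building the credential in its constructor). Plain Java serialization (`ObjectOutputStream`) then rejects the object graph with a NotSerializableException naming the credential class, as in the trace below. The class names are illustrative and are not Iceberg's.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class EagerCredentialDemo {

  // Hypothetical stand-in for StorageSharedKeyCredential: it does not implement Serializable.
  static final class SharedKeyCredential {
    final String accountName;
    final String accountKey;

    SharedKeyCredential(String accountName, String accountKey) {
      this.accountName = accountName;
      this.accountKey = accountKey;
    }
  }

  // Hypothetical stand-in for AzureProperties: Serializable itself, but it creates
  // the credential eagerly in its constructor and stores it in a non-transient field.
  static final class EagerProperties implements Serializable {
    private static final long serialVersionUID = 1L;
    private final SharedKeyCredential credential;

    EagerProperties(String accountName, String accountKey) {
      this.credential = new SharedKeyCredential(accountName, accountKey);
    }
  }

  // Serializes obj with plain Java serialization. Returns the message of the
  // NotSerializableException (the offending class name), or null on success.
  static String serializationFailure(Object obj) {
    try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
      out.writeObject(obj);
      return null;
    } catch (NotSerializableException e) {
      return e.getMessage();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    String failure = serializationFailure(new EagerProperties("account", "key"));
    System.out.println("NotSerializableException: " + failure);
  }
}
```

The outer object being Serializable does not help: Java serialization walks every non-transient field, so a single non-Serializable field anywhere in the graph fails the whole write.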
If no storage account key is supplied, authentication falls back to DefaultAzureCredential, which tries to obtain credentials from the Azure CLI or another source such as workload identity. That path appears to work, but some environments may require shared-key authentication.
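To make explicit which configuration hits the bug, here is a simplified, hypothetical model of the selection described above: shared-key auth when the shared-key properties are present, otherwise DefaultAzureCredential. It models only these two paths. The property strings are assumed to be the values behind ADLS_SHARED_KEY_ACCOUNT_NAME and ADLS_SHARED_KEY_ACCOUNT_KEY, and the "both or neither" check is likewise an assumption of this sketch.

```java
import java.util.Map;

public class CredentialSelectionSketch {

  // Assumed string values of ADLS_SHARED_KEY_ACCOUNT_NAME / ADLS_SHARED_KEY_ACCOUNT_KEY.
  static final String SHARED_KEY_ACCOUNT_NAME = "adls.auth.shared-key.account.name";
  static final String SHARED_KEY_ACCOUNT_KEY = "adls.auth.shared-key.account.key";

  enum AuthMode { SHARED_KEY, DEFAULT_AZURE_CREDENTIAL }

  // Hypothetical model: only the SHARED_KEY branch would create the
  // non-Serializable StorageSharedKeyCredential up front.
  static AuthMode select(Map<String, String> props) {
    String name = props.get(SHARED_KEY_ACCOUNT_NAME);
    String key = props.get(SHARED_KEY_ACCOUNT_KEY);
    if (name == null && key == null) {
      // Fallback: Azure CLI, workload identity, and other DefaultAzureCredential sources.
      return AuthMode.DEFAULT_AZURE_CREDENTIAL;
    }
    if (name == null || key == null) {
      // Assumption of this sketch: a half-configured shared key is rejected.
      throw new IllegalArgumentException("shared-key auth needs both account name and key");
    }
    return AuthMode.SHARED_KEY;
  }

  public static void main(String[] args) {
    System.out.println(select(Map.of()));
    System.out.println(select(Map.of(SHARED_KEY_ACCOUNT_NAME, "account", SHARED_KEY_ACCOUNT_KEY, "key")));
  }
}
```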
The serialization error is below:
```
Caused by: java.util.concurrent.ExecutionException: org.apache.flink.streaming.runtime.tasks.StreamTaskException: Could not serialize object for key serializedUDF.
	at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
	at org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:322)
	... 13 more
Caused by: org.apache.flink.streaming.runtime.tasks.StreamTaskException: Could not serialize object for key serializedUDF.
	at org.apache.flink.streaming.api.graph.StreamConfig.lambda$serializeAllConfigs$1(StreamConfig.java:203)
	at java.base/java.util.HashMap.forEach(HashMap.java:1421)
	at org.apache.flink.streaming.api.graph.StreamConfig.serializeAllConfigs(StreamConfig.java:197)
	at org.apache.flink.streaming.api.graph.StreamConfig.lambda$triggerSerializationAndReturnFuture$0(StreamConfig.java:174)
	at java.base/java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:718)
	at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.io.NotSerializableException: com.azure.storage.common.StorageSharedKeyCredential
```
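One possible way around this, sketched with hypothetical stand-in classes rather than Iceberg's real ones: keep only the account name and key (plain Strings, which are Serializable) in the serialized object, and build the credential lazily in a `transient` field on first use. Java serialization skips transient fields, so the object survives the round trip and the credential is recreated after deserialization. This is an illustrative pattern, not necessarily how Iceberg fixes the issue.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class LazyCredentialSketch {

  // Hypothetical stand-in for StorageSharedKeyCredential (not Serializable).
  static final class SharedKeyCredential {
    final String accountName;
    final String accountKey;

    SharedKeyCredential(String accountName, String accountKey) {
      this.accountName = accountName;
      this.accountKey = accountKey;
    }
  }

  // Hypothetical properties holder: serializes only the Strings and creates
  // the credential on demand, e.g. after the object has reached a task.
  static final class LazyProperties implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String accountName;
    private final String accountKey;
    // transient: never written by Java serialization, rebuilt by credential().
    private transient SharedKeyCredential credential;

    LazyProperties(String accountName, String accountKey) {
      this.accountName = accountName;
      this.accountKey = accountKey;
    }

    SharedKeyCredential credential() {
      if (credential == null) {
        credential = new SharedKeyCredential(accountName, accountKey);
      }
      return credential;
    }
  }

  // Writes obj with ObjectOutputStream and reads it back with ObjectInputStream.
  @SuppressWarnings("unchecked")
  static <T> T roundTrip(T obj) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(obj);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (T) in.readObject();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    LazyProperties props = new LazyProperties("account", "key");
    props.credential(); // populated, but transient, so serialization still succeeds
    LazyProperties copy = roundTrip(props);
    System.out.println(copy.credential().accountName);
  }
}
```

Note that this still ships the account key inside the serialized job, just as the original property map does; it only changes when the credential object is created.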
Apache Iceberg version
1.5.1 (latest release)
Query engine
Flink
Please describe the bug
Sample app to trigger it below.
POM: