NVIDIA / spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs
https://nvidia.github.io/spark-rapids
Apache License 2.0

[BUG] `fs.azure.account.keyInvalid` configuration issue while reading from Unity Catalog Tables on Azure DB #10318

Closed SurajAralihalli closed 2 months ago

SurajAralihalli commented 5 months ago

Describe the bug
While using Azure Databricks and attempting to read a managed table from the Unity Catalog metastore with the RAPIDS Accelerator, I encountered an invalid-credentials issue with the following message: `Failure to initialize configuration for storage account databricksmetaeast.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.keyInvalid configuration value detected for fs.azure.account.key`. However, this error doesn't occur when the RAPIDS Accelerator is disabled.
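
For context, a minimal reproduction sketch in Scala, assuming a Databricks notebook session with the plugin enabled (the three-level table name is a hypothetical placeholder):

```scala
// Minimal reproduction sketch. Assumes an Azure Databricks cluster with the
// RAPIDS Accelerator enabled (spark.plugins=com.nvidia.spark.SQLPlugin) and a
// Unity Catalog managed table; the table name below is a placeholder.
val df = spark.read.table("main.default.sample_managed_table")
df.count() // fails on the GPU read path with the error quoted above
```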

Notes
Adding the storage container's credentials to the Spark configuration properties can serve as an interim workaround (see the sketch below). However, this approach does not scale when there are multiple storage containers.
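
A sketch of that workaround for a Databricks notebook session (the account name matches the one in the error message; the key value is a placeholder):

```scala
// Interim workaround sketch: provide the storage account key directly so the
// executors can initialize ABFS themselves, bypassing Unity Catalog's
// credential handoff. This must be repeated for every storage account the
// tables live in, which is why it does not scale.
spark.conf.set(
  "fs.azure.account.key.databricksmetaeast.dfs.core.windows.net",
  "<base64-encoded-storage-account-key>")
```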

Environment details
Managed tables on Azure Databricks with Unity Catalog and the RAPIDS Accelerator

jlowe commented 5 months ago

What is the table format -- is it a Delta Lake table, a raw Parquet table, or something else? A stacktrace of the error would help.

Assuming this is a table that's ultimately composed of Parquet files, does this happen even with the config `spark.rapids.sql.format.parquet.reader.type=PERFILE`? If it works with the PERFILE reader, then that tells us the issue is with setting up the proper context for the multithreaded readers.
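
A quick way to try that, sketched for a notebook session (the table name is a hypothetical placeholder):

```scala
// Diagnostic sketch: force the single-threaded per-file Parquet reader and
// re-run the failing read. If this succeeds, the problem is in the task
// context setup of the multithreaded readers.
spark.conf.set("spark.rapids.sql.format.parquet.reader.type", "PERFILE")
spark.read.table("main.default.sample_managed_table").count()
```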

SurajAralihalli commented 5 months ago

Yes, it's a Delta Lake table. It didn't work with `spark.rapids.sql.format.parquet.reader.type=PERFILE`. However, it works when I explicitly configure `fs.azure.account.key` in the Spark properties.

Stack Trace:

: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 22.0 failed 4 times, most recent failure: Lost task 0.3 in stage 22.0 (TID 28) (10.9.4.10 executor 0): Failure to initialize configuration for storage account databricksmetaeast.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.keyInvalid configuration value detected for fs.azure.account.key
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.getStorageAccountKey(SimpleKeyProvider.java:52)
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(AbfsConfiguration.java:670)
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(AzureBlobFileSystemStore.java:2055)
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:267)
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:225)
    at com.databricks.common.filesystem.LokiFileSystem$.$anonfun$getLokiFS$1(LokiFileSystem.scala:64)
    at com.databricks.common.filesystem.Cache.getOrCompute(Cache.scala:38)
    at com.databricks.common.filesystem.LokiFileSystem$.getLokiFS(LokiFileSystem.scala:61)
    at com.databricks.common.filesystem.LokiFileSystem.initialize(LokiFileSystem.scala:87)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:537)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
    at org.apache.parquet.hadoop.util.HadoopInputFile.fromPath(HadoopInputFile.java:43)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:482)
    at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.$anonfun$readAndSimpleFilterFooter$11(GpuParquetScan.scala:676)
    at scala.Option.getOrElse(Option.scala:189)
    at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.$anonfun$readAndSimpleFilterFooter$6(GpuParquetScan.scala:675)
    at scala.Option.getOrElse(Option.scala:189)
    at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.$anonfun$readAndSimpleFilterFooter$1(GpuParquetScan.scala:652)
    at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
    at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.readAndSimpleFilterFooter(GpuParquetScan.scala:643)
    at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.$anonfun$filterBlocks$1(GpuParquetScan.scala:728)
    at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
    at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.filterBlocks(GpuParquetScan.scala:689)
    at com.nvidia.spark.rapids.GpuParquetPartitionReaderFactory.buildBaseColumnarParquetReader(GpuParquetScan.scala:1338)
    at com.nvidia.spark.rapids.GpuParquetPartitionReaderFactory.buildColumnarReader(GpuParquetScan.scala:1328)
    at com.nvidia.spark.rapids.PartitionReaderIterator$.$anonfun$buildReader$1(PartitionReaderIterator.scala:66)
    at org.apache.spark.sql.rapids.shims.GpuFileScanRDD$$anon$1.org$apache$spark$sql$rapids$shims$GpuFileScanRDD$$anon$$readCurrentFile(GpuFileScanRDD.scala:97)
    at org.apache.spark.sql.rapids.shims.GpuFileScanRDD$$anon$1.nextIterator(GpuFileScanRDD.scala:151)
    at org.apache.spark.sql.rapids.shims.GpuFileScanRDD$$anon$1.hasNext(GpuFileScanRDD.scala:74)
    at org.apache.spark.sql.rapids.GpuFileSourceScanExec$$anon$1.hasNext(GpuFileSourceScanExec.scala:474)
    at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.$anonfun$hasNext$4(GpuAggregateExec.scala:1930)
    at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
    at scala.Option.getOrElse(Option.scala:189)
    at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.hasNext(GpuAggregateExec.scala:1930)
    at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.partNextBatch(GpuShuffleExchangeExecBase.scala:332)
    at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.hasNext(GpuShuffleExchangeExecBase.scala:355)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
    at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$3(ShuffleMapTask.scala:81)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$1(ShuffleMapTask.scala:81)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.doRunTask(Task.scala:179)
    at org.apache.spark.scheduler.Task.$anonfun$run$5(Task.scala:142)
    at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:41)
    at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:99)
    at com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:104)
    at scala.util.Using$.resource(Using.scala:269)
    at com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:103)
    at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:142)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.Task.run(Task.scala:97)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:904)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1740)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:907)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:761)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: Invalid configuration value detected for fs.azure.account.key
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.diagnostics.ConfigurationBasicValidator.validate(ConfigurationBasicValidator.java:49)
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.diagnostics.Base64StringConfigurationBasicValidator.validate(Base64StringConfigurationBasicValidator.java:40)
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.validateStorageAccountKey(SimpleKeyProvider.java:71)
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.getStorageAccountKey(SimpleKeyProvider.java:49)
    ... 63 more

razajafri commented 2 months ago

@mattahrens @sameerz Is this related to #8242?

sameerz commented 2 months ago

> @mattahrens @sameerz Is this related to #8242?

Yes, it is related. We do not need to test with Alluxio, but with the file cache (see the sketch below).
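
A sketch of what that retest might look like; `spark.rapids.filecache.enabled` is the plugin's file cache switch, but verify the exact config against the spark-rapids docs for your version (the table name is a placeholder):

```scala
// Retest sketch: exercise the RAPIDS file cache code path instead of Alluxio
// and re-run the Unity Catalog read that previously failed.
spark.conf.set("spark.rapids.filecache.enabled", "true")
spark.read.table("main.default.sample_managed_table").count()
```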