NVIDIA / spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs
https://nvidia.github.io/spark-rapids
Apache License 2.0

[BUG] `fs.azure.account.keyInvalid` configuration issue while reading from Unity Catalog Tables on Azure DB #10318

Closed · SurajAralihalli closed 7 months ago

SurajAralihalli commented 10 months ago

Describe the bug
While using Azure Databricks and attempting to read a managed table from the Unity Catalog metastore with the RAPIDS Accelerator, I encountered an invalid-credentials error with the following message (the validator's text appears twice, concatenated, in the raw log):

Failure to initialize configuration for storage account databricksmetaeast.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.keyInvalid configuration value detected for fs.azure.account.key

This error does not occur when RAPIDS is disabled.

Notes
Adding the storage container's credentials to the Spark configuration properties can serve as an interim workaround; a sketch follows. However, this approach does not scale when multiple storage containers are involved.
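For reference, a minimal sketch of that interim workaround, assuming key-based ABFS authentication; the storage account name and key are placeholders, not values from this issue, and `spark` is the active SparkSession (predefined in a Databricks notebook):

```scala
// Interim workaround sketch: supply the storage account key directly in the
// Spark session configuration so the ABFS driver can authenticate.
// "<storage-account>" and "<base64-account-key>" are placeholders.
spark.conf.set(
  "fs.azure.account.key.<storage-account>.dfs.core.windows.net",
  "<base64-account-key>")
```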

Environment details
Managed tables on Azure Databricks with Unity Catalog and the RAPIDS Accelerator; a minimal reproduction sketch follows.
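A reproduction sketch under this setup; the three-level table name is a placeholder, not a table from this issue:

```scala
// Reproduction sketch: read a Unity Catalog managed Delta table with the
// RAPIDS Accelerator enabled. "main.default.example_table" is a placeholder.
val df = spark.read.table("main.default.example_table")
df.count()  // fails with the fs.azure.account.key error when RAPIDS is enabled
```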

jlowe commented 10 months ago

What is the table format -- is it a Delta Lake table, a raw Parquet table, or something else? A stack trace of the error would help.

Assuming this is with a table that's ultimately comprised of Parquet files, does this happen even with the config spark.rapids.sql.format.parquet.reader.type=PERFILE? If it works with the PERFILE reader, then that tells us the issue is with setting up the proper context for the multithreaded readers.
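For context, a sketch of applying that diagnostic config; the property name is the one named above, the rest is illustrative:

```scala
// Diagnostic sketch: force the per-file Parquet reader. If the read succeeds
// with PERFILE but fails with the default multithreaded reader, the problem
// lies in how the multithreaded readers set up the Hadoop configuration.
spark.conf.set("spark.rapids.sql.format.parquet.reader.type", "PERFILE")
```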

SurajAralihalli commented 10 months ago

Yes, it's a Delta Lake table. It didn't work with spark.rapids.sql.format.parquet.reader.type=PERFILE either. However, it works when I explicitly configure fs.azure.account.key in the Spark properties.

Stack Trace:

: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 22.0 failed 4 times, most recent failure: Lost task 0.3 in stage 22.0 (TID 28) (10.9.4.10 executor 0): Failure to initialize configuration for storage account databricksmetaeast.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.keyInvalid configuration value detected for fs.azure.account.key
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.getStorageAccountKey(SimpleKeyProvider.java:52)
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(AbfsConfiguration.java:670)
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(AzureBlobFileSystemStore.java:2055)
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:267)
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:225)
    at com.databricks.common.filesystem.LokiFileSystem$.$anonfun$getLokiFS$1(LokiFileSystem.scala:64)
    at com.databricks.common.filesystem.Cache.getOrCompute(Cache.scala:38)
    at com.databricks.common.filesystem.LokiFileSystem$.getLokiFS(LokiFileSystem.scala:61)
    at com.databricks.common.filesystem.LokiFileSystem.initialize(LokiFileSystem.scala:87)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:537)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
    at org.apache.parquet.hadoop.util.HadoopInputFile.fromPath(HadoopInputFile.java:43)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:482)
    at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.$anonfun$readAndSimpleFilterFooter$11(GpuParquetScan.scala:676)
    at scala.Option.getOrElse(Option.scala:189)
    at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.$anonfun$readAndSimpleFilterFooter$6(GpuParquetScan.scala:675)
    at scala.Option.getOrElse(Option.scala:189)
    at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.$anonfun$readAndSimpleFilterFooter$1(GpuParquetScan.scala:652)
    at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
    at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.readAndSimpleFilterFooter(GpuParquetScan.scala:643)
    at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.$anonfun$filterBlocks$1(GpuParquetScan.scala:728)
    at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
    at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.filterBlocks(GpuParquetScan.scala:689)
    at com.nvidia.spark.rapids.GpuParquetPartitionReaderFactory.buildBaseColumnarParquetReader(GpuParquetScan.scala:1338)
    at com.nvidia.spark.rapids.GpuParquetPartitionReaderFactory.buildColumnarReader(GpuParquetScan.scala:1328)
    at com.nvidia.spark.rapids.PartitionReaderIterator$.$anonfun$buildReader$1(PartitionReaderIterator.scala:66)
    at org.apache.spark.sql.rapids.shims.GpuFileScanRDD$$anon$1.org$apache$spark$sql$rapids$shims$GpuFileScanRDD$$anon$$readCurrentFile(GpuFileScanRDD.scala:97)
    at org.apache.spark.sql.rapids.shims.GpuFileScanRDD$$anon$1.nextIterator(GpuFileScanRDD.scala:151)
    at org.apache.spark.sql.rapids.shims.GpuFileScanRDD$$anon$1.hasNext(GpuFileScanRDD.scala:74)
    at org.apache.spark.sql.rapids.GpuFileSourceScanExec$$anon$1.hasNext(GpuFileSourceScanExec.scala:474)
    at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.$anonfun$hasNext$4(GpuAggregateExec.scala:1930)
    at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
    at scala.Option.getOrElse(Option.scala:189)
    at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.hasNext(GpuAggregateExec.scala:1930)
    at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.partNextBatch(GpuShuffleExchangeExecBase.scala:332)
    at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.hasNext(GpuShuffleExchangeExecBase.scala:355)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
    at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$3(ShuffleMapTask.scala:81)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$1(ShuffleMapTask.scala:81)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.doRunTask(Task.scala:179)
    at org.apache.spark.scheduler.Task.$anonfun$run$5(Task.scala:142)
    at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:41)
    at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:99)
    at com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:104)
    at scala.util.Using$.resource(Using.scala:269)
    at com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:103)
    at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:142)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.Task.run(Task.scala:97)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:904)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1740)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:907)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:761)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: Invalid configuration value detected for fs.azure.account.key
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.diagnostics.ConfigurationBasicValidator.validate(ConfigurationBasicValidator.java:49)
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.diagnostics.Base64StringConfigurationBasicValidator.validate(Base64StringConfigurationBasicValidator.java:40)
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.validateStorageAccountKey(SimpleKeyProvider.java:71)
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.getStorageAccountKey(SimpleKeyProvider.java:49)
    ... 63 more

razajafri commented 7 months ago

@mattahrens @sameerz Is this related to #8242?

sameerz commented 7 months ago

> @mattahrens @sameerz Is this related to #8242?

Yes, it is related. We do not need to test with Alluxio, but we do need to test with the file cache.
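For reference, a sketch of enabling the RAPIDS file cache mentioned above; the property name follows the spark-rapids configuration docs, and the setting shown is illustrative:

```scala
// Sketch: enable the RAPIDS Accelerator file cache, which caches remote input
// file data locally and exercises the same filesystem credential setup that
// this bug touches.
spark.conf.set("spark.rapids.filecache.enabled", "true")
```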

kacpermuchaaccenture commented 1 month ago

Did you resolve this issue?

tgravescs commented 1 month ago

Yes, this issue is fixed in release 24.06.02.

If you are seeing this issue with that version of the spark-rapids plugin or newer, please file a new issue with the details and we can take a look.