Open Nishanbuyo opened 2 months ago
Hi @Nishanbuyo ,
This is an issue with connecting Databricks to Azure Blob Storage. If you can access abfss in a notebook, that same spark.conf should work for this jar (you can do this work directly in a notebook; there's no need to update the jar).
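For reference, notebook-level OAuth access to abfss with a service principal typically looks like the sketch below. These are the standard hadoop-azure (ABFS) configuration keys for ADLS Gen2; the storage account, application/tenant IDs, and secret scope are placeholders to adapt to your setup:

```scala
// Placeholder values: replace <storage-account>, <application-id>,
// <tenant-id>, and the secret scope/key with your own.
val account = "<storage-account>.dfs.core.windows.net"

spark.conf.set(s"fs.azure.account.auth.type.$account", "OAuth")
spark.conf.set(s"fs.azure.account.oauth.provider.type.$account",
  "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(s"fs.azure.account.oauth2.client.id.$account", "<application-id>")
spark.conf.set(s"fs.azure.account.oauth2.client.secret.$account",
  dbutils.secrets.get(scope = "<scope>", key = "<secret-key>"))
spark.conf.set(s"fs.azure.account.oauth2.client.endpoint.$account",
  "https://login.microsoftonline.com/<tenant-id>/oauth2/token")
```

If reading `abfss://<container>@<storage-account>.dfs.core.windows.net/...` works in the notebook after this, the same settings applied at cluster level should be visible to library code as well.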
The workaround I've seen has been to mount the directory and read from the mount point.
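The mount workaround would look roughly like this in a notebook (container, account, and credential names are placeholders; the extra configs mirror the usual ABFS OAuth keys):

```scala
// Placeholder names throughout. Once mounted, the library can read
// the files through the /mnt/... path instead of abfss:// directly.
val configs = Map(
  "fs.azure.account.auth.type" -> "OAuth",
  "fs.azure.account.oauth.provider.type" ->
    "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id" -> "<application-id>",
  "fs.azure.account.oauth2.client.secret" ->
    dbutils.secrets.get("<scope>", "<secret-key>"),
  "fs.azure.account.oauth2.client.endpoint" ->
    "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

dbutils.fs.mount(
  source = "abfss://<container>@<storage-account>.dfs.core.windows.net/",
  mountPoint = "/mnt/mrf",
  extraConfigs = configs)
```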
I'll leave this issue open, but it's broadly related to connecting Azure Gen2 storage to Databricks Spark.
Hi @zavoraad , I can access abfss in a notebook, i.e. the file can be read directly, but when the same file path is passed from the library, it gives an authentication error.
Mounting the directory fixes the issue, but that's not an option for our project.
Hi @Nishanbuyo,
The error is happening specifically when the executor tries to open the file for reading.
The filesystem gets instantiated differently in the unreleased jar, from the file Path instead of from HadoopConf.
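The distinction being described is roughly the following, sketched with the Hadoop FileSystem API (path and account names are placeholders): `FileSystem.get` resolves the default filesystem from the configuration, which may not match an `abfss://` input path, while `Path.getFileSystem` resolves the implementation from the path's own scheme and authority.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val hadoopConf: Configuration = spark.sparkContext.hadoopConfiguration

// Previous approach: the filesystem comes from the default scheme in
// the configuration, regardless of what the input path looks like.
val fsFromConf = FileSystem.get(hadoopConf)

// Newer approach (as described above): the Path decides which
// filesystem implementation to instantiate via its scheme/authority.
val path = new Path("abfss://<container>@<account>.dfs.core.windows.net/mrf.json")
val fsFromPath = path.getFileSystem(hadoopConf)
```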
Can you see if the error still persists using the newer jar? Note it's running Spark 3.4.1.
Good evening. I'm not sure where to put this, but I was able to update your base code to Scala 2.12.18 and Spark 3.5.0. This is a genius package and I used it to get familiar with Scala/Spark programming. I wanted to share, but I am in no way a Scala developer and therefore wasn't comfortable forking or updating this repo directly. The main update was that RowEncoder needed to be converted to ExpressionEncoder[Row]. That, plus the notebook code for the streamWriter: change `.table('...')` to `.toTable('...')`.
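For anyone attempting the same upgrade, the encoder change described might look like the sketch below (exact call sites depend on the repo; the schema here is a placeholder). In Spark 3.5, `RowEncoder.apply(schema)` is gone, but `RowEncoder.encoderFor` plus `ExpressionEncoder.apply` produces the `ExpressionEncoder[Row]` mentioned above:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.encoders.{ExpressionEncoder, RowEncoder}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Placeholder schema for illustration.
val schema = StructType(Seq(StructField("json_payload", StringType)))

// Spark 3.4 and earlier:
//   implicit val encoder = RowEncoder(schema)
// Spark 3.5: build an ExpressionEncoder[Row] from the agnostic encoder.
implicit val encoder: ExpressionEncoder[Row] =
  ExpressionEncoder(RowEncoder.encoderFor(schema))
```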
Changes in JsonMRFSource.scala
Added this in the JsonMRFSource class to access Azure Blob Storage using a service principal, but I'm getting the error `Invalid configuration value detected for fs.azure.account.key`.
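That error usually means the ABFS driver fell back to account-key (SharedKey) auth because it found no OAuth settings for the storage account being accessed. One thing worth checking (an assumption, not confirmed against this repo) is whether the OAuth keys are applied with the account-qualified suffix on the Hadoop configuration the source actually uses to open the file, roughly:

```scala
import org.apache.hadoop.conf.Configuration

// Hypothetical helper: apply account-qualified OAuth keys so the ABFS
// driver does not fall back to fs.azure.account.key (SharedKey) auth.
// All bracketed values are placeholders.
def withOAuth(conf: Configuration, account: String): Configuration = {
  val suffix = s"$account.dfs.core.windows.net"
  conf.set(s"fs.azure.account.auth.type.$suffix", "OAuth")
  conf.set(s"fs.azure.account.oauth.provider.type.$suffix",
    "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
  conf.set(s"fs.azure.account.oauth2.client.id.$suffix", "<application-id>")
  conf.set(s"fs.azure.account.oauth2.client.secret.$suffix", "<client-secret>")
  conf.set(s"fs.azure.account.oauth2.client.endpoint.$suffix",
    "https://login.microsoftonline.com/<tenant-id>/oauth2/token")
  conf
}
```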
PySpark code in Databricks:
Error: