delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

Spark job fails during CREATE file operation on Azure Data Lake Gen1 #631

Closed: sudhikul closed this issue 3 years ago

sudhikul commented 3 years ago

Hi,

I am facing an issue when running a Spark job. The details are below.

Spark Version: 3.0.0

Please help and let me know:

  1. What is the root cause of the issue?
  2. How can it be fixed? Does it have something to do with the size of the ADL Gen1 file system?

Brief Overview: Our big data product runs in an AKS cluster deployed in Microsoft Azure.

All the jobs executed within the product are Apache Spark jobs. In addition to HDFS, Azure Data Lake Gen1 is one of the supported file systems.

Scenario: A source generates events and publishes them to Azure Event Hubs. A Spark Streaming job listens for events on a particular Event Hub (EH) and continuously writes the data to the Azure Data Lake Gen1 file system.

org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:355)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:245)
Caused by: com.microsoft.azure.datalake.store.ADLException: Error creating file /landing_home/_deltalog/.00000000000000001748.json.9d2edecf-973c-4d61-a178-4db46bd70f2c.tmp
Operation CREATE failed with exception java.net.SocketTimeoutException : Read timed out
Last encountered exception thrown after 1 tries. [java.net.SocketTimeoutException]
 [ServerRequestId:null]
    at com.microsoft.azure.datalake.store.ADLStoreClient.getExceptionFromResponse(ADLStoreClient.java:1169)
    at com.microsoft.azure.datalake.store.ADLStoreClient.createFile(ADLStoreClient.java:281)
    at org.apache.hadoop.fs.adl.AdlFileSystem.create(AdlFileSystem.java:374)
    at org.apache.hadoop.fs.FileSystem.primitiveCreate(FileSystem.java:1228)
    at org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:100)
    at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:605)
    at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:696)
    at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:692)

sudhikul commented 3 years ago

Hi All,

This is just a gentle reminder.

Could someone look into this issue and share their input at the earliest opportunity?

Thanks and regards,
Sudhindra

tdas commented 3 years ago

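@sudhikul This seems to be an Azure system error: the stack trace shows that the exception is thrown while AdlFileSystem.java (the Hadoop connector for Azure Data Lake) is creating the file, and the underlying cause is a socket read timeout in the Azure Data Lake Store client. This does not seem to be a Delta-specific issue, so please contact Azure support for such errors.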
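
For anyone reading a similar trace, the reasoning above can be sketched as a small script that labels each frame after the last "Caused by:" with the layer that owns its package. This is only an illustrative sketch: the `LAYERS` mapping and the helper names are made up for this example, the trace string is an abbreviated excerpt of the one posted in this issue, and the result only shows which layer threw the exception, not which code higher up the (truncated) stack requested the write.

```python
import re

# Hypothetical mapping from Java package prefix to the layer that owns it.
# More specific prefixes must come first because the first match wins.
LAYERS = [
    ("com.microsoft.azure.datalake.store.", "Azure Data Lake Store SDK"),
    ("org.apache.hadoop.fs.adl.", "Hadoop ADLS connector"),
    ("org.apache.hadoop.fs.", "Hadoop FileSystem API"),
    ("org.apache.spark.sql.delta.", "Delta Lake"),
    ("org.apache.spark.", "Spark"),
]

# Matches "at some.package.Class.method(File.java:123)" in a flattened trace.
FRAME = re.compile(r"\bat\s+([\w.$]+)\(([^()]*)\)")


def classify(qualified_name):
    """Return the layer label for a fully qualified method name."""
    for prefix, layer in LAYERS:
        if qualified_name.startswith(prefix):
            return layer
    return "other"


def frames_after_cause(trace):
    """Return (layer, method) pairs for frames after the last 'Caused by:'.

    If the trace has no 'Caused by:' section, all frames are returned.
    """
    _, _, tail = trace.rpartition("Caused by:")
    return [(classify(name), name) for name, _ in FRAME.findall(tail)]


# Abbreviated excerpt of the trace from this issue (file path omitted).
TRACE = (
    "at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:245) "
    "Caused by: com.microsoft.azure.datalake.store.ADLException: Error creating file (path omitted) "
    "at com.microsoft.azure.datalake.store.ADLStoreClient.getExceptionFromResponse(ADLStoreClient.java:1169) "
    "at com.microsoft.azure.datalake.store.ADLStoreClient.createFile(ADLStoreClient.java:281) "
    "at org.apache.hadoop.fs.adl.AdlFileSystem.create(AdlFileSystem.java:374) "
    "at org.apache.hadoop.fs.FileSystem.primitiveCreate(FileSystem.java:1228)"
)

if __name__ == "__main__":
    for layer, method in frames_after_cause(TRACE):
        print(f"{layer:28s} {method}")
```

In this excerpt the innermost frames belong to the Azure Data Lake Store SDK and the Hadoop ADLS connector, which is consistent with treating the timeout as a storage-side or network-side problem rather than a Delta one.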