Azure / azure-data-lake-store-java

Microsoft Azure Data Lake Store Filesystem Library for Java

Spark job fails during CREATE file operation on Azure Data Lake Gen1 #48

Open sudhikul opened 3 years ago

sudhikul commented 3 years ago

Hi,

I am facing an issue with a Spark job that reads streaming data from Azure Event Hubs and stores it in the ADL (Azure Data Lake) Gen1 file system.

Spark Version: 3.0.0

Please help me understand the following:

  1. What is the root cause of the issue?
  2. How can it be fixed? Does it have something to do with the size of the ADL Gen1 file system?
  3. One more observation: this usually happens when the input is large (1 million transactions), and it is usually not seen when the input is below 1M. Is this just a coincidence, or is it related to the size of the input load?

Brief Overview

Our big data product runs in an AKS cluster deployed in Microsoft Azure.

All the jobs executed within the product are Apache Spark jobs. In addition to HDFS, Azure Data Lake Gen1 is one of the supported file systems.

Scenario

The source generates events and publishes them to Azure Event Hubs. A Spark Streaming job waits for events on a particular EH (Event Hub) and keeps writing the data to the Azure Data Lake Gen1 file system.

All of a sudden, it fails in the middle with the below error:

org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:355)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:245)
Caused by: com.microsoft.azure.datalake.store.ADLException: Error creating file /landing_home/deltalog/.00000000000000001748.json.9d2edecf-973c-4d61-a178-4db46bd70f2c.tmp
Operation CREATE failed with exception java.net.SocketTimeoutException : Read timed out
Last encountered exception thrown after 1 tries. [java.net.SocketTimeoutException]
[ServerRequestId:null]
    at com.microsoft.azure.datalake.store.ADLStoreClient.getExceptionFromResponse(ADLStoreClient.java:1169)
    at com.microsoft.azure.datalake.store.ADLStoreClient.createFile(ADLStoreClient.java:281)
    at org.apache.hadoop.fs.adl.AdlFileSystem.create(AdlFileSystem.java:374)
    at org.apache.hadoop.fs.FileSystem.primitiveCreate(FileSystem.java:1228)
    at org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:100)
    at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:605)
    at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:696)
    at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:692)
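The trace reports that the exception was thrown "after 1 tries", so the CREATE call was apparently not retried after the read timeout. As a rough illustration only (not a confirmed fix, and not part of the ADLS SDK or Spark), a caller-side retry with exponential backoff could look like the sketch below. The class `RetryingCreate`, the method `withRetries`, and its parameters are hypothetical names, and the sketch assumes the wrapped operation is safe to repeat.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

// Hypothetical helper: retries an I/O operation with exponential backoff.
final class RetryingCreate {

    // Runs op up to maxAttempts times, sleeping between failed attempts.
    // Only IOException (which includes SocketTimeoutException) is retried.
    static <T> T withRetries(Callable<T> op, int maxAttempts, long initialBackoffMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and surface the last failure
                }
                Thread.sleep(backoffMs);
                backoffMs *= 2; // double the wait before the next attempt
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Stand-in for a file-create call: times out twice, then succeeds.
        String result = withRetries(() -> {
            if (++calls[0] < 3) {
                throw new SocketTimeoutException("Read timed out");
            }
            return "created";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In a real job the lambda would wrap the actual file-system call. Whether retrying is appropriate depends on why the service timed out, which this thread does not establish.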

sudhikul commented 3 years ago

Hi All,

This is just a gentle reminder.

Could anyone look into this issue and suggest how to fix it at the earliest?

Thanks and Regards, Sudhindra

rahuldutta90 commented 3 years ago

@sudhikul Apologies for the delay. Which version of the ADLS Java SDK are you using? You can find it from the jar name, which matches "azure-data-lake-store-*". I recall there was an issue in older versions of the SDK.

I would also recommend opening a support case through the Azure portal for your Gen1 account; that way you can provide more details about the account, etc.
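For the SDK version question above, one way to see which jar is actually loaded at runtime is to ask the JVM where a class came from. The sketch below is only an illustration: `JarLocator` and `jarLocationOf` are hypothetical names, and the default class name is taken from the stack trace in this issue. It uses only standard JDK calls, and the printed location usually contains the jar file name, including its version.

```java
import java.security.CodeSource;
import java.util.Optional;

// Hypothetical helper: reports the jar or directory a class was loaded from.
final class JarLocator {

    // Returns the code-source location of the class, or empty if the JVM
    // does not expose one (typically the case for core JDK classes).
    static Optional<String> jarLocationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return Optional.empty();
        }
        return Optional.of(source.getLocation().toString());
    }

    public static void main(String[] args) {
        // Class name from the stack trace in this issue; override via args[0].
        String className = args.length > 0
                ? args[0]
                : "com.microsoft.azure.datalake.store.ADLStoreClient";
        try {
            Class<?> type = Class.forName(className);
            System.out.println(jarLocationOf(type).orElse("(no code source available)"));
        } catch (ClassNotFoundException e) {
            System.out.println(className + " is not on the classpath");
        }
    }
}
```

Run it with the same classpath as the Spark driver or executors, since a different classpath may resolve to a different jar.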