crealytics / spark-excel

A Spark plugin for reading and writing Excel files
Apache License 2.0

Reading excel file in Azure Databricks #467

Open grajee-everest opened 2 years ago

grajee-everest commented 2 years ago

I tried to use spark-excel in Azure Databricks, but I'm running into an error. I tried the same thing earlier on a SQL Server Big Data Cluster, but could not get it to work there either.

Current Behavior

I'm getting the error java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.byteArray(I)[B


I first installed the library using its Maven coordinates and got the error. I later followed the link and loaded the jar files directly, yet got the same error, as shown in the screenshot.


Steps to Reproduce (for bugs)

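# input_file_name must be imported for the withColumn call below
from pyspark.sql.functions import input_file_name

df = spark.read.format("excel") \
   .option("header", True) \
   .option("inferSchema", True) \
   .load("dbfs:/FileStore/tables/users.xls") \
   .withColumn("file_name", input_file_name())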

Your Environment

Azure Databricks

hilalarrasyid commented 2 years ago

Faced the same error with the 0.16 and 0.16.1 versions of this library. But then I tried an older version (com.crealytics:spark-excel_2.12:0.14.0) and it is working like a charm now.

This worked for me too, thanks. Cluster spec: DBR 10.4 LTS | Spark 3.2.1 | Scala 2.12

akshaysangma commented 1 year ago

Issue seems to be resolved with com.crealytics:spark-excel_2.12:3.2.1_0.17.1

Cluster specs: Apache Spark 3.2.1 Scala 2.12
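The coordinate above appears to encode the target Spark version as a prefix of the version string (3.2.1_0.17.1 against a Spark 3.2.1 cluster), and the Scala binary version as a suffix of the artifact name. A minimal sketch for checking a coordinate against a cluster follows. It assumes that naming pattern holds, which is inferred from the coordinates quoted in this thread. The helper names `parse_coordinate` and `matches_cluster` are hypothetical and not part of spark-excel.

```python
from typing import NamedTuple, Optional


class ExcelCoordinate(NamedTuple):
    group: str
    artifact: str
    scala_version: Optional[str]
    spark_version: Optional[str]  # None for older releases such as 0.14.0
    plugin_version: str


def parse_coordinate(coordinate: str) -> ExcelCoordinate:
    """Split 'group:artifact:version' into its parts (assumed naming pattern)."""
    group, artifact, version = coordinate.split(":")
    # The Scala binary version is the text after the last underscore in the artifact.
    _, sep, scala = artifact.rpartition("_")
    scala_version = scala if sep else None
    # Newer versions look like '<spark>_<plugin>'; older ones are just '<plugin>'.
    spark, sep, plugin = version.rpartition("_")
    spark_version = spark if sep else None
    return ExcelCoordinate(group, artifact, scala_version, spark_version, plugin)


def matches_cluster(coordinate: str, spark_version: str, scala_version: str) -> bool:
    """True if the coordinate's Scala and Spark versions match the cluster.

    Assumption: a coordinate without a Spark prefix is treated as
    'unknown' rather than as a mismatch.
    """
    c = parse_coordinate(coordinate)
    scala_ok = c.scala_version == scala_version
    spark_ok = c.spark_version is None or c.spark_version == spark_version
    return scala_ok and spark_ok


if __name__ == "__main__":
    coord = "com.crealytics:spark-excel_2.12:3.2.1_0.17.1"
    print(parse_coordinate(coord))
    print(matches_cluster(coord, spark_version="3.2.1", scala_version="2.12"))
```

A matching prefix does not guarantee that the commons-io conflict is gone; it only rules out one obvious mismatch before installing the library.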

nightscape commented 1 year ago

@akshaysangma does it work with newer versions as well?

Wang-23 commented 1 year ago

Issue seems to be resolved with com.crealytics:spark-excel_2.12:3.2.1_0.17.1

Cluster specs: Apache Spark 3.2.1 Scala 2.12

I'm so appreciative of your support

Wang-23 commented 1 year ago

Version com.crealytics:spark-excel_2.12:3.2.1_0.17.1 may resolve the issue.

hprasad-tls commented 8 months ago

I'm facing the same issue on #GCP #Databricks. My requirement is to read an XLS file through #PySpark.

I have tried all of the jar versions below, but none of them seems to work:

  1. com.crealytics:spark-excel-2.12.17-3.1.2_2.12:3.1.2_0.18.1
  2. com.crealytics:spark-excel_2.12:3.5.0_0.20.3
  3. com.crealytics:spark-excel_2.12:3.2.4_0.20.3
  4. com.crealytics:spark-excel_2.12:3.2.4_0.19.0

Bing Chat or Copilot suggested the solution below. [image]

hprasad-tls commented 8 months ago

I found the issue: it was my XLS file. The file had been exported from a website as HTML with an .xls extension. MS Excel could open it, but this library could not parse it, which is why it threw this error.
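A quick way to catch this case before handing a file to the reader is to look at its first bytes. The sketch below is only an illustration, not part of spark-excel: it assumes a real legacy .xls starts with the OLE2 compound-file signature, an .xlsx starts with a ZIP signature, and an HTML export starts with a tag. The function names and the list of markup prefixes are hypothetical.

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .xls (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                       # .xlsx is a ZIP container

# Illustrative, non-exhaustive list of prefixes seen in HTML/XML exports.
MARKUP_PREFIXES = (b"<html", b"<!doctype", b"<table", b"<?xml", b"<meta")


def sniff_spreadsheet(head: bytes) -> str:
    """Classify a file from its leading bytes."""
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx-or-zip"
    # Skip a UTF-8 BOM and leading whitespace before looking for a tag.
    text = head.lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    if text.startswith(MARKUP_PREFIXES):
        return "markup"
    return "unknown"


def sniff_file(path: str, n: int = 512) -> str:
    """Read the first n bytes of a local file and classify them."""
    with open(path, "rb") as f:
        return sniff_spreadsheet(f.read(n))


if __name__ == "__main__":
    print(sniff_spreadsheet(b"<html><body><table>"))
    print(sniff_spreadsheet(OLE2_MAGIC + b"\x00" * 8))
```

`sniff_file` needs a path the driver can open as a local file. A file that comes back as "markup" would need an HTML parser rather than an Excel reader.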

saikumarveera commented 5 months ago

Is this issue resolved? Also, is there a fail-fast mode like the one the CSV reader supports?
