elastacloud / spark-excel

A Spark data source for reading Microsoft Excel files
https://www.elastacloud.com
Apache License 2.0

CONCAT function not supported #14

Closed tahls closed 2 years ago

tahls commented 2 years ago

Hi @dazfuller,

I've been playing around with this project and have been loving it so far! However, I've just tried to read an Excel file that uses the CONCAT function, and it throws a parsing error. The specific error is: Caused by: org.apache.poi.ss.formula.eval.NotImplementedFunctionException: _xlfn.CONCAT

Looking through some other issues, I came across this answer stating that CONCAT is available in a newer version of the POI library: https://stackoverflow.com/a/66630395

Is it possible to get the POI library updated in a new version of the .jar?

dazfuller commented 2 years ago

It's certainly something I've been wanting to look at. I did try a few months ago and ran into some issues with newer versions, but it's worth revisiting.

dazfuller commented 2 years ago

The ClassNotFound error in Azure Synapse Spark pools for org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream turns out to be caused by Synapse using a really old version of the Apache commons-io library (2.5.0 in Synapse, 2.8.0 in Databricks). The POI library depends on that class, which was only introduced in commons-io 2.7.0, so the commons-io library needs to be shaded and included in the uber jar.
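For anyone wanting to confirm which commons-io a cluster actually provides, a small diagnostic along these lines could be run in a notebook cell or spark-shell. It is only a sketch: the object and method names (CommonsIoCheck, describe) are made up for illustration and are not part of spark-excel.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical helper, not part of spark-excel: reports where a class is loaded from.
object CommonsIoCheck {
  // The class POI needs, per the comment above.
  val className = "org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream"

  def describe(name: String): String =
    Try(Class.forName(name)) match {
      case Success(cls) =>
        // The code source is null for classes loaded by the bootstrap class loader.
        val location = Option(cls.getProtectionDomain.getCodeSource)
          .flatMap(cs => Option(cs.getLocation))
          .map(_.toString)
          .getOrElse("unknown location")
        s"$name loaded from $location"
      case Failure(e) =>
        // A missing class surfaces here as ClassNotFoundException.
        s"$name not available: ${e.getClass.getSimpleName}"
    }

  def main(args: Array[String]): Unit = println(describe(className))
}
```

If the class is reported as not available, or is loaded from a jar other than the one expected, the platform's own commons-io is most likely taking precedence over the version POI needs.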

Worth noting that commons-io 2.5.0 is still open to 2 known CVEs.
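As a rough illustration of the shading mentioned above, a build.sbt fragment might look like the following. This assumes the uber jar is built with the sbt-assembly plugin; the target package prefix "shaded" is an arbitrary choice for the example, and the thread does not show what the actual fix looks like.

```scala
// build.sbt fragment: a sketch assuming the sbt-assembly plugin is enabled
// in project/plugins.sbt. The "shaded" prefix is an arbitrary example name.
assembly / assemblyShadeRules := Seq(
  // Relocate commons-io classes (and references to them from POI) so the
  // bundled copy cannot clash with the older version on the cluster.
  ShadeRule
    .rename("org.apache.commons.io.**" -> "shaded.org.apache.commons.io.@1")
    .inAll
)
```

For this to have any effect, commons-io has to be a regular (not "provided") dependency so that it ends up inside the assembled jar.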

dazfuller commented 2 years ago

Closing as this is now resolved as a result of PR #16