arangodb / arangodb-spark-datasource

ArangoDB Connector for Apache Spark, using the Spark DataSource API
Apache License 2.0

add support for spark 3.2.x #14

Closed reynoldsm88 closed 2 years ago

reynoldsm88 commented 2 years ago

@rashtao I've been looking into this and would like to add a profile for Spark 3.2. It is taking me some time to grok the build, and I'm not sure how to handle the different Spark versions.

From my investigation so far, there are some interface changes in org.apache.spark.sql.catalyst.util.DateFormatter that need to be addressed in org.apache.spark.sql.arangodb.datasource.mapping.json.JacksonGenerator. I am willing to work on this, but I would like your help getting the build set up the way you want it to support Spark 3.2.

rashtao commented 2 years ago

I haven't looked into Spark 3.2 support yet, so I cannot help here. Support for Spark 3.2 is on our roadmap anyway.

reynoldsm88 commented 2 years ago

@rashtao what would your suggestion be if I wanted to start experimenting with adding support? I'm not sure how you want to support multiple Spark versions. Would you update the arangodb-spark-datasource-3.1 project to support 3.2, or would you prefer to duplicate the 3.1 code and make the necessary changes in a new arangodb-spark-datasource-3.2 directory?

rashtao commented 2 years ago

To be safe, I think it would be better to have a new arangodb-spark-datasource-3.2 module. The duplicated code can be moved to the arangodb-spark-commons module.

reynoldsm88 commented 2 years ago

@rashtao awesome, thanks. I just needed that direction. I can take a crack at this later this week.

reynoldsm88 commented 2 years ago

@rashtao because the 2.x and 3.x libraries are not source compatible, it is very difficult to pull things into arangodb-spark-commons. I believe this project would need to move towards something akin to version-specific source trees, as described here.

Any thoughts? I haven't looked into it much yet beyond some research.

rashtao commented 2 years ago

For the moment, feel free to keep source-incompatible code in version-specific modules (arangodb-spark-datasource-3.2).
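
One way to picture the split discussed in this thread, as a minimal sketch and not the project's actual code: shared code depends only on a small interface, and each version-specific module supplies its own implementation. The names `DateFormatShim`, `IsoDateFormatShim`, and `SharedJsonWriter` are hypothetical, and `java.time` stands in for Spark's `DateFormatter` so the example runs without Spark on the classpath.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical interface that could live in a shared module such as
// arangodb-spark-commons. It references no Spark type whose signature
// differs between Spark versions.
trait DateFormatShim {
  /** Formats a date given as days since 1970-01-01. */
  def format(daysSinceEpoch: Int): String
}

// Hypothetical implementation that would live in a version-specific module.
// A real one would delegate to that Spark version's DateFormatter;
// java.time is used here only as a stand-in (assumption).
final class IsoDateFormatShim extends DateFormatShim {
  private val formatter = DateTimeFormatter.ISO_LOCAL_DATE

  override def format(daysSinceEpoch: Int): String =
    LocalDate.ofEpochDay(daysSinceEpoch.toLong).format(formatter)
}

// Shared code sees only the trait, so it does not need to change
// when a module for another Spark version is added.
object SharedJsonWriter {
  def writeDate(shim: DateFormatShim, daysSinceEpoch: Int): String =
    "\"" + shim.format(daysSinceEpoch) + "\""
}
```

With this shape, only the small implementation class would be duplicated per Spark version, while callers such as `SharedJsonWriter.writeDate(new IsoDateFormatShim, 0)` stay in the shared module.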

iFedix commented 2 years ago

Hello! Is there an ETA for the arangodb-spark-datasource-3.2 connector, or a way to work around the error that occurs when using arangodb-spark-datasource-3.1 with Spark 3.2.0 in a production environment? I get an error related to org.apache.spark.sql.catalyst.util.DateFormatter and a missing method there.

rashtao commented 2 years ago

Closing in favor of https://github.com/arangodb/arangodb-spark-datasource/pull/31