elastic / elasticsearch-hadoop

:elephant: Elasticsearch real-time search and analytics natively integrated with Hadoop
https://www.elastic.co/products/hadoop
Apache License 2.0

[Feature] Spark 3.1.1 Support #1617

Closed johnc1231 closed 2 years ago

johnc1231 commented 3 years ago

What kind of issue is this?

Feature description

I'm wondering what the plan is for Spark 3.1.1 support. I've been able to build elasticsearch-spark-30 against Spark 3.1.1 locally with only a minor tweak to the current code: the compactLogs method seems to have been removed in 3.1.1, so the code that relied on it needed a change.

koertkuipers commented 3 years ago

did you get test/check to pass?

spark switched from javax.servlet to jakarta.servlet for servlet-api and this seems to have broken tests for me. i get:

java.lang.NoClassDefFoundError: javax/servlet/http/HttpSessionIdListener
at org.sparkproject.jetty.server.handler.ContextHandler.<clinit>(ContextHandler.java:121)

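
A quick way to narrow down an error like this is to check whether the class named in the trace is visible on the test classpath, and which jar provides it. The following is only a minimal diagnostic sketch: the class name `ServletApiProbe` and its helper methods are made up for illustration and are not part of elasticsearch-hadoop or Spark.

```java
import java.security.CodeSource;

// Hypothetical helper for illustration; not part of elasticsearch-hadoop or Spark.
class ServletApiProbe {

    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isPresent(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ServletApiProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Describes where a class was loaded from, or reports that it is missing. */
    static String describe(String className) {
        if (!isPresent(className)) {
            return className + " -> missing";
        }
        try {
            Class<?> cls = Class.forName(className, false, ServletApiProbe.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            // Classes from the JDK itself usually have no code source
            String location = (source == null || source.getLocation() == null)
                    ? "(no code source, likely a JDK class)"
                    : source.getLocation().toString();
            return className + " -> " + location;
        } catch (ClassNotFoundException e) {
            return className + " -> missing";
        }
    }

    public static void main(String[] args) {
        // The class named in the NoClassDefFoundError above
        System.out.println(describe("javax.servlet.http.HttpSessionIdListener"));
        // A class that is always present, as a control
        System.out.println(describe("java.lang.String"));
    }
}
```

Running this with the same classpath as the failing tests should show whether any servlet API jar is being picked up at all.
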
johnc1231 commented 3 years ago

Good point, I haven't run all the tests. I just swapped it in for my use case and was able to use it successfully.

I think it wouldn't be too hard to handle the changes, but I'm wondering how this project handles backwards incompatible changes in Spark. Do you just accept a PR to get Spark 3.1.1 working, at the expense of Spark 3.0.0, or do we have to maintain both?

jbaiera commented 3 years ago

I ran into these two problems while getting the Spark 3.0 support PR finished recently. I erred on the side of not addressing the backwards compatibility problem at the time in order to prioritize getting 3.0 support out the door.

We might be able to get around the backwards compatibility requirements if 3.1.1 support goes out in v8.0.0 only, but there's no guarantee of when 8.0 will fully land as GA. It's likely to be quite a ways off and I'd like to avoid (any more) long delays to version support. Spark 3.1.1 support should probably land in a 7.x release, which means it needs to be backwards compatible with 3.0.0. Them's the breaks unfortunately, but I don't think it's impossible to do.
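
As an illustration of what "backwards compatible with 3.0.0" can look like in code, one general technique is to look up a version-specific method reflectively and fall back when it is absent. This is only a sketch under assumptions: `OptionalMethodShim` and its methods are hypothetical names, `String` stands in for a library class, and nothing here is claimed to be how this project handles the difference. Reflection like this only covers calling a method that may be missing; it does not help when a class has to override a method that one version no longer declares.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Optional;

// Hypothetical helper for illustration; not part of elasticsearch-hadoop or Spark.
class OptionalMethodShim {

    /** Looks up a public method; returns empty if this version of the class lacks it. */
    static Optional<Method> findMethod(Class<?> type, String name, Class<?>... paramTypes) {
        try {
            return Optional.of(type.getMethod(name, paramTypes));
        } catch (NoSuchMethodException e) {
            return Optional.empty();
        }
    }

    /** Calls the no-argument method when present, otherwise returns the fallback. */
    static Object invokeOrElse(Object target, String name, Object fallback) {
        Optional<Method> method = findMethod(target.getClass(), name);
        if (!method.isPresent()) {
            return fallback;
        }
        try {
            return method.get().invoke(target);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException("Failed to invoke " + name, e);
        }
    }

    public static void main(String[] args) {
        // String stands in for a library class whose API differs between versions
        System.out.println(invokeOrElse("abc", "length", -1));      // method exists: prints 3
        System.out.println(invokeOrElse("abc", "compactLogs", -1)); // method absent: prints -1
    }
}
```

The trade-off is that a reflective lookup moves a compile-time error to runtime, so it is usually kept to a small, well-tested compatibility layer.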

pan3793 commented 3 years ago

Is there any plan to support DataSourceV2?

jbaiera commented 3 years ago

@pan3793 I think that would be awesome to support, but I've had a hard time finding documentation for the APIs and haven't had much time to dig into the source for them yet. Definitely something we'd like to tackle going forward.

pan3793 commented 3 years ago

@jbaiera Hope this helps. https://jaceklaskowski.github.io/mastering-spark-sql-book/new-and-noteworthy/datasource-v2/

masseyke commented 2 years ago

Closing this because support for Spark 3.1 was added in #1807. We've got #1801 to track support for DataSourceV2.