Netflix / iceberg

Iceberg is a table format for large, slow-moving tabular data
Apache License 2.0

Update Spark to 2.4.0 #105

Closed: rdblue closed this 5 years ago

rdblue commented 5 years ago

This updates the Spark dependency to 2.4.0.

Changes include:

This fixes #94.

rdblue commented 5 years ago

FYI @omalley: Spark 2.4 broke ORC support, so we either need to fix it or remove it from Spark in the master branch. What would you prefer to do?

charleso commented 5 years ago

@rdblue Sorry to comment from the sidelines. I'm curious about your thoughts on supporting multiple versions of Spark simultaneously. I ask as a potential consumer of Iceberg with customers on different versions of Spark. We currently have a Spark 2.2/2.3 compatibility managed sources directory, which lets us maintain both versions from a single branch with multiple builds. I can understand that it might become extremely painful very quickly; I just wanted to check whether it's something you have thought about.

rdblue commented 5 years ago

@charleso, we want to support multiple versions of Spark at the same time, but the API that we're using to integrate is evolving rapidly, so it doesn't make sense to support them all right now. Eventually, we will either use reflection or have versioned Spark modules.

charleso commented 5 years ago

@rdblue Thanks for that, exactly what I wanted to know. 👍

rdblue commented 5 years ago

Closing this. The new one is https://github.com/apache/incubator-iceberg/pull/30.