bkvarda / kudusparklyr

A Kudu extension for Sparklyr

Add support for Spark 2.2 #4

Closed · markderry closed this 7 years ago

markderry commented 7 years ago

We would like to add support for Spark 2.2, and I believe the .jar file would need to be updated.

bkvarda commented 7 years ago

Just saw this - I think I should be able to take care of it tonight.

markderry commented 7 years ago

Awesome! We just upgraded, and I've had to revert to Spark 1.6 to get things running - we have been having all kinds of trouble!

bkvarda commented 7 years ago

Okay, got it - did you upgrade just Spark, or CDH/Kudu as well?

markderry commented 7 years ago

We just upgraded to Kudu 1.4.0-cdh5.12.1.

bkvarda commented 7 years ago

Pushed changes. You will have to set an option before creating your Spark connection to specify the Kudu version (this is how it selects which JARs to use). Let me know if this works, and if not, send me any errors so I can repro - I wasn't able to test extensively. For example:

```r
options(kudu.version = "1.4.0")
sc <- spark_connect(
  master = "yarn-client",
  version = "2.2",
  config = list(default = list(
    spark.yarn.keytab = "/home/ec2-user/navi.keytab",
    spark.yarn.principal = "navi@CLOUDERA.INTERNAL"
  ))
)
```

bkvarda commented 7 years ago

Mark - did this fix the issue(s)?

markderry commented 7 years ago

Somewhat. It failed when I went to add the Kudu context to the connection, saying that I needed to set Spark's allow-multiple-contexts option to true, and I have been trying to find where to do that.

krb16704 commented 7 years ago

I've been working with Mark on this one. We have solved that issue, but we are running into another one where the create-Kudu-table method is not found. I opened issue #5 to track it specifically.
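
For anyone who hits the same error: a minimal sketch of the workaround, assuming the message refers to Spark's `spark.driver.allowMultipleContexts` setting (an assumption on my part - adjust to whatever setting the error actually names):

```r
library(sparklyr)

# Assumption: the "multiple contexts" error refers to
# spark.driver.allowMultipleContexts. Permit a second SparkContext
# so the KuduContext constructor (which creates its own) can succeed.
config <- spark_config()
config$spark.driver.allowMultipleContexts <- "true"

sc <- spark_connect(master = "yarn-client", version = "2.2", config = config)
```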

bkvarda commented 7 years ago

I think this and #5 may be related. I will try doing some extensive testing to figure out the cause over the next week or so.

bkvarda commented 7 years ago

@krb16704 This should be fixed now. The issue was that in the Scala Kudu library this project depends on, the Spark 1.6-style constructor (which creates its own SparkContext) was deprecated, so the code now passes in the existing SparkContext instead of letting a new one be created, as shown below:

```scala
@InterfaceStability.Unstable
class KuduContext(val kuduMaster: String, sc: SparkContext) extends Serializable {
  import kudu.KuduContext._

  @Deprecated()
  def this(kuduMaster: String) {
    this(kuduMaster, new SparkContext())
  }
  // ...
}
```

So the error message was accurate - there were 2 contexts being created - but that should not be the case any longer. Let me know if this fixes the issue.
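
For reference, an end-to-end sketch of what usage should look like with this fix. The helper names `kudu_context()` and `read_kudu_table()`, the Kudu master address, and the table name are assumptions for illustration, not verified API:

```r
# Sketch only: kudu_context()/read_kudu_table() and the host/table
# names below are assumed for illustration and may differ.
library(sparklyr)
library(kudusparklyr)

options(kudu.version = "1.4.0")  # select the JARs matching the cluster's Kudu

sc <- spark_connect(master = "yarn-client", version = "2.2")

# With the fix, this should reuse sc's existing SparkContext
# rather than constructing a second one.
kc <- kudu_context(sc, "kudu-master-host:7051")
df <- read_kudu_table(kc, "my_table")
```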