BD2KGenomics / cgcloud

Image and VM management for Jenkins, Spark and Mesos clusters in EC2

Move to Spark 1.6.2 and Java 8 (resolves #231) #233

Closed fnothaft closed 8 years ago

fnothaft commented 8 years ago

Hold on merge, needs to be tested.

jpdna commented 8 years ago

This branch is not working for me yet.

I tested fnothaft:issues/231-spark-162-java8 by cloning the branch and following the "Developer" instructions (make develop sdist), then installed cgcloud-spark by running python setup.py install. That seemed to work, and I was able to create a test cluster of one machine.

But after logging into the master to test Spark, I find:

sparkbox@ip-172-31-37-40:~$ java -version
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
sparkbox@ip-172-31-37-40:~$ spark-shell
/opt/sparkbox/spark/bin/spark-class: line 86: /usr/lib/jvm/java-7-oracle/bin/java: No such file or directory

I'm also suspicious of the reference to java-7 here: https://github.com/fnothaft/cgcloud/blob/issues/231-spark-162-java8/spark/src/cgcloud/spark/spark_box.py#L198

It's possible that I am not installing correctly from the branch locally. If this is working on your end, @fnothaft, let me know whether I should try again a different way.
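
One way to check for leftover references like the one linked above is to scan a local checkout for the string `java-7`. The following is a minimal Python 3 sketch under that assumption; `find_stale_references` is a hypothetical helper written for illustration and is not part of cgcloud.

```python
import os


def find_stale_references(root, needle="java-7", suffix=".py"):
    """Return (path, line_number, line) for every matching line under root.

    Only files whose names end with `suffix` are inspected.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going on oddly encoded files.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line:
                        hits.append((path, number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Scan the current directory and print grep-style results.
    for path, number, line in find_stale_references("."):
        print(f"{path}:{number}: {line.strip()}")
```

Running it from the repository root lists each file and line that still mentions the old JDK path, which makes it easier to judge whether a plain substitution would be enough.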

fnothaft commented 8 years ago

Ah! That is my mistake, will fix in AM.

jpdna commented 8 years ago

Ping on this. Later this week or over the weekend I'd like to make use of this PR. If you think the remaining changes are just a few more s/java-7/java-8/ substitutions, I can look into it if you can't get to this; just let me know.

jpdna commented 8 years ago

@fnothaft, do you want me to look into this further if you are swamped? Do you think it is just a matter of more s/java-7/java-8/ substitutions in spark_box.py, or do you have any other pointers?

fnothaft commented 8 years ago

Sorry, just fell off my radar. Give me 5min.

fnothaft commented 8 years ago

@jpdna just fixed and force pushed an amended commit. LMK if this works for you!

jpdna commented 8 years ago

I was able to launch a Spark cluster with cgcloud, and it is indeed now running Spark 1.6.2 and Java 8, so this PR appears to work fine now.

fnothaft commented 8 years ago

Great! Thanks for the confirmation, @jpdna.

hannes-ucsc commented 8 years ago

Tests are failing with what seems to be a deterministic problem:

http://jenkins.cgcloud.info/job/cgcloud/390/testReport/src.cgcloud.spark.test.test_spark/SparkClusterTests/test_wordcount/

hannes-ucsc commented 8 years ago

Jenkins, test this please.

hannes-ucsc commented 8 years ago

I've just triggered another build to see if the test failure is deterministic. It looks like it is:

java.lang.IllegalArgumentException: System memory 64880640 must be at least 4.718592E8. Please use a larger heap size.
    at org.apache.spark.memory.UnifiedMemoryManager$.getMaxMemory(UnifiedMemoryManager.scala:198)
    at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:180)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:354)
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:193)
    at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:288)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:457)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
    at py4j.Gateway.invoke(Gateway.java:214)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
    at py4j.GatewayConnection.run(GatewayConnection.java:209)
    at java.lang.Thread.run(Thread.java:745)

hannes-ucsc commented 8 years ago

Yep, happened again.

http://jenkins.cgcloud.info/job/cgcloud/400/testReport/src.cgcloud.spark.test.test_spark/SparkClusterTests/test_wordcount/

FWIW, 471859200 == 4.718592E8 == 450 MiB
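
The arithmetic behind that threshold can be sketched as follows. This is a minimal Python restatement, assuming the check in Spark 1.6's UnifiedMemoryManager reserves 300 MiB and requires the heap to be at least 1.5 times that amount; the function and constant names are illustrative and are not Spark's API.

```python
# Sketch of the minimum-heap check, under the assumption that Spark 1.6
# reserves 300 MiB and requires the heap to be at least 1.5x that amount.
MIB = 1024 * 1024
RESERVED_BYTES = 300 * MIB  # assumed reserved system memory


def min_system_memory(reserved_bytes=RESERVED_BYTES):
    """Smallest heap size in bytes that the check would accept."""
    return reserved_bytes * 3 // 2


def heap_is_large_enough(system_memory_bytes, reserved_bytes=RESERVED_BYTES):
    """True if the given heap size passes the minimum-memory check."""
    return system_memory_bytes >= min_system_memory(reserved_bytes)


if __name__ == "__main__":
    print(min_system_memory())             # 471859200
    print(min_system_memory() // MIB)      # 450
    print(heap_is_large_enough(64880640))  # False: the heap from the trace
```

The heap reported in the trace, 64880640 bytes, is a little under 62 MiB, far below the 450 MiB minimum, which is consistent with the failure being deterministic.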

jpdna commented 8 years ago

Just pinging on this, as it's a blocker for docs work.

hannes-ucsc commented 8 years ago

I would just remove the passing of --executor-memory from the word count test. The defaults should work fine. Looking at

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala#L210
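
As a rough illustration of the suggestion above, the sketch below builds a spark-submit command line in which the memory flags are optional, so that omitting them leaves Spark's defaults in place. `spark_submit_args`, the script name, and the master URL are hypothetical and do not reflect how the cgcloud test actually invokes Spark; only the `--master`, `--executor-memory`, and `--driver-memory` flags are real spark-submit options.

```python
def spark_submit_args(script, master, executor_memory=None, driver_memory=None):
    """Build a spark-submit argument list.

    Memory flags are added only when a value is given, so leaving them
    out lets Spark fall back to its default settings.
    """
    args = ["spark-submit", "--master", master]
    if executor_memory is not None:
        args += ["--executor-memory", executor_memory]
    if driver_memory is not None:
        args += ["--driver-memory", driver_memory]
    args.append(script)
    return args


if __name__ == "__main__":
    # Hypothetical script name and master URL, for illustration only.
    print(spark_submit_args("wordcount.py", "spark://spark-master:7077"))
    print(spark_submit_args("wordcount.py", "spark://spark-master:7077",
                            executor_memory="1g"))
```

Making the flags optional keeps the default path free of explicit memory settings while still allowing an override where a test genuinely needs one.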

fnothaft commented 8 years ago

@hannes-ucsc this passes now. Thanks for catching the low driver/executor memory settings!

hannes-ucsc commented 8 years ago

Thank you!

hannes-ucsc commented 8 years ago

The master build of the merge commit failed due to a pip outage. I just triggered it again.