ActianCorp / spark-vector

Repository for the Spark-Vector connector
Apache License 2.0

QA - using spark connector to unload tables uses all memory and never finishes for big tables #1

Closed. Pyrobal closed this issue 8 years ago.

Pyrobal commented 8 years ago

Already confirmed by Andrei. The JDBC unload should not try to take all available memory. It always seems to fail as shown below.

Test case: took a 7 GB lineitem table, loaded it, then tried to unload it into a Parquet file. The job always hangs no matter which memory settings I give to the shell.

spark-shell --master yarn --conf "spark.executor.memory=8G" --conf "spark.driver.memory=8G" --jars /home/actian/kelch01/Spark/spark-loader/spark_vector_loader-assembly-1.0-SNAPSHOT.jar

sqlContext.sql("""CREATE TEMPORARY TABLE lineitem USING com.actian.spark_vector.sql.DefaultSource OPTIONS ( host "uksl-kelch01-cent6-clu1", instance "K2", database "testdb", table "lineitem" )""")

sqlContext.sql("select * from lineitem").write.parquet("hdfs://uksl-kelch01-cent6-clu1.actian.com:8020/Actian/tmp/lineitem.parquet")

[ERROR] [02/25/2016 15:15:48.155] [sparkDriver-scheduler-1] [ActorSystem(sparkDriver)] exception on LARS’ timer thread java.lang.OutOfMemoryError: GC overhead limit exceeded

[ERROR] [02/25/2016 15:16:26.251] [sparkDriver-scheduler-1] [ActorSystem(sparkDriver)] Uncaught fatal error from thread [sparkDriver-scheduler-1] shutting down ActorSystem [sparkDriver] java.lang.OutOfMemoryError: GC overhead limit exceeded

java.lang.OutOfMemoryError: GC overhead limit exceeded