fnothaft opened 8 years ago
Oops, that was premature. It didn't actually work: it builds alright, but I still get the error.
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] gnocchi: Genotype store and query engine ........... SUCCESS [ 0.612 s]
[INFO] gnocchi-core: Core APIs and queries ................ SUCCESS [01:22 min]
[INFO] gnocchi-cli: Command line interface for managing and querying variant store SUCCESS [ 18.025 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:41 min
[INFO] Finished at: 2016-07-15T14:16:30-07:00
[INFO] Final Memory: 36M/994M
[INFO] ------------------------------------------------------------------------
Taners-MacBook-Pro:gnocchi Taner$ ./bin/gnocchi-submit regressPhenotypes testData/sample.vcf testData/samplePhenotypes.csv testData/associations -saveAsText
Using SPARK_SUBMIT=/usr/local/bin/spark-submit
2016-07-15 14:47:51 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Command body threw exception:
org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: org.apache.spark.SparkException: Failed to register classes with Kryo
org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:128)
org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:273)
org.apache.spark.serializer.KryoSerializerInstance.
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: org.apache.spark.SparkException: Failed to register classes with Kryo
org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:128)
org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:273)
org.apache.spark.serializer.KryoSerializerInstance.
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1016)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:921)
at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:861)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1607)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$.mergeSchemasInParallel(ParquetRelation.scala:799)
at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$MetadataCache.org$apache$spark$sql$execution$datasources$parquet$ParquetRelation$MetadataCache$$readSchema(ParquetRelation.scala:517)
at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$MetadataCache$$anonfun$12.apply(ParquetRelation.scala:421)
at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$MetadataCache$$anonfun$12.apply(ParquetRelation.scala:421)
at scala.Option.orElse(Option.scala:257)
at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$MetadataCache.refresh(ParquetRelation.scala:421)
at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation.org$apache$spark$sql$execution$datasources$parquet$ParquetRelation$$metadataCache$lzycompute(ParquetRelation.scala:145)
at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation.org$apache$spark$sql$execution$datasources$parquet$ParquetRelation$$metadataCache(ParquetRelation.scala:143)
at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anonfun$6.apply(ParquetRelation.scala:202)
at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anonfun$6.apply(ParquetRelation.scala:202)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation.dataSchema(ParquetRelation.scala:202)
at org.apache.spark.sql.sources.HadoopFsRelation.schema$lzycompute(interfaces.scala:636)
at org.apache.spark.sql.sources.HadoopFsRelation.schema(interfaces.scala:635)
at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:37)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
at net.fnothaft.gnocchi.cli.RegressPhenotypes.run(RegressPhenotypes.scala:79)
at org.bdgenomics.utils.cli.BDGSparkCommand$class.run(BDGCommand.scala:54)
at net.fnothaft.gnocchi.cli.RegressPhenotypes.run(RegressPhenotypes.scala:70)
at net.fnothaft.gnocchi.cli.GnocchiMain$.main(GnocchiMain.scala:48)
at net.fnothaft.gnocchi.cli.GnocchiMain.main(GnocchiMain.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.SparkException: Failed to register classes with Kryo
at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:128)
at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:273)
at org.apache.spark.serializer.KryoSerializerInstance.
@tkdagdelen can you do:
git log
And send the output of:
cat bin/gnocchi-submit
Taners-MacBook-Pro:gnocchi Taner$ git log
commit 5f5a957a4d926fe929bf7ec6c795cf08a12840fe
Author: Frank Austin Nothaft <fnothaft@alumni.stanford.edu>
Date:   Tue Dec 8 10:43:19 2015 -0800
[gnocchi-19] Add PCA.
This commit adds code to run PCA on genotypes. This code transposes the genotype
matrix and then uses SVD to compute the PCA rotations. This allows us to process
all of the data, instead of filtering down to a genomic range and then running
PCA. Additionally, I have factored the matrix creation code out of the
SampleSimilarity code. Resolves #19.
commit eb76275eed01b784d735b0f83511e10411e124ee
Author: Frank Austin Nothaft <fnothaft@alumni.stanford.edu>
Date:   Sat Dec 5 17:40:15 2015 -0800
[gnocchi-18] Port Gnocchi over to the Spark SQL DataFrame/Dataset API
Because Spark SQL presents better opportunities for optimization in the future,
I have moved Gnocchi over to Spark SQL. Several functions are not working
correctly right now; there seems to be an issue with Broadcast JOINs that
crops up when running regressPhenotypes. I plan to resolve that before merging
this work. Resolves #18.
commit 66a015e77b2d912fc0a7de468d5ca65be17f7106
Merge: 3e84043 68f3915
Author: Frank Austin Nothaft <fnothaft@alumni.stanford.edu>
Date:   Fri Jun 17 10:06:38 2016 -0700
Merge pull request #20 from fnothaft/pom-upgrades
Miscellaneous pom.xml upgrades.
commit 68f391567308bdee9f6756efafbec9c6ed3cd9e8
Author: Frank Austin Nothaft <fnothaft@alumni.stanford.edu>
Date:   Fri Jun 17 09:49:51 2016 -0700
Miscellaneous pom.xml upgrades.
Taners-MacBook-Pro:gnocchi Taner$ cat bin/gnocchi-submit
#
#
#
#
set -e

# DD is "double dash"
DD=False
PRE_DD=()
POST_DD=()
for ARG in "$@"; do
  shift
  if [[ $ARG == "--" ]]; then
    DD=True
    POST_DD=( "$@" )
    break
  fi
  PRE_DD+=("$ARG")
done

if [[ $DD == True ]]; then
  SPARK_ARGS="${PRE_DD[@]}"
  GNOCCHI_ARGS="${POST_DD[@]}"
else
  SPARK_ARGS=()
  GNOCCHI_ARGS="${PRE_DD[@]}"
fi

if [[ $DD == False && -n "$GNOCCHI_OPTS" ]]; then
  echo "WARNING: Passing Spark arguments via GNOCCHI_OPTS was recently removed."
  echo "Run gnocchi-submit instead as gnocchi-submit
fi

SCRIPT_DIR="$(cd $(dirname $0)/..; pwd)"

GNOCCHI_JARS=$("$SCRIPT_DIR"/bin/compute-gnocchi-jars.sh)
GNOCCHI_CLI_JAR=${GNOCCHI_JARS##*,}
GNOCCHI_JARS=$(echo "$GNOCCHI_JARS" | rev | cut -d',' -f2- | rev)

SPARK_ARGS=$("$SCRIPT_DIR"/bin/append_to_option.py , --jars $GNOCCHI_JARS $SPARK_ARGS)

if [ -z "$SPARK_HOME" ]; then
  SPARK_SUBMIT=$(which spark-submit)
else
  SPARK_SUBMIT="$SPARK_HOME"/bin/spark-submit
fi
if [ -z "$SPARK_SUBMIT" ]; then
  echo "SPARK_HOME not set and spark-submit not on PATH; Aborting."
  exit 1
fi
echo "Using SPARK_SUBMIT=$SPARK_SUBMIT"

"$SPARK_SUBMIT" \
  --class net.fnothaft.gnocchi.cli.GnocchiMain \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.kryo.registrator=net.fnothaft.gnocchi.GnocchiKryoRegistrator \
  $SPARK_ARGS \
  $GNOCCHI_CLI_JAR \
  $GNOCCHI_ARGS
Can you run:
git fetch origin
git checkout -b issues/23-fix-registrator origin/issues/23-fix-registrator
And then let me know if you still see the issue?
fatal: Cannot update paths and switch to branch 'issues/23-fix-registrator' at the same time. Did you intend to checkout 'origin/issues/23-fix-registrator' which can not be resolved as commit?
Can you do:
git remote show origin
git branch -v
Taners-MacBook-Pro:gnocchi Taner$ git remote show origin
Taners-MacBook-Pro:gnocchi Taner$ git branch -v
Ah, OK!
Do this instead:
git remote add upstream git@github.com:fnothaft/gnocchi.git
git fetch upstream
git checkout -b issues/23-fix-registrator upstream/issues/23-fix-registrator
Taners-MacBook-Pro:gnocchi Taner$ git remote add upstream git@github.com:fnothaft/gnocchi.git
Taners-MacBook-Pro:gnocchi Taner$ git fetch upstream
remote: Counting objects: 4, done.
remote: Total 4 (delta 3), reused 3 (delta 3), pack-reused 1
Unpacking objects: 100% (4/4), done.
From github.com:fnothaft/gnocchi
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: org.apache.spark.SparkException: Failed to register classes with Kryo
org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:128)
org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:273)
org.apache.spark.serializer.KryoSerializerInstance.
	[... stack trace identical to the one above ...]
Caused by: org.apache.spark.SparkException: Failed to register classes with Kryo
	[... same frames as above ...]
Taners-MacBook-Pro:gnocchi Taner$ ./bin/gnocchi-submit regressPhenotypes testData/sample.vcf testData/samplePhenotypes.csv testData/associations -saveAsText
Using SPARK_SUBMIT=/usr/local/bin/spark-submit
2016-07-18 12:45:12 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Command body threw exception:
org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: org.apache.spark.SparkException: Failed to register classes with Kryo
org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:128)
org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:273)
org.apache.spark.serializer.KryoSerializerInstance.
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: org.apache.spark.SparkException: Failed to register classes with Kryo
org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:128)
org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:273)
org.apache.spark.serializer.KryoSerializerInstance.
	[... stack trace identical to the one above ...]
Caused by: org.apache.spark.SparkException: Failed to register classes with Kryo
	[... same frames as above ...]
I think I found the problem: we seem to be missing the entire GnocchiKryoRegistrator class. It's not in the repo anymore.
https://github.com/fnothaft/gnocchi/blob/master/bin/gnocchi-submit#L79
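For context on why that breaks the submit script: `spark.kryo.registrator` names a class that Spark instantiates reflectively and calls for every new Kryo instance; if the class isn't on the classpath, Kryo setup fails with exactly the "Failed to register classes with Kryo" error above. A minimal sketch of what such a registrator looks like — the registered classes here are illustrative assumptions, not the original gnocchi list:

```scala
package net.fnothaft.gnocchi

import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

// Sketch only. Spark loads the class named by the spark.kryo.registrator
// conf key and invokes registerClasses on each fresh Kryo instance. If the
// class is absent from the submitted jar, Spark throws
// "Failed to register classes with Kryo" before any task runs.
class GnocchiKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    // Hypothetical registrations; the original class list is not in this thread.
    kryo.register(classOf[Array[String]])
    kryo.register(classOf[Array[Int]])
  }
}
```

Restoring a class like this under the package expected by `bin/gnocchi-submit` (or removing the `--conf spark.kryo.registrator=...` line) should make the error go away.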