GELOG / adamcloud

Portable cloud infrastructure for a genomic transformation pipeline using ADAM

java.lang.IllegalArgumentException: You cannot call toBytes() more than once without calling reset() #19

Closed (davidonlaptop closed this issue 9 years ago)

davidonlaptop commented 9 years ago

Problem

If one of the processes in the pipeline (ADAM, Spark driver / executor, etc.) does not have enough memory, multiple errors may occur:

The following error occurs when the command is run from within the ADAM Docker container:

root@c6c30dad4a13:/# SPARK_DRIVER_MEMORY=1g SPARK_EXECUTOR_MEMORY=1g adam-submit transform /data/1kg/samples/hg00096/HG00096.chrom20.ILLUMINA.bwa.GBR.low_coverage.20120522.bam /data/adamcloud/hg00096.chrom20.adam
Spark assembly has been built with Hive, including Datanucleus jars on classpath
2015-05-31 05:49:00 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-05-31 05:49:14 WARN  ThreadLocalRandom:136 - Failed to generate a seed from SecureRandom within 3 seconds. Not enough entrophy?
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
2015-05-31 05:51:48 ERROR Executor:96 - Exception in task 3.0 in stage 0.0 (TID 3)
java.lang.IllegalArgumentException: You cannot call toBytes() more than once without calling reset()
    at parquet.Preconditions.checkArgument(Preconditions.java:47)
    at parquet.column.values.rle.RunLengthBitPackingHybridEncoder.toBytes(RunLengthBitPackingHybridEncoder.java:254)
    at parquet.column.values.rle.RunLengthBitPackingHybridValuesWriter.getBytes(RunLengthBitPackingHybridValuesWriter.java:68)
    at parquet.column.impl.ColumnWriterImpl.writePage(ColumnWriterImpl.java:147)
    at parquet.column.impl.ColumnWriterImpl.flush(ColumnWriterImpl.java:236)
    at parquet.column.impl.ColumnWriteStoreImpl.flush(ColumnWriteStoreImpl.java:113)
    at parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:153)
    at parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:112)
    at parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:73)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:1000)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:969)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2015-05-31 05:51:48 ERROR Executor:96 - Exception in task 6.0 in stage 0.0 (TID 6)
java.lang.IllegalArgumentException: You cannot call toBytes() more than once without calling reset()
    ... (identical stack trace to task 3.0 above)
...
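What makes this confusing is that the underlying memory failure is often swallowed: if a task dies partway through flushing a Parquet row group (after toBytes() has been called but before the encoder is reset), the subsequent close/flush trips Parquet's internal guard, so the error surfaces as the IllegalArgumentException above rather than as an OutOfMemoryError. A useful first step is to confirm what memory Spark is actually getting inside the container; a minimal check, assuming the gelog/adam image leaves Spark's stock defaults in place when these variables are unset:

# If both variables are empty, Spark falls back to its built-in defaults
# (512m in Spark 1.x-era builds), which is too little for this transform.
root@c6c30dad4a13:/# echo "driver=${SPARK_DRIVER_MEMORY:-unset} executor=${SPARK_EXECUTOR_MEMORY:-unset}"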
davidonlaptop commented 9 years ago

Potentially related problems:

davidonlaptop commented 9 years ago

Solution

Allocate more memory to Spark via the SPARK_DRIVER_MEMORY and SPARK_EXECUTOR_MEMORY environment variables.
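The same limits can also be passed as standard spark-submit flags when invoking ADAM's CLI entry point directly; a minimal sketch, assuming a local spark-submit is on the PATH and an ADAM CLI assembly jar whose path below is hypothetical (org.bdgenomics.adam.cli.ADAMMain is ADAM's CLI main class):

# --driver-memory and --executor-memory are standard spark-submit flags;
# the assembly jar path is hypothetical and depends on how ADAM was built.
$ spark-submit --driver-memory 1500m --executor-memory 1500m \
    --class org.bdgenomics.adam.cli.ADAMMain \
    /opt/adam/adam-cli-assembly.jar \
    transform /data/1kg/samples/hg00096/HG00096.chrom20.ILLUMINA.bwa.GBR.low_coverage.20120522.bam \
    /data/adamcloud/hg00096.chrom20.adam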

Start an ADAM container:

$ docker run --rm -ti -v /Users/david/data:/data gelog/adam bash
root@42c257dcfbcc:/#

Then, run ADAM with 1.5 GB of RAM for both the driver and the executor:

root@42c257dcfbcc:/# SPARK_DRIVER_MEMORY=1500m SPARK_EXECUTOR_MEMORY=1500m adam-submit transform /data/1kg/samples/hg00096/HG00096.chrom20.ILLUMINA.bwa.GBR.low_coverage.20120522.bam /data/adamcloud/hg00096.chrom20.adam
Spark assembly has been built with Hive, including Datanucleus jars on classpath
2015-06-01 20:36:24 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-06-01 20:36:33 WARN  ThreadLocalRandom:136 - Failed to generate a seed from SecureRandom within 3 seconds. Not enough entrophy?
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
root@42c257dcfbcc:/#
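The command exits quietly on success, so a quick sanity check is to list the output directory; a hedged check, assuming the usual Hadoop output layout (a _SUCCESS marker next to one part-*.parquet file per task):

# A completed transform should leave a Hadoop-style output directory
# containing a _SUCCESS marker and part-*.parquet files.
root@42c257dcfbcc:/# ls /data/adamcloud/hg00096.chrom20.adam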