nickholz opened this issue 6 years ago
Could you give me the arguments you used to start Halvade?
@ddcap did you figure this out? I'm having the same issue.
Not really, since I didn't have all the information to figure things out. Could you send me the command you used (i.e. which arguments) and the logs of the failed task. That should help me narrow down the problem.
I used the exact workflow at https://github.com/biointec/halvade/wiki/Recipe:-RNA-seq-with-Halvade-on-a-local-Hadoop-cluster
Here is the halvade.stderr file.
Here is my example configuration.
#----------------------------------
# required Halvade arguments
#----------------------------------
N=1
M=128
C=24
B="/home/shutchins2/halvade/bin.tar.gz"
D="/home/shutchins2/halvade/ref/dbsnp/dbsnp_138.hg19.vcf"
R="/home/shutchins2/halvade/ref/ucsc.hg19.fasta"
S="/home/shutchins2/halvade/ref/STAR_ref"
I="/home/shutchins2/halvade/in/"
O="/home/shutchins2/halvade/out3/"
smt
rna
That text file was ugly...
18/05/31 14:38:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[2018/05/31 14:38:24 - DEBUG] reference file not found on a distributed fs (HDFS)
[2018/05/31 14:38:24 - DEBUG] reference file is on local disk
[2018/05/31 14:38:24 - DEBUG] requires star genome 2 upload? false
[2018/05/31 14:38:24 - DEBUG] pass 2 UID: 310518023824.895
[2018/05/31 14:38:24 - DEBUG] All reference files are available
[2018/05/31 14:38:24 - DEBUG] pmem check disabled, using less memory for STAR because of shared memory
[2018/05/31 14:38:24 - DEBUG] set # map containers: 21
[2018/05/31 14:38:24 - DEBUG] resources set to 21 maps [2 cpu , 6144 mb] per node and 2 reducers [11 cpu, 32768 mb] per node
[2018/05/31 14:38:24 - DEBUG] parsing dictionary /home/shutchins2/halvade/ref/ucsc.hg19.dict
[2018/05/31 14:38:24 - DEBUG] requested # reducers: 36
[2018/05/31 14:38:25 - DEBUG] final # reducers: 35
[2018/05/31 14:38:25 - DEBUG] Started Halvade pass 1 Job
18/05/31 14:38:25 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/05/31 14:38:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
18/05/31 14:38:25 INFO input.FileInputFormat: Total input files to process : 8
18/05/31 14:38:25 INFO mapreduce.JobSubmitter: number of splits:8
18/05/31 14:38:25 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local406084068_0001
18/05/31 14:38:28 INFO mapred.LocalDistributedCacheManager: Localized hdfs://localhost:9000/home/shutchins2/halvade/bin.tar.gz as file:/tmp/hadoop-shutchins2/mapred/local/1527795505390/bin.tar.gz
18/05/31 14:38:28 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
18/05/31 14:38:28 INFO mapreduce.Job: Running job: job_local406084068_0001
18/05/31 14:38:28 INFO mapred.LocalJobRunner: OutputCommitter set in config null
18/05/31 14:38:28 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/05/31 14:38:28 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/05/31 14:38:28 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
18/05/31 14:38:29 INFO mapred.LocalJobRunner: Waiting for map tasks
18/05/31 14:38:29 INFO mapred.LocalJobRunner: Starting task: attempt_local406084068_0001_m_000000_0
18/05/31 14:38:29 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/05/31 14:38:29 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/05/31 14:38:29 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/05/31 14:38:29 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/home/shutchins2/halvade/in/halvade_0_0.fq.gz:0+28657878
18/05/31 14:38:29 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/05/31 14:38:29 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/05/31 14:38:29 INFO mapred.MapTask: soft limit at 83886080
18/05/31 14:38:29 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/05/31 14:38:29 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/05/31 14:38:29 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/05/31 14:38:29 INFO compress.CodecPool: Got brand-new decompressor [.gz]
[2018/05/31 14:38:29 - DEBUG] taskId = attempt_local406084068_0001_m_000000_0
[2018/05/31 14:38:29 - DEBUG] task = 0
[2018/05/31 14:38:29 - DEBUG] file lock: /tmp/halvade/ load_sh_mem.lock
[2018/05/31 14:38:29 - DEBUG] Checking for binaries...
[2018/05/31 14:38:29 - DEBUG] empty directory ./bin.tar.gz/bin
[2018/05/31 14:38:29 - DEBUG] STAR instance type: 1
18/05/31 14:38:29 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
[2018/05/31 14:38:29 - DEBUG] containers left: 8
[2018/05/31 14:38:29 - DEBUG] paired? true
[2018/05/31 14:38:29 - DEBUG] ref: /home/shutchins2/halvade/ref/STAR_ref/
[2018/05/31 14:38:29 - DEBUG] Started STAR
[2018/05/31 14:38:29 - DEBUG] Load ref [/home/shutchins2/halvade/ref/STAR_ref/] to shared memory
[2018/05/31 14:38:29 - DEBUG] running command [./bin.tar.gz/bin/STAR, --genomeDir, /home/shutchins2/halvade/ref/STAR_ref/, --genomeLoad, LoadAndExit]
[EXCEPTION] Cannot run program "./bin.tar.gz/bin/STAR": error=20, Not a directory
java.io.IOException: Cannot run program "./bin.tar.gz/bin/STAR": error=20, Not a directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at be.ugent.intec.halvade.utils.ProcessBuilderWrapper.startProcess(ProcessBuilderWrapper.java:116)
at be.ugent.intec.halvade.tools.STARInstance.loadSharedMemoryReference(STARInstance.java:248)
at be.ugent.intec.halvade.hadoop.mapreduce.StarAlignPassXMapper.loadReference(StarAlignPassXMapper.java:98)
at be.ugent.intec.halvade.hadoop.mapreduce.StarAlignPassXMapper.setup(StarAlignPassXMapper.java:68)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=20, Not a directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 13 more
18/05/31 14:38:29 INFO mapred.MapTask: Starting flush of map output
18/05/31 14:38:29 INFO mapred.LocalJobRunner: Starting task: attempt_local406084068_0001_m_000001_0
18/05/31 14:38:29 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/05/31 14:38:29 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/05/31 14:38:29 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/05/31 14:38:29 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/home/shutchins2/halvade/in/halvade_6_0.fq.gz:0+28656471
18/05/31 14:38:29 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/05/31 14:38:29 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/05/31 14:38:29 INFO mapred.MapTask: soft limit at 83886080
18/05/31 14:38:29 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/05/31 14:38:29 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/05/31 14:38:29 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/05/31 14:38:29 INFO compress.CodecPool: Got brand-new decompressor [.gz]
[2018/05/31 14:38:29 - DEBUG] taskId = attempt_local406084068_0001_m_000001_0
[2018/05/31 14:38:29 - DEBUG] task = 1
[2018/05/31 14:38:29 - DEBUG] file lock: /tmp/halvade/ load_sh_mem.lock
[2018/05/31 14:38:29 - DEBUG] Checking for binaries...
[2018/05/31 14:38:29 - DEBUG] empty directory ./bin.tar.gz/bin
[2018/05/31 14:38:29 - DEBUG] Started STAR
[2018/05/31 14:38:29 - DEBUG] Load ref [/home/shutchins2/halvade/ref/STAR_ref/] to shared memory
[2018/05/31 14:38:29 - DEBUG] running command [./bin.tar.gz/bin/STAR, --genomeDir, /home/shutchins2/halvade/ref/STAR_ref/, --genomeLoad, LoadAndExit]
[same exception and stack trace as task 0]
18/05/31 14:38:29 INFO mapred.MapTask: Starting flush of map output
18/05/31 14:38:29 INFO mapred.LocalJobRunner: Starting task: attempt_local406084068_0001_m_000002_0
18/05/31 14:38:29 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/05/31 14:38:29 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/05/31 14:38:29 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/05/31 14:38:29 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/home/shutchins2/halvade/in/halvade_3_0.fq.gz:0+28630703
18/05/31 14:38:29 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/05/31 14:38:29 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/05/31 14:38:29 INFO mapred.MapTask: soft limit at 83886080
18/05/31 14:38:29 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/05/31 14:38:29 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/05/31 14:38:29 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/05/31 14:38:29 INFO compress.CodecPool: Got brand-new decompressor [.gz]
[2018/05/31 14:38:29 - DEBUG] taskId = attempt_local406084068_0001_m_000002_0
[2018/05/31 14:38:29 - DEBUG] task = 2
[2018/05/31 14:38:29 - DEBUG] file lock: /tmp/halvade/ load_sh_mem.lock
[2018/05/31 14:38:29 - DEBUG] Checking for binaries...
[2018/05/31 14:38:29 - DEBUG] empty directory ./bin.tar.gz/bin
[2018/05/31 14:38:29 - DEBUG] Started STAR
[2018/05/31 14:38:29 - DEBUG] Load ref [/home/shutchins2/halvade/ref/STAR_ref/] to shared memory
[2018/05/31 14:38:29 - DEBUG] running command [./bin.tar.gz/bin/STAR, --genomeDir, /home/shutchins2/halvade/ref/STAR_ref/, --genomeLoad, LoadAndExit]
[same exception and stack trace as task 0]
18/05/31 14:38:29 INFO mapred.MapTask: Starting flush of map output
18/05/31 14:38:29 INFO mapred.LocalJobRunner: Starting task: attempt_local406084068_0001_m_000003_0
18/05/31 14:38:29 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/05/31 14:38:29 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/05/31 14:38:29 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/05/31 14:38:29 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/home/shutchins2/halvade/in/halvade_7_0.fq.gz:0+28601002
18/05/31 14:38:29 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/05/31 14:38:29 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/05/31 14:38:29 INFO mapred.MapTask: soft limit at 83886080
18/05/31 14:38:29 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/05/31 14:38:29 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/05/31 14:38:29 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/05/31 14:38:29 INFO compress.CodecPool: Got brand-new decompressor [.gz]
[2018/05/31 14:38:29 - DEBUG] taskId = attempt_local406084068_0001_m_000003_0
[2018/05/31 14:38:29 - DEBUG] task = 3
[2018/05/31 14:38:29 - DEBUG] file lock: /tmp/halvade/ load_sh_mem.lock
[2018/05/31 14:38:29 - DEBUG] Checking for binaries...
[2018/05/31 14:38:29 - DEBUG] empty directory ./bin.tar.gz/bin
[2018/05/31 14:38:29 - DEBUG] Started STAR
[2018/05/31 14:38:29 - DEBUG] Load ref [/home/shutchins2/halvade/ref/STAR_ref/] to shared memory
[2018/05/31 14:38:29 - DEBUG] running command [./bin.tar.gz/bin/STAR, --genomeDir, /home/shutchins2/halvade/ref/STAR_ref/, --genomeLoad, LoadAndExit]
[same exception and stack trace as task 0]
18/05/31 14:38:29 INFO mapred.MapTask: Starting flush of map output
18/05/31 14:38:29 INFO mapred.LocalJobRunner: Starting task: attempt_local406084068_0001_m_000004_0
18/05/31 14:38:29 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/05/31 14:38:29 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/05/31 14:38:29 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/05/31 14:38:29 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/home/shutchins2/halvade/in/halvade_5_0.fq.gz:0+28324989
18/05/31 14:38:29 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/05/31 14:38:29 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/05/31 14:38:29 INFO mapred.MapTask: soft limit at 83886080
18/05/31 14:38:29 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/05/31 14:38:29 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/05/31 14:38:29 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/05/31 14:38:29 INFO compress.CodecPool: Got brand-new decompressor [.gz]
[2018/05/31 14:38:29 - DEBUG] taskId = attempt_local406084068_0001_m_000004_0
[2018/05/31 14:38:29 - DEBUG] task = 4
[2018/05/31 14:38:29 - DEBUG] file lock: /tmp/halvade/ load_sh_mem.lock
[2018/05/31 14:38:29 - DEBUG] Checking for binaries...
[2018/05/31 14:38:29 - DEBUG] empty directory ./bin.tar.gz/bin
[2018/05/31 14:38:29 - DEBUG] Started STAR
[2018/05/31 14:38:29 - DEBUG] Load ref [/home/shutchins2/halvade/ref/STAR_ref/] to shared memory
[2018/05/31 14:38:29 - DEBUG] running command [./bin.tar.gz/bin/STAR, --genomeDir, /home/shutchins2/halvade/ref/STAR_ref/, --genomeLoad, LoadAndExit]
[same exception and stack trace as task 0]
18/05/31 14:38:29 INFO mapred.MapTask: Starting flush of map output
18/05/31 14:38:29 INFO mapred.LocalJobRunner: Starting task: attempt_local406084068_0001_m_000005_0
18/05/31 14:38:29 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/05/31 14:38:29 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/05/31 14:38:29 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/05/31 14:38:29 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/home/shutchins2/halvade/in/halvade_4_0.fq.gz:0+27663011
18/05/31 14:38:29 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/05/31 14:38:29 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/05/31 14:38:29 INFO mapred.MapTask: soft limit at 83886080
18/05/31 14:38:29 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/05/31 14:38:29 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/05/31 14:38:29 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/05/31 14:38:29 INFO compress.CodecPool: Got brand-new decompressor [.gz]
[2018/05/31 14:38:29 - DEBUG] taskId = attempt_local406084068_0001_m_000005_0
[2018/05/31 14:38:29 - DEBUG] task = 5
[2018/05/31 14:38:29 - DEBUG] file lock: /tmp/halvade/ load_sh_mem.lock
[2018/05/31 14:38:29 - DEBUG] Checking for binaries...
[2018/05/31 14:38:29 - DEBUG] empty directory ./bin.tar.gz/bin
[2018/05/31 14:38:29 - DEBUG] Started STAR
[2018/05/31 14:38:29 - DEBUG] Load ref [/home/shutchins2/halvade/ref/STAR_ref/] to shared memory
[2018/05/31 14:38:29 - DEBUG] running command [./bin.tar.gz/bin/STAR, --genomeDir, /home/shutchins2/halvade/ref/STAR_ref/, --genomeLoad, LoadAndExit]
[same exception and stack trace as task 0]
18/05/31 14:38:29 INFO mapred.MapTask: Starting flush of map output
18/05/31 14:38:29 INFO mapred.LocalJobRunner: Starting task: attempt_local406084068_0001_m_000006_0
18/05/31 14:38:29 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/05/31 14:38:29 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/05/31 14:38:29 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/05/31 14:38:29 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/home/shutchins2/halvade/in/halvade_2_0.fq.gz:0+27636390
18/05/31 14:38:29 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/05/31 14:38:29 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/05/31 14:38:29 INFO mapred.MapTask: soft limit at 83886080
18/05/31 14:38:29 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/05/31 14:38:29 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/05/31 14:38:29 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/05/31 14:38:29 INFO compress.CodecPool: Got brand-new decompressor [.gz]
[2018/05/31 14:38:29 - DEBUG] taskId = attempt_local406084068_0001_m_000006_0
[2018/05/31 14:38:29 - DEBUG] task = 6
[2018/05/31 14:38:29 - DEBUG] file lock: /tmp/halvade/ load_sh_mem.lock
[2018/05/31 14:38:29 - DEBUG] Checking for binaries...
[2018/05/31 14:38:29 - DEBUG] empty directory ./bin.tar.gz/bin
[2018/05/31 14:38:29 - DEBUG] Started STAR
[2018/05/31 14:38:29 - DEBUG] Load ref [/home/shutchins2/halvade/ref/STAR_ref/] to shared memory
[2018/05/31 14:38:29 - DEBUG] running command [./bin.tar.gz/bin/STAR, --genomeDir, /home/shutchins2/halvade/ref/STAR_ref/, --genomeLoad, LoadAndExit]
[same exception and stack trace as task 0]
18/05/31 14:38:29 INFO mapred.MapTask: Starting flush of map output
18/05/31 14:38:29 INFO mapred.LocalJobRunner: Starting task: attempt_local406084068_0001_m_000007_0
18/05/31 14:38:29 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/05/31 14:38:29 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/05/31 14:38:29 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/05/31 14:38:29 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/home/shutchins2/halvade/in/halvade_1_0.fq.gz:0+27600220
18/05/31 14:38:29 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/05/31 14:38:29 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/05/31 14:38:29 INFO mapred.MapTask: soft limit at 83886080
18/05/31 14:38:29 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/05/31 14:38:29 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/05/31 14:38:29 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/05/31 14:38:29 INFO compress.CodecPool: Got brand-new decompressor [.gz]
[2018/05/31 14:38:29 - DEBUG] taskId = attempt_local406084068_0001_m_000007_0
[2018/05/31 14:38:29 - DEBUG] task = 7
[2018/05/31 14:38:29 - DEBUG] file lock: /tmp/halvade/ load_sh_mem.lock
[2018/05/31 14:38:29 - DEBUG] Checking for binaries...
[2018/05/31 14:38:29 - DEBUG] empty directory ./bin.tar.gz/bin
[2018/05/31 14:38:29 - DEBUG] Started STAR
[2018/05/31 14:38:29 - DEBUG] Load ref [/home/shutchins2/halvade/ref/STAR_ref/] to shared memory
[2018/05/31 14:38:29 - DEBUG] running command [./bin.tar.gz/bin/STAR, --genomeDir, /home/shutchins2/halvade/ref/STAR_ref/, --genomeLoad, LoadAndExit]
[same exception and stack trace as task 0]
18/05/31 14:38:29 INFO mapred.MapTask: Starting flush of map output
18/05/31 14:38:29 INFO mapred.LocalJobRunner: map task executor complete.
18/05/31 14:38:29 WARN mapred.LocalJobRunner: job_local406084068_0001
java.lang.Exception: Cannot run program "./bin.tar.gz/bin/STAR": error=20, Not a directory exited with code -1
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:491)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:551)
Caused by: Cannot run program "./bin.tar.gz/bin/STAR": error=20, Not a directory exited with code -1
at be.ugent.intec.halvade.utils.ProcessBuilderWrapper.startProcess(ProcessBuilderWrapper.java:132)
at be.ugent.intec.halvade.tools.STARInstance.loadSharedMemoryReference(STARInstance.java:248)
at be.ugent.intec.halvade.hadoop.mapreduce.StarAlignPassXMapper.loadReference(StarAlignPassXMapper.java:98)
at be.ugent.intec.halvade.hadoop.mapreduce.StarAlignPassXMapper.setup(StarAlignPassXMapper.java:68)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
18/05/31 14:38:29 INFO mapreduce.Job: Job job_local406084068_0001 running in uber mode : false
18/05/31 14:38:29 INFO mapreduce.Job: map 0% reduce 0%
18/05/31 14:38:29 INFO mapreduce.Job: Job job_local406084068_0001 failed with state FAILED due to: NA
18/05/31 14:38:29 INFO mapreduce.Job: Counters: 0
[2018/05/31 14:38:29 - DEBUG] Finished Halvade pass 1 Job [runtime: 4s 923ms ]
[2018/05/31 14:38:29 - DEBUG] Halvade pass 1 job failed.
Yes, that's fine, I'm trying to test it now. Can you find the stderr of the individual map tasks? They should show more information as to what's wrong.
I'll look around for it. I also get this error; it seems the directory is being created at an odd location...
[INFO] The output directory '/home/shutchins2/halvade/out//pass1' already exists.
I think that is caused by the first error: Halvade creates a pass1 directory to store the output of the first pass of STAR alignment, which is then used in the second step. Since the first job failed after creating the directory, Hadoop MapReduce now detects the existing directory and stops the job. If you delete the /home/shutchins2/halvade/out directory, this error should go away.
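The cleanup described above can be sketched as follows. This is a generic illustration, not a Halvade command; the real output directory from this thread is /home/shutchins2/halvade/out, and the demo below uses a throwaway temp path instead so it can run anywhere:

```shell
#!/bin/bash
# Sketch: remove a stale output directory left by a failed pass 1 run so
# the next job isn't blocked by "output directory already exists".
OUT=/tmp/halvade_demo_out
mkdir -p "$OUT/pass1"        # simulate the leftover from a failed pass 1
rm -rf "$OUT"                # remove the whole output directory
# If your output directory lives on HDFS instead, use:
#   hdfs dfs -rm -r /home/shutchins2/halvade/out
```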
Currently having some hadoop issues on my server, but once I figure that out, I'll try to recreate this.
@ddcap what kind of file should I be looking for? So far I can't locate that stderr file of the map tasks, at least not in a human-readable format. I figured out the Hadoop issues I was having; they're unrelated to this.
I assume these are the tasks...
I'm thinking it's something simple...
Perhaps the function below? I don't have much experience in Java, but it looks like when the bin dir is null, it improperly sets the location, as in: [2018/05/31 14:38:29 - DEBUG] empty directory ./bin.tar.gz/bin
That's just my guess.
Right after this it checks whether binDir is null and throws an exception if it is, so that's not it. Could you send the stderr from one of the tasks you found?
I don't get any errors when running the RNA pipeline
@ddcap could I see the configuration you have?
I am trying the exact setup from the link.
This is the command I used: hadoop jar $jar -I $I -O $O --rna --star $STAR --nodes 1 --mem 128 --vcores 20 -D $D -R $R
The new version of Halvade detects a bin.tar.gz in the same folder you are running from, so that's why I don't use the -B option.
@ddcap so I shouldn't use the python file?
It should give the same result but you can try.
@ddcap where do I run the command from?
hadoop jar $jar -I $I -O $O --rna --star $STAR --nodes 1 --mem 128 --vcores 20 -D $D -R $R
Do I need to export those variables to my environment?
I have a script like this:
#!/bin/bash
I="/path/to/input"
O="/path/to/output/"
mem=128
vcores=20
D="/path/to/ref.vcf"
R="/path/to/ref.fasta"
STAR="/path/to/STAR_ref/"
jar=/path/to/HalvadeWithLibs.jar
nodes=1
hadoop jar $jar -I $I -O $O --rna --star $STAR --nodes $nodes --mem $mem --vcores $vcores -D $D -R $R
Change the paths to all files and you can test it
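Since most of the failures in this thread come down to a path that doesn't resolve, a small pre-flight check before the `hadoop jar` call can fail fast instead of inside a map task. The `check_paths` function and the placeholder paths below are my own illustration, not part of Halvade:

```shell
#!/bin/bash
# Hypothetical pre-flight check: verify each local path used by the
# launch script exists before calling `hadoop jar`, so a typo surfaces
# immediately rather than as a failed Hadoop task.
check_paths() {
  for f in "$@"; do
    [ -e "$f" ] || { echo "missing: $f" >&2; return 1; }
  done
  echo "all paths present"
}
# Example usage (substitute the real variables from the script above):
# check_paths "$I" "$D" "$R" "$STAR" "$jar" "$B"
```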
I just tested that out... Perhaps there is an issue with how I am using Hadoop or my Java?
Like I said, the bin.tar.gz should be in the folder you run the script from, or you add the -B $B option, where B=/path/to/bin.tar.gz
According to the HDFS, it is in that directory.
I mean the bin.tar.gz should be where the script is run, not on HDFS
No binaries archive found, please provide the location with the -B argument or make sure it is present in the current directory: /home/shutchins2/halvade
put: `/home/shutchins2/halvade/bin.tar.gz': File exists
It's also located where I'm running the script.
It shouldn't be on HDFS; put is an HDFS command. Try the -B option?
Also doesn't work with the -B option.
If it's not on the HDFS, it throws a "file does not exist" error. Which version of Hadoop are you using? I think that could very well be the issue.
Halvade doesn't look for that file on HDFS; it tries to find the binary archive on your local disk, either in the directory you ran Halvade from or, if you provide the -B option, at the location given there. It doesn't matter whether it's on HDFS or not.
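Given that the original failure is "Cannot run program ./bin.tar.gz/bin/STAR: error=20, Not a directory", it can help to confirm the local archive actually contains a bin/STAR entry before handing it to Halvade. Note the expected "bin/STAR" layout is inferred from the failing path in the logs, not from documented Halvade internals, and the demo builds a stand-in archive in a temp dir so it is runnable as-is:

```shell
#!/bin/bash
# Sketch: inspect the binaries archive with `tar -t` before running.
# The demo constructs a placeholder bin.tar.gz; point B at your real
# /home/shutchins2/halvade/bin.tar.gz to check the actual file.
set -e
tmp=$(mktemp -d)
mkdir -p "$tmp/bin" && touch "$tmp/bin/STAR"
tar -czf "$tmp/bin.tar.gz" -C "$tmp" bin   # stand-in for the real archive
B="$tmp/bin.tar.gz"
tar -tzf "$B" | grep -q 'bin/STAR' && echo "STAR entry found"
```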
Maybe you don't have read access there? Some installations of Hadoop use the yarn user to run Hadoop jobs.
I tried with Hadoop 2.2.x, 2.4.x, 2.6.x and 2.7.x, so I doubt that's the issue.
I should have read access. This is a local install of Hadoop version 2.9.1.
The binary file is in both locations. I'm not sure what else to try other than a different version of hadoop.
If this doesn't work, however, I'd still like to be able to use this on a cluster. Would you recommend using Halvade in a similar instance with something like PBS?
You can continue with the python script since that seemed to work for you. Can you provide the stderr of the failed task? It should give me more info on why you got the first error.
You will need Hadoop on a PBS system; there are options like Hadoop On Demand, which deploys a Hadoop cluster within the PBS job you requested.
What about a cluster without a manager? Both the bash command and python script are yielding the exact same results and error codes.
#!/bin/bash
I="/home/shutchins2/halvade/in/"
O="/home/shutchins2/halvade/out/"
mem=128
vcores=24
D="/home/shutchins2/halvade/ref/dbsnp/dbsnp_138.hg19.vcf"
R="/home/shutchins2/halvade/ref/ucsc.hg19.fasta"
STAR="/home/shutchins2/halvade/ref/STAR_ref"
jar="/home/shutchins2/halvade/HalvadeWithLibs.jar"
N=1
B="/home/shutchins2/halvade/bin.tar.gz"
hadoop jar $jar -I $I -O $O --rna --star $STAR --nodes $N --mem $mem --vcores $vcores -D $D -R $R -B $B
That gives me the same error.
The failed task's log contains only 4 ANSI characters:
ÿÿÿÿ
As long as you have Hadoop MapReduce you can run Halvade. So now the python script also doesn't find the bin file? What changed from before then?
@ddcap It was always the same output. I wanted to try your setup to make sure it wasn't the python file.
But now the Hadoop job doesn't even start. Before, you had tasks that failed but were started; now they don't even start, right?
No, they're doing the same thing as before.
Well before you had this error: cannot run program "./bin.tar.gz/bin/STAR": error=20, Not a directory exited with code -1
While now you get this one: No binaries archive found
Right or am I missing something??
They are not the same error. In the first, Hadoop has started tasks and tries to start STAR, while in the second it can't find the bin.tar.gz, so it doesn't even start the job.
Sorry, I didn't clarify. That was before I added the -B argument. Without it, I was getting the "No binaries archive found" error.
Ok so you get tasks again, could you please send me the stderr/stdout of one of the tasks that failed?
The file output?
ÿÿÿÿ
The other error is the same as above at https://github.com/biointec/halvade/issues/12#issuecomment-393656917
Is this stdout or stderr? stderr should show which files are in the bin.tar.gz, which programs have started, and the error logs of those programs. Also, do you use YARN to start MapReduce?
That is the stdout from the hadoop tasks. The stderr I get from running halvade with python or the bash script is at https://github.com/biointec/halvade/issues/12#issuecomment-393656917.
I simply set up hadoop based on your workflow. Nothing extra.
This is what the tmp/halvade folder contains:
There is nothing in the folder. The files have nothing in them as well.
Do you have access to a different Hadoop cluster to test it?
Unfortunately, I don't. @ddcap
I can't seem to reproduce this error, so I cannot really help any further without the log files, which strangely don't contain the correct information on your cluster. My best guess is that some configuration on the node/cluster isn't correct.
@ddcap would you be willing to give me your system information?
OS? Java version?
I have obtained the following error while trying to run the Halvade RNA pipeline. It looks like there is an issue when the program tries to run STAR from the bin.tar.gz file on the HDFS:
[2017/11/29 21:12:26 - DEBUG] running command [./bin.tar.gz/bin/STAR, --genomeDir, /tmp/halvade/m_000000_0-star1/, --genomeLoad, LoadAndExit]
[EXCEPTION] Cannot run program "./bin.tar.gz/bin/STAR": error=20, Not a directory
java.io.IOException: Cannot run program "./bin.tar.gz/bin/STAR": error=20, Not a directory