biointec / halvade

Parallel read alignment and variant calling using MapReduce
GNU General Public License v3.0

Problem running STAR from bin.tar.gz #12

Open nickholz opened 6 years ago

nickholz commented 6 years ago

I got the following error while trying to run the Halvade RNA pipeline. It looks like there is an issue when the program tries to run STAR from the bin.tar.gz file on HDFS.

[2017/11/29 21:12:26 - DEBUG] running command [./bin.tar.gz/bin/STAR, --genomeDir, /tmp/halvade/m_000000_0-star1/, --genomeLoad, LoadAndExit] [EXCEPTION] Cannot run program "./bin.tar.gz/bin/STAR": error=20, Not a directory java.io.IOException: Cannot run program "./bin.tar.gz/bin/STAR": error=20, Not a directory
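
Note: error=20 is ENOTDIR, i.e. a path component that should be a directory (here bin.tar.gz) is a regular file in the task's working directory rather than an extracted archive. A quick local sanity check of the archive layout, as a sketch with a placeholder path:

# check that the archive is a valid gzipped tar and actually contains bin/STAR
ls -l /path/to/bin.tar.gz
tar -tzf /path/to/bin.tar.gz | grep -m1 'bin/STAR'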

ddcap commented 6 years ago

Could you give me the arguments you used to start Halvade?

sdhutchins commented 6 years ago

@ddcap did you figure this out? I'm having the same issue.

ddcap commented 6 years ago

Not really, since I didn't have all the information to figure things out. Could you send me the command you used (i.e., which arguments) and the logs of the failed task? That should help me narrow down the problem.

sdhutchins commented 6 years ago

I used the exact workflow at https://github.com/biointec/halvade/wiki/Recipe:-RNA-seq-with-Halvade-on-a-local-Hadoop-cluster

sdhutchins commented 6 years ago

Here is the halvade.stderr file.

Here is my example configuration.

#----------------------------------
# required Halvade arguments
#----------------------------------
N=1
M=128
C=24
B="/home/shutchins2/halvade/bin.tar.gz"
D="/home/shutchins2/halvade/ref/dbsnp/dbsnp_138.hg19.vcf"
R="/home/shutchins2/halvade/ref/ucsc.hg19.fasta"
S="/home/shutchins2/halvade/ref/STAR_ref"
I="/home/shutchins2/halvade/in/"
O="/home/shutchins2/halvade/out3/"
smt
rna
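
For reference, these configuration values correspond to the direct hadoop jar invocation that ddcap suggests later in the thread. The sketch below assumes that mapping; the command-line spelling of the smt option does not appear anywhere in the thread and is assumed here, so check the Halvade documentation for the exact flag.

#!/bin/bash
# sketch: the same settings as the config above, passed straight to the Halvade jar
jar="/home/shutchins2/halvade/HalvadeWithLibs.jar"   # jar path taken from later in this thread
hadoop jar $jar \
  -I "/home/shutchins2/halvade/in/" \
  -O "/home/shutchins2/halvade/out3/" \
  -B "/home/shutchins2/halvade/bin.tar.gz" \
  -D "/home/shutchins2/halvade/ref/dbsnp/dbsnp_138.hg19.vcf" \
  -R "/home/shutchins2/halvade/ref/ucsc.hg19.fasta" \
  --star "/home/shutchins2/halvade/ref/STAR_ref" \
  --nodes 1 --mem 128 --vcores 24 \
  --rna --smt    # --smt spelling is an assumption; it mirrors the "smt" config line
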
sdhutchins commented 6 years ago

That text file was ugly...

18/05/31 14:38:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[2018/05/31 14:38:24 - DEBUG] reference file not found on a distributed fs (HDFS)
[2018/05/31 14:38:24 - DEBUG] reference file is on local disk
[2018/05/31 14:38:24 - DEBUG] requires star genome 2 upload? false
[2018/05/31 14:38:24 - DEBUG] pass 2 UID: 310518023824.895
[2018/05/31 14:38:24 - DEBUG] All reference files are available
[2018/05/31 14:38:24 - DEBUG] pmem check disabled, using less memory for STAR because of shared memory
[2018/05/31 14:38:24 - DEBUG] set # map containers: 21
[2018/05/31 14:38:24 - DEBUG] resources set to 21 maps [2 cpu , 6144 mb] per node and 2 reducers [11 cpu, 32768 mb] per node
[2018/05/31 14:38:24 - DEBUG] parsing dictionary /home/shutchins2/halvade/ref/ucsc.hg19.dict
[2018/05/31 14:38:24 - DEBUG] requested # reducers: 36
[2018/05/31 14:38:25 - DEBUG] final # reducers: 35
[2018/05/31 14:38:25 - DEBUG] Started Halvade pass 1 Job
18/05/31 14:38:25 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/05/31 14:38:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
18/05/31 14:38:25 INFO input.FileInputFormat: Total input files to process : 8
18/05/31 14:38:25 INFO mapreduce.JobSubmitter: number of splits:8
18/05/31 14:38:25 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local406084068_0001
18/05/31 14:38:28 INFO mapred.LocalDistributedCacheManager: Localized hdfs://localhost:9000/home/shutchins2/halvade/bin.tar.gz as file:/tmp/hadoop-shutchins2/mapred/local/1527795505390/bin.tar.gz
18/05/31 14:38:28 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
18/05/31 14:38:28 INFO mapreduce.Job: Running job: job_local406084068_0001
18/05/31 14:38:28 INFO mapred.LocalJobRunner: OutputCommitter set in config null
18/05/31 14:38:28 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/05/31 14:38:28 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/05/31 14:38:28 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
18/05/31 14:38:29 INFO mapred.LocalJobRunner: Waiting for map tasks
18/05/31 14:38:29 INFO mapred.LocalJobRunner: Starting task: attempt_local406084068_0001_m_000000_0
18/05/31 14:38:29 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/05/31 14:38:29 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/05/31 14:38:29 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/05/31 14:38:29 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/home/shutchins2/halvade/in/halvade_0_0.fq.gz:0+28657878
18/05/31 14:38:29 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/05/31 14:38:29 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/05/31 14:38:29 INFO mapred.MapTask: soft limit at 83886080
18/05/31 14:38:29 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/05/31 14:38:29 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/05/31 14:38:29 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/05/31 14:38:29 INFO compress.CodecPool: Got brand-new decompressor [.gz]
[2018/05/31 14:38:29 - DEBUG] taskId = attempt_local406084068_0001_m_000000_0
[2018/05/31 14:38:29 - DEBUG] task = 0
[2018/05/31 14:38:29 - DEBUG] file lock: /tmp/halvade/ load_sh_mem.lock
[2018/05/31 14:38:29 - DEBUG] Checking for binaries...
[2018/05/31 14:38:29 - DEBUG] empty directory ./bin.tar.gz/bin
[2018/05/31 14:38:29 - DEBUG] STAR instance type: 1
18/05/31 14:38:29 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
[2018/05/31 14:38:29 - DEBUG] containers left: 8
[2018/05/31 14:38:29 - DEBUG] paired? true
[2018/05/31 14:38:29 - DEBUG] ref: /home/shutchins2/halvade/ref/STAR_ref/
[2018/05/31 14:38:29 - DEBUG] Started STAR
[2018/05/31 14:38:29 - DEBUG] Load ref [/home/shutchins2/halvade/ref/STAR_ref/] to shared memory
[2018/05/31 14:38:29 - DEBUG] running command [./bin.tar.gz/bin/STAR, --genomeDir, /home/shutchins2/halvade/ref/STAR_ref/, --genomeLoad, LoadAndExit]
[EXCEPTION] Cannot run program "./bin.tar.gz/bin/STAR": error=20, Not a directory
java.io.IOException: Cannot run program "./bin.tar.gz/bin/STAR": error=20, Not a directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at be.ugent.intec.halvade.utils.ProcessBuilderWrapper.startProcess(ProcessBuilderWrapper.java:116)
    at be.ugent.intec.halvade.tools.STARInstance.loadSharedMemoryReference(STARInstance.java:248)
    at be.ugent.intec.halvade.hadoop.mapreduce.StarAlignPassXMapper.loadReference(StarAlignPassXMapper.java:98)
    at be.ugent.intec.halvade.hadoop.mapreduce.StarAlignPassXMapper.setup(StarAlignPassXMapper.java:68)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=20, Not a directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 13 more
18/05/31 14:38:29 INFO mapred.MapTask: Starting flush of map output
18/05/31 14:38:29 INFO mapred.LocalJobRunner: Starting task: attempt_local406084068_0001_m_000001_0
18/05/31 14:38:29 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/05/31 14:38:29 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/05/31 14:38:29 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/05/31 14:38:29 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/home/shutchins2/halvade/in/halvade_6_0.fq.gz:0+28656471
18/05/31 14:38:29 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/05/31 14:38:29 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/05/31 14:38:29 INFO mapred.MapTask: soft limit at 83886080
18/05/31 14:38:29 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/05/31 14:38:29 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/05/31 14:38:29 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/05/31 14:38:29 INFO compress.CodecPool: Got brand-new decompressor [.gz]
[2018/05/31 14:38:29 - DEBUG] taskId = attempt_local406084068_0001_m_000001_0
[2018/05/31 14:38:29 - DEBUG] task = 1
[2018/05/31 14:38:29 - DEBUG] file lock: /tmp/halvade/ load_sh_mem.lock
[2018/05/31 14:38:29 - DEBUG] Checking for binaries...
[2018/05/31 14:38:29 - DEBUG] empty directory ./bin.tar.gz/bin
[2018/05/31 14:38:29 - DEBUG] Started STAR
[2018/05/31 14:38:29 - DEBUG] Load ref [/home/shutchins2/halvade/ref/STAR_ref/] to shared memory
[2018/05/31 14:38:29 - DEBUG] running command [./bin.tar.gz/bin/STAR, --genomeDir, /home/shutchins2/halvade/ref/STAR_ref/, --genomeLoad, LoadAndExit]
[EXCEPTION] Cannot run program "./bin.tar.gz/bin/STAR": error=20, Not a directory
java.io.IOException: Cannot run program "./bin.tar.gz/bin/STAR": error=20, Not a directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at be.ugent.intec.halvade.utils.ProcessBuilderWrapper.startProcess(ProcessBuilderWrapper.java:116)
    at be.ugent.intec.halvade.tools.STARInstance.loadSharedMemoryReference(STARInstance.java:248)
    at be.ugent.intec.halvade.hadoop.mapreduce.StarAlignPassXMapper.loadReference(StarAlignPassXMapper.java:98)
    at be.ugent.intec.halvade.hadoop.mapreduce.StarAlignPassXMapper.setup(StarAlignPassXMapper.java:68)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=20, Not a directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 13 more
18/05/31 14:38:29 INFO mapred.MapTask: Starting flush of map output
18/05/31 14:38:29 INFO mapred.LocalJobRunner: Starting task: attempt_local406084068_0001_m_000002_0
18/05/31 14:38:29 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/05/31 14:38:29 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/05/31 14:38:29 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/05/31 14:38:29 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/home/shutchins2/halvade/in/halvade_3_0.fq.gz:0+28630703
18/05/31 14:38:29 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/05/31 14:38:29 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/05/31 14:38:29 INFO mapred.MapTask: soft limit at 83886080
18/05/31 14:38:29 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/05/31 14:38:29 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/05/31 14:38:29 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/05/31 14:38:29 INFO compress.CodecPool: Got brand-new decompressor [.gz]
[2018/05/31 14:38:29 - DEBUG] taskId = attempt_local406084068_0001_m_000002_0
[2018/05/31 14:38:29 - DEBUG] task = 2
[2018/05/31 14:38:29 - DEBUG] file lock: /tmp/halvade/ load_sh_mem.lock
[2018/05/31 14:38:29 - DEBUG] Checking for binaries...
[2018/05/31 14:38:29 - DEBUG] empty directory ./bin.tar.gz/bin
[2018/05/31 14:38:29 - DEBUG] Started STAR
[2018/05/31 14:38:29 - DEBUG] Load ref [/home/shutchins2/halvade/ref/STAR_ref/] to shared memory
[2018/05/31 14:38:29 - DEBUG] running command [./bin.tar.gz/bin/STAR, --genomeDir, /home/shutchins2/halvade/ref/STAR_ref/, --genomeLoad, LoadAndExit]
[EXCEPTION] Cannot run program "./bin.tar.gz/bin/STAR": error=20, Not a directory
java.io.IOException: Cannot run program "./bin.tar.gz/bin/STAR": error=20, Not a directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at be.ugent.intec.halvade.utils.ProcessBuilderWrapper.startProcess(ProcessBuilderWrapper.java:116)
    at be.ugent.intec.halvade.tools.STARInstance.loadSharedMemoryReference(STARInstance.java:248)
    at be.ugent.intec.halvade.hadoop.mapreduce.StarAlignPassXMapper.loadReference(StarAlignPassXMapper.java:98)
    at be.ugent.intec.halvade.hadoop.mapreduce.StarAlignPassXMapper.setup(StarAlignPassXMapper.java:68)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=20, Not a directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 13 more
18/05/31 14:38:29 INFO mapred.MapTask: Starting flush of map output
18/05/31 14:38:29 INFO mapred.LocalJobRunner: Starting task: attempt_local406084068_0001_m_000003_0
18/05/31 14:38:29 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/05/31 14:38:29 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/05/31 14:38:29 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/05/31 14:38:29 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/home/shutchins2/halvade/in/halvade_7_0.fq.gz:0+28601002
18/05/31 14:38:29 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/05/31 14:38:29 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/05/31 14:38:29 INFO mapred.MapTask: soft limit at 83886080
18/05/31 14:38:29 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/05/31 14:38:29 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/05/31 14:38:29 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/05/31 14:38:29 INFO compress.CodecPool: Got brand-new decompressor [.gz]
[2018/05/31 14:38:29 - DEBUG] taskId = attempt_local406084068_0001_m_000003_0
[2018/05/31 14:38:29 - DEBUG] task = 3
[2018/05/31 14:38:29 - DEBUG] file lock: /tmp/halvade/ load_sh_mem.lock
[2018/05/31 14:38:29 - DEBUG] Checking for binaries...
[2018/05/31 14:38:29 - DEBUG] empty directory ./bin.tar.gz/bin
[2018/05/31 14:38:29 - DEBUG] Started STAR
[2018/05/31 14:38:29 - DEBUG] Load ref [/home/shutchins2/halvade/ref/STAR_ref/] to shared memory
[2018/05/31 14:38:29 - DEBUG] running command [./bin.tar.gz/bin/STAR, --genomeDir, /home/shutchins2/halvade/ref/STAR_ref/, --genomeLoad, LoadAndExit]
[EXCEPTION] Cannot run program "./bin.tar.gz/bin/STAR": error=20, Not a directory
java.io.IOException: Cannot run program "./bin.tar.gz/bin/STAR": error=20, Not a directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at be.ugent.intec.halvade.utils.ProcessBuilderWrapper.startProcess(ProcessBuilderWrapper.java:116)
    at be.ugent.intec.halvade.tools.STARInstance.loadSharedMemoryReference(STARInstance.java:248)
    at be.ugent.intec.halvade.hadoop.mapreduce.StarAlignPassXMapper.loadReference(StarAlignPassXMapper.java:98)
    at be.ugent.intec.halvade.hadoop.mapreduce.StarAlignPassXMapper.setup(StarAlignPassXMapper.java:68)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=20, Not a directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 13 more
18/05/31 14:38:29 INFO mapred.MapTask: Starting flush of map output
18/05/31 14:38:29 INFO mapred.LocalJobRunner: Starting task: attempt_local406084068_0001_m_000004_0
18/05/31 14:38:29 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/05/31 14:38:29 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/05/31 14:38:29 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/05/31 14:38:29 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/home/shutchins2/halvade/in/halvade_5_0.fq.gz:0+28324989
18/05/31 14:38:29 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/05/31 14:38:29 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/05/31 14:38:29 INFO mapred.MapTask: soft limit at 83886080
18/05/31 14:38:29 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/05/31 14:38:29 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/05/31 14:38:29 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/05/31 14:38:29 INFO compress.CodecPool: Got brand-new decompressor [.gz]
[2018/05/31 14:38:29 - DEBUG] taskId = attempt_local406084068_0001_m_000004_0
[2018/05/31 14:38:29 - DEBUG] task = 4
[2018/05/31 14:38:29 - DEBUG] file lock: /tmp/halvade/ load_sh_mem.lock
[2018/05/31 14:38:29 - DEBUG] Checking for binaries...
[2018/05/31 14:38:29 - DEBUG] empty directory ./bin.tar.gz/bin
[2018/05/31 14:38:29 - DEBUG] Started STAR
[2018/05/31 14:38:29 - DEBUG] Load ref [/home/shutchins2/halvade/ref/STAR_ref/] to shared memory
[2018/05/31 14:38:29 - DEBUG] running command [./bin.tar.gz/bin/STAR, --genomeDir, /home/shutchins2/halvade/ref/STAR_ref/, --genomeLoad, LoadAndExit]
[EXCEPTION] Cannot run program "./bin.tar.gz/bin/STAR": error=20, Not a directory
java.io.IOException: Cannot run program "./bin.tar.gz/bin/STAR": error=20, Not a directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at be.ugent.intec.halvade.utils.ProcessBuilderWrapper.startProcess(ProcessBuilderWrapper.java:116)
    at be.ugent.intec.halvade.tools.STARInstance.loadSharedMemoryReference(STARInstance.java:248)
    at be.ugent.intec.halvade.hadoop.mapreduce.StarAlignPassXMapper.loadReference(StarAlignPassXMapper.java:98)
    at be.ugent.intec.halvade.hadoop.mapreduce.StarAlignPassXMapper.setup(StarAlignPassXMapper.java:68)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=20, Not a directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 13 more
18/05/31 14:38:29 INFO mapred.MapTask: Starting flush of map output
18/05/31 14:38:29 INFO mapred.LocalJobRunner: Starting task: attempt_local406084068_0001_m_000005_0
18/05/31 14:38:29 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/05/31 14:38:29 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/05/31 14:38:29 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/05/31 14:38:29 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/home/shutchins2/halvade/in/halvade_4_0.fq.gz:0+27663011
18/05/31 14:38:29 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/05/31 14:38:29 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/05/31 14:38:29 INFO mapred.MapTask: soft limit at 83886080
18/05/31 14:38:29 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/05/31 14:38:29 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/05/31 14:38:29 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/05/31 14:38:29 INFO compress.CodecPool: Got brand-new decompressor [.gz]
[2018/05/31 14:38:29 - DEBUG] taskId = attempt_local406084068_0001_m_000005_0
[2018/05/31 14:38:29 - DEBUG] task = 5
[2018/05/31 14:38:29 - DEBUG] file lock: /tmp/halvade/ load_sh_mem.lock
[2018/05/31 14:38:29 - DEBUG] Checking for binaries...
[2018/05/31 14:38:29 - DEBUG] empty directory ./bin.tar.gz/bin
[2018/05/31 14:38:29 - DEBUG] Started STAR
[2018/05/31 14:38:29 - DEBUG] Load ref [/home/shutchins2/halvade/ref/STAR_ref/] to shared memory
[2018/05/31 14:38:29 - DEBUG] running command [./bin.tar.gz/bin/STAR, --genomeDir, /home/shutchins2/halvade/ref/STAR_ref/, --genomeLoad, LoadAndExit]
[EXCEPTION] Cannot run program "./bin.tar.gz/bin/STAR": error=20, Not a directory
java.io.IOException: Cannot run program "./bin.tar.gz/bin/STAR": error=20, Not a directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at be.ugent.intec.halvade.utils.ProcessBuilderWrapper.startProcess(ProcessBuilderWrapper.java:116)
    at be.ugent.intec.halvade.tools.STARInstance.loadSharedMemoryReference(STARInstance.java:248)
    at be.ugent.intec.halvade.hadoop.mapreduce.StarAlignPassXMapper.loadReference(StarAlignPassXMapper.java:98)
    at be.ugent.intec.halvade.hadoop.mapreduce.StarAlignPassXMapper.setup(StarAlignPassXMapper.java:68)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=20, Not a directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 13 more
18/05/31 14:38:29 INFO mapred.MapTask: Starting flush of map output
18/05/31 14:38:29 INFO mapred.LocalJobRunner: Starting task: attempt_local406084068_0001_m_000006_0
18/05/31 14:38:29 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/05/31 14:38:29 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/05/31 14:38:29 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/05/31 14:38:29 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/home/shutchins2/halvade/in/halvade_2_0.fq.gz:0+27636390
18/05/31 14:38:29 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/05/31 14:38:29 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/05/31 14:38:29 INFO mapred.MapTask: soft limit at 83886080
18/05/31 14:38:29 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/05/31 14:38:29 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/05/31 14:38:29 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/05/31 14:38:29 INFO compress.CodecPool: Got brand-new decompressor [.gz]
[2018/05/31 14:38:29 - DEBUG] taskId = attempt_local406084068_0001_m_000006_0
[2018/05/31 14:38:29 - DEBUG] task = 6
[2018/05/31 14:38:29 - DEBUG] file lock: /tmp/halvade/ load_sh_mem.lock
[2018/05/31 14:38:29 - DEBUG] Checking for binaries...
[2018/05/31 14:38:29 - DEBUG] empty directory ./bin.tar.gz/bin
[2018/05/31 14:38:29 - DEBUG] Started STAR
[2018/05/31 14:38:29 - DEBUG] Load ref [/home/shutchins2/halvade/ref/STAR_ref/] to shared memory
[2018/05/31 14:38:29 - DEBUG] running command [./bin.tar.gz/bin/STAR, --genomeDir, /home/shutchins2/halvade/ref/STAR_ref/, --genomeLoad, LoadAndExit]
[EXCEPTION] Cannot run program "./bin.tar.gz/bin/STAR": error=20, Not a directory
java.io.IOException: Cannot run program "./bin.tar.gz/bin/STAR": error=20, Not a directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at be.ugent.intec.halvade.utils.ProcessBuilderWrapper.startProcess(ProcessBuilderWrapper.java:116)
    at be.ugent.intec.halvade.tools.STARInstance.loadSharedMemoryReference(STARInstance.java:248)
    at be.ugent.intec.halvade.hadoop.mapreduce.StarAlignPassXMapper.loadReference(StarAlignPassXMapper.java:98)
    at be.ugent.intec.halvade.hadoop.mapreduce.StarAlignPassXMapper.setup(StarAlignPassXMapper.java:68)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=20, Not a directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 13 more
18/05/31 14:38:29 INFO mapred.MapTask: Starting flush of map output
18/05/31 14:38:29 INFO mapred.LocalJobRunner: Starting task: attempt_local406084068_0001_m_000007_0
18/05/31 14:38:29 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/05/31 14:38:29 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/05/31 14:38:29 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/05/31 14:38:29 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/home/shutchins2/halvade/in/halvade_1_0.fq.gz:0+27600220
18/05/31 14:38:29 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/05/31 14:38:29 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/05/31 14:38:29 INFO mapred.MapTask: soft limit at 83886080
18/05/31 14:38:29 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/05/31 14:38:29 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/05/31 14:38:29 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/05/31 14:38:29 INFO compress.CodecPool: Got brand-new decompressor [.gz]
[2018/05/31 14:38:29 - DEBUG] taskId = attempt_local406084068_0001_m_000007_0
[2018/05/31 14:38:29 - DEBUG] task = 7
[2018/05/31 14:38:29 - DEBUG] file lock: /tmp/halvade/ load_sh_mem.lock
[2018/05/31 14:38:29 - DEBUG] Checking for binaries...
[2018/05/31 14:38:29 - DEBUG] empty directory ./bin.tar.gz/bin
[2018/05/31 14:38:29 - DEBUG] Started STAR
[2018/05/31 14:38:29 - DEBUG] Load ref [/home/shutchins2/halvade/ref/STAR_ref/] to shared memory
[2018/05/31 14:38:29 - DEBUG] running command [./bin.tar.gz/bin/STAR, --genomeDir, /home/shutchins2/halvade/ref/STAR_ref/, --genomeLoad, LoadAndExit]
[EXCEPTION] Cannot run program "./bin.tar.gz/bin/STAR": error=20, Not a directory
java.io.IOException: Cannot run program "./bin.tar.gz/bin/STAR": error=20, Not a directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at be.ugent.intec.halvade.utils.ProcessBuilderWrapper.startProcess(ProcessBuilderWrapper.java:116)
    at be.ugent.intec.halvade.tools.STARInstance.loadSharedMemoryReference(STARInstance.java:248)
    at be.ugent.intec.halvade.hadoop.mapreduce.StarAlignPassXMapper.loadReference(StarAlignPassXMapper.java:98)
    at be.ugent.intec.halvade.hadoop.mapreduce.StarAlignPassXMapper.setup(StarAlignPassXMapper.java:68)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=20, Not a directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 13 more
18/05/31 14:38:29 INFO mapred.MapTask: Starting flush of map output
18/05/31 14:38:29 INFO mapred.LocalJobRunner: map task executor complete.
18/05/31 14:38:29 WARN mapred.LocalJobRunner: job_local406084068_0001
java.lang.Exception: Cannot run program "./bin.tar.gz/bin/STAR": error=20, Not a directory exited with code -1
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:491)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:551)
Caused by: Cannot run program "./bin.tar.gz/bin/STAR": error=20, Not a directory exited with code -1
    at be.ugent.intec.halvade.utils.ProcessBuilderWrapper.startProcess(ProcessBuilderWrapper.java:132)
    at be.ugent.intec.halvade.tools.STARInstance.loadSharedMemoryReference(STARInstance.java:248)
    at be.ugent.intec.halvade.hadoop.mapreduce.StarAlignPassXMapper.loadReference(StarAlignPassXMapper.java:98)
    at be.ugent.intec.halvade.hadoop.mapreduce.StarAlignPassXMapper.setup(StarAlignPassXMapper.java:68)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
18/05/31 14:38:29 INFO mapreduce.Job: Job job_local406084068_0001 running in uber mode : false
18/05/31 14:38:29 INFO mapreduce.Job:  map 0% reduce 0%
18/05/31 14:38:29 INFO mapreduce.Job: Job job_local406084068_0001 failed with state FAILED due to: NA
18/05/31 14:38:29 INFO mapreduce.Job: Counters: 0
[2018/05/31 14:38:29 - DEBUG] Finished Halvade pass 1 Job [runtime: 4s 923ms ]
[2018/05/31 14:38:29 - DEBUG] Halvade pass 1 job failed.
ddcap commented 6 years ago

Yes, that's fine; I'm trying to test it now. If you can find them, could you send the stderr of the individual map tasks? They should show more information as to what's wrong.

sdhutchins commented 6 years ago

I'll look around for it. I also get this error; it seems the directory is being created at an odd location...

[INFO] The output directory '/home/shutchins2/halvade/out//pass1' already exists.
ddcap commented 6 years ago

I think that is caused by the first error: Halvade creates a pass1 directory to store the information of the first STAR alignment pass, which is then used in the second step. Since the first job failed after creating that directory, Hadoop MapReduce now detects the existing directory and stops the job. If you delete the /home/shutchins2/halvade/out directory, this error should go away.
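
Since the job logs show the input being read from hdfs://localhost:9000, the stale output directory is likely on HDFS as well; removing it would look roughly like this (a sketch; adjust the path, or use a plain rm -rf if the output directory is on the local filesystem instead):

hdfs dfs -rm -r /home/shutchins2/halvade/out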

sdhutchins commented 6 years ago

Currently having some hadoop issues on my server, but once I figure that out, I'll try to recreate this.

sdhutchins commented 6 years ago

@ddcap what kind of file should I be looking for? So far I can't locate the stderr file of the map tasks, at least not in a human-readable format. I figured out the Hadoop issues I was having, and they're unrelated to this.

sdhutchins commented 6 years ago

I assume these are the tasks...

(screenshot attached)

sdhutchins commented 6 years ago

I'm thinking it's something simple...

Perhaps the function below? I don't have much experience in Java, but it looks like when the bin dir is null, it improperly sets the location, as in [2018/05/31 14:38:29 - DEBUG] empty directory ./bin.tar.gz/bin

https://github.com/biointec/halvade/blob/271f5e1670d79612327b3916aa4415effa8f0941/halvade/src/be/ugent/intec/halvade/hadoop/mapreduce/HalvadeMapper.java#L72-L80

That's just my guess.

ddcap commented 6 years ago

Right after this it checks whether binDir is null and throws an exception if it is, so that's not it. Could you send the stderr from one of the tasks you found?

ddcap commented 6 years ago

I don't get any errors when running the RNA pipeline

sdhutchins commented 6 years ago

@ddcap could I see the configuration you have?

sdhutchins commented 6 years ago

I am trying the exact setup from the link.

ddcap commented 6 years ago

This is the command I used: hadoop jar $jar -I $I -O $O --rna --star $STAR --nodes 1 --mem 128 --vcores 20 -D $D -R $R

The new version of Halvade detects a bin.tar.gz in the same folder you are running from, which is why I don't use the -B option.

sdhutchins commented 6 years ago

@ddcap so I shouldn't use the python file?

ddcap commented 6 years ago

It should give the same result but you can try.

sdhutchins commented 6 years ago

@ddcap where do I run the command from?

hadoop jar $jar -I $I -O $O --rna --star $STAR --nodes 1 --mem 128 --vcores 20 -D $D -R $R

Do I need to export those variables to my environment?

ddcap commented 6 years ago

I have a script like this:

#!/bin/bash

I="/path/to/input"
O="/path/to/output/"
nodes=1
mem=128
vcores=20
D="/path/to/ref.vcf"
R="/path/to/ref.fasta"
STAR="/path/to/STAR_ref/"
jar=/path/to/HalvadeWithLibs.jar
hadoop jar $jar -I $I -O $O --rna --star $STAR --nodes $nodes --mem $mem --vcores $vcores -D $D -R $R

Change the paths to all the files and you can test it.

sdhutchins commented 6 years ago

I just tested that out... Perhaps there is an issue with how I am using Hadoop, or with my Java?

(screenshot attached)

ddcap commented 6 years ago

Like I said, the bin.tar.gz should be in the folder you run the script from, or you can add the -B $B option where B=/path/to/bin.tar.gz.

sdhutchins commented 6 years ago

According to the HDFS, it is in that directory.

ddcap commented 6 years ago

I mean the bin.tar.gz should be where the script is run, not on HDFS

sdhutchins commented 6 years ago

No binaries archive found, please provide the location with the -B argument or make sure it is present in the current directory: /home/shutchins2/halvade
put: `/home/shutchins2/halvade/bin.tar.gz': File exists
sdhutchins commented 6 years ago

It's also located where I'm running the script.

ddcap commented 6 years ago

It shouldn't be on HDFS; put is an HDFS command. Did you try the -B option?

sdhutchins commented 6 years ago

Also doesn't work with the -B option.

If it's not on HDFS, it throws a "file does not exist" error. Which version of Hadoop are you using? I think that could very well be the issue.

ddcap commented 6 years ago

Halvade doesn't look for that file on HDFS; it tries to find the binary archive on your local disk, either in the directory you ran Halvade from or, if you provide the option, at the location you give it. It doesn't matter whether it's on HDFS or not.

Maybe you don't have read access there? Some installations of Hadoop use the yarn user to run Hadoop jobs.

I tried with Hadoop 2.2.x, 2.4.x, 2.6.x, and 2.7.x, so I doubt that's the issue.
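
If the tasks do run as the yarn user, one way to confirm that this user can read the archive is a check along these lines (a sketch; it assumes sudo is available and that the NodeManager user is literally named yarn):

ls -l /home/shutchins2/halvade/bin.tar.gz
sudo -u yarn test -r /home/shutchins2/halvade/bin.tar.gz && echo "readable by yarn" || echo "not readable by yarn"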

sdhutchins commented 6 years ago

I should have read access. This is a local install of Hadoop version 2.9.1.

The binary file is in both locations. I'm not sure what else to try other than a different version of hadoop.
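
One more way to rule out a corrupt or mis-built archive is to extract it by hand and run the packaged STAR binary directly (a sketch; the scratch directory is arbitrary):

mkdir -p /tmp/halvade_bin_check
tar -xzf /home/shutchins2/halvade/bin.tar.gz -C /tmp/halvade_bin_check
/tmp/halvade_bin_check/bin/STAR --version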

sdhutchins commented 6 years ago

If this doesn't work, however, I'd still like to be able to use this on a cluster. Would you recommend using Halvade in a similar setup with something like PBS?

ddcap commented 6 years ago

You can continue with the Python script since that seemed to work for you. Can you provide the stderr of the failed task? It should give me more info on why you had the first error.

You will need Hadoop on a PBS system; there are options like Hadoop on Demand, which deploys a Hadoop cluster within the PBS job you requested.

sdhutchins commented 6 years ago

What about a cluster without a manager? Both the bash command and the Python script yield the exact same results and error codes.

#!/bin/bash

I="/home/shutchins2/halvade/in/"
O="/home/shutchins2/halvade/out/"
mem=128
vcores=24
D="/home/shutchins2/halvade/ref/dbsnp/dbsnp_138.hg19.vcf"
R="/home/shutchins2/halvade/ref/ucsc.hg19.fasta"
STAR="/home/shutchins2/halvade/ref/STAR_ref"
jar="/home/shutchins2/halvade/HalvadeWithLibs.jar"
N=1
B="/home/shutchins2/halvade/bin.tar.gz"

hadoop jar $jar -I $I -O $O --rna --star $STAR --nodes $N --mem $mem --vcores $vcores -D $D -R $R -B $B

That gives me the same error.

The failed task's output has 4 ANSI characters:

ÿÿÿÿ
ddcap commented 6 years ago

As long as you have Hadoop MapReduce you can run Halvade. So now the Python script also doesn't find the bin file? What changed from before, then?

sdhutchins commented 6 years ago

@ddcap It was always the same output. I wanted to try your setup to make sure it wasn't the Python script.

ddcap commented 6 years ago

But now the Hadoop job doesn't even start. Before, you had tasks that failed but at least they started; now they don't even start, right?

sdhutchins commented 6 years ago

No, they're doing the same thing as before.

ddcap commented 6 years ago

Well, before you had this error: Cannot run program "./bin.tar.gz/bin/STAR": error=20, Not a directory exited with code -1. While now you get this one: No binaries archive found. Right, or am I missing something? They are not the same error. In the first, Hadoop has started jobs and tries to start STAR, while in the second it can't find the bin.tar.gz, so it doesn't even start the job.

sdhutchins commented 6 years ago

Well, before you had this error: Cannot run program "./bin.tar.gz/bin/STAR": error=20, Not a directory exited with code -1. While now you get this one: No binaries archive found. Right, or am I missing something? They are not the same error. In the first, Hadoop has started jobs and tries to start STAR, while in the second it can't find the bin.tar.gz, so it doesn't even start the job.

Sorry, I didn't clarify. That was before I added the -B argument. Without it, I was getting the "No binaries archive found" error.

ddcap commented 6 years ago

OK, so you get tasks again. Could you please send me the stderr/stdout of one of the tasks that failed?

sdhutchins commented 6 years ago

The file output?

ÿÿÿÿ

The other error is the same as above at https://github.com/biointec/halvade/issues/12#issuecomment-393656917

ddcap commented 6 years ago

Is this stdout or stderr? The stderr should show which files are in the bin.tar.gz, which programs have started, and the error logs of those programs. Also, do you use YARN to start MapReduce?
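
Two quick checks that speak to both questions above: whether MapReduce is configured to run on YARN at all (the stack traces earlier come from LocalJobRunner, which points at local mode), and, on a YARN setup, how to pull the per-container stdout/stderr. A sketch; the application id is a placeholder:

# local mode is used when mapreduce.framework.name is unset or set to "local"
grep -A1 'mapreduce.framework.name' "$HADOOP_CONF_DIR"/mapred-site.xml
# on YARN, fetch all container logs (stdout, stderr, syslog) for a job
yarn logs -applicationId application_XXXXXXXXXXXXX_XXXX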

sdhutchins commented 6 years ago

That is the stdout from the Hadoop tasks. The stderr I get from running Halvade with the Python script or the bash script is at https://github.com/biointec/halvade/issues/12#issuecomment-393656917.

I simply set up Hadoop based on your workflow; nothing extra.

sdhutchins commented 6 years ago

This is what the tmp/halvade folder contains:

(screenshot attached)

There is nothing in the folder. The files have nothing in them as well.

ddcap commented 6 years ago

Do you have access to a different Hadoop cluster to test it?

sdhutchins commented 6 years ago

Unfortunately, I don't. @ddcap

ddcap commented 6 years ago

I can't seem to reproduce this error, so I cannot really help any further without the log files, which strangely don't contain the correct information on your cluster. My best guess is that some configuration on the node/cluster isn't correct.

sdhutchins commented 6 years ago

@ddcap would you be willing to give me your system information?

OS? Java version?