VDBWRAIR / pathdiscov

Pathogen Discover Pipeline

What is this error: WARNING: Unable to run getorf, please check if getorf program is running #114

Closed: mmelendrez closed this issue 9 years ago

mmelendrez commented 9 years ago

I got the warning above, and the pipeline finished in only 23 minutes, which is unusually fast compared with the other datasets I've run. Suggestions?

________________________________________
Tasks which will be run:

Task enters queue = usamriidPathDiscov.main.(mkdir 1) before usamriidPathDiscov.main.createPram
Task enters queue = usamriidPathDiscov.main.(mkdir 1) before usamriidPathDiscov.main.fastQC
Completed Task = usamriidPathDiscov.main.(mkdir 1) before usamriidPathDiscov.main.createPram
Task enters queue = usamriidPathDiscov.main.createPram
Completed Task = usamriidPathDiscov.main.(mkdir 1) before usamriidPathDiscov.main.fastQC
Completed Task = usamriidPathDiscov.main.createPram
Task enters queue = usamriidPathDiscov.main.prepare_analysis
Completed Task = usamriidPathDiscov.main.prepare_analysis
Task enters queue = usamriidPathDiscov.main.fastQC
fastqc  USAMRIID_40/input/F.fastq  -o  USAMRIID_40/results/quality_analysis| tee  -a USAMRIID_40/results/quality_analysis/analysis_quality.log
Task enters queue = usamriidPathDiscov.main.priStage
fastqc  USAMRIID_40/input/R.fastq  -o  USAMRIID_40/results/quality_analysis| tee  -a USAMRIID_40/results/quality_analysis/analysis_quality.log
run_standard_stable4.pl  --sample  USAMRIID_40  --paramfile  USAMRIID_40/input/param.txt  --outputdir  USAMRIID_40/results  --R1  USAMRIID_40/input/F.fastq  --R2  USAMRIID_40/input/R.fastq  --blast_unassembled  1000| tee  -a USAMRIID_40/results/analysis.log
Completed Task = usamriidPathDiscov.main.fastQC
Completed Task = usamriidPathDiscov.main.priStage
ln -s results/output  /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/output
ln -s results/iterative_blast_phylo_1/reports  /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/contig_reports
ln -s results/iterative_blast_phylo_2/reports  /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/unassembledread_reports
ln -s results/step1/R1.count  /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/R1.count
ln -s results/step1/R2.count  /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/R2.count
ln -s results/quality_analysis  /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/quality_analysis
ln -s results/analysis.log  /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/analysis.log
WARNING! : Unable to run getorf, please check if getorf program is running
Time to complete the task .....0:23:04
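The 23-minute total is easier to interpret per module. The sketch below tabulates the per-module [deltat] values from the analysis.log posted further down this thread. They add up to the log's [DELTAT] 1368 total (about 23 minutes, so the unit appears to be seconds) and show host_map taking roughly 95% of the run, while assembly, ORF filtering and both BLAST rounds finished in 0 to 5 seconds, which suggests those modules had almost nothing to process. Assumption: each [deltat] line belongs to the module whose command is logged just before it.

```python
# Per-module runtimes in seconds, copied from the [deltat] lines of the
# analysis.log posted later in this thread (each line appears twice there;
# only one copy is used here).
MODULE_SECONDS = [
    ("step1", 9),
    ("host_map_1", 1299),
    ("quality_filter", 54),
    ("ray2_assembly_1", 5),
    ("iterative_blast_phylo_1", 1),
    ("orf_filter", 0),
    ("iterative_blast_phylo_2", 0),
]


def summarize(modules):
    """Return the total runtime and each module's share of it in percent."""
    total = sum(seconds for _, seconds in modules)
    shares = {name: 100.0 * seconds / total for name, seconds in modules}
    return total, shares


if __name__ == "__main__":
    total, shares = summarize(MODULE_SECONDS)
    # The total should equal the [DELTAT] value printed at [END].
    print(f"total: {total} s ({total / 60:.1f} min)")
    for name, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:25s} {pct:5.1f}%")
```

A module that normally takes minutes but reports 0 or 1 seconds is a quick hint that its input file was empty.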
demis001 commented 9 years ago

Can you check whether "out.cap.fa" exists under the "ray2_assembly_1" directory? This error probably happens when "host_map" fails because of a database or bowtie2 error.
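Both questions (is getorf available, and does the file it reads contain any contigs) can be checked in one go. This is a sketch under assumptions: getorf is the EMBOSS program of that name and is expected on PATH, and orf_filter reads ray2_assembly_1.fasta, which the directory listing further down shows is a symlink to out.cap.fa. The default path is hypothetical, relative to the Pathogen_Discovery directory.

```python
import shutil
import sys
from pathlib import Path


def count_fasta_records(path):
    """Count FASTA records, i.e. header lines starting with '>'."""
    with path.open() as handle:
        return sum(1 for line in handle if line.startswith(">"))


def diagnose(contigs):
    """Return a list of problems that could explain the getorf warning."""
    problems = []
    if shutil.which("getorf") is None:
        problems.append("getorf (EMBOSS) is not on PATH")
    if not contigs.is_file():  # follows symlinks such as ray2_assembly_1.fasta
        problems.append(f"{contigs} does not exist or is not a regular file")
    elif count_fasta_records(contigs) == 0:
        size = contigs.stat().st_size
        problems.append(f"{contigs} has no FASTA records (file size {size})")
    return problems


if __name__ == "__main__":
    # Hypothetical default path, relative to the Pathogen_Discovery directory.
    default = "USAMRIID_40/results/ray2_assembly_1/out.cap.fa"
    target = Path(sys.argv[1] if len(sys.argv) > 1 else default)
    for problem in diagnose(target) or ["no obvious problem found"]:
        print(problem)
```

Counting headers rather than checking for a zero size matters here: a file holding a single newline is 1 byte long but contains no contigs.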

mmelendrez commented 9 years ago

I found "out.cap.fa" under the output directory.

output/out.cap.fa

Now looking for the ray2_assembly_1 directory you mentioned...

mmelendrez commented 9 years ago

Found another out.cap.fa under results/ray2_assembly_1.

demis001 commented 9 years ago

Can you check the size?

mmelendrez commented 9 years ago

It's 1 byte:

(usamriidPathDiscov)[melanie.melendrez@amedwrair15380 Pathogen_Discovery]$ ls -l USAMRIID_40/results/ray2_assembly_1/
total 45068
lrwxrwxrwx. 1 melanie.melendrez nfsnobody       30 Jan 29 09:05 1.R1.unmap.fastq -> bowtie2_mapping/R1.unmap.fastq
lrwxrwxrwx. 1 melanie.melendrez nfsnobody       30 Jan 29 09:05 1.R2.unmap.fastq -> bowtie2_mapping/R2.unmap.fastq
-rw-r--r--. 1 melanie.melendrez nfsnobody       19 Jan 29 09:05 assembly.count
drwxr-xr-x. 2 melanie.melendrez nfsnobody     4096 Jan 29 09:05 bowtie2_index
drwxr-xr-x. 2 melanie.melendrez nfsnobody     4096 Jan 29 09:05 bowtie2_mapping
-rw-r--r--. 1 melanie.melendrez nfsnobody      221 Jan 29 09:05 cap3.out
-rw-r--r--. 1 melanie.melendrez nfsnobody        2 Jan 29 09:05 contig_len.txt
-rw-r--r--. 1 melanie.melendrez nfsnobody        0 Jan 29 09:05 contig_numreads.txt
-rw-r--r--. 1 melanie.melendrez nfsnobody        0 Jan 29 09:05 head.1.R1.unmap.fastq
-rw-r--r--. 1 melanie.melendrez nfsnobody        0 Jan 29 09:05 head.1.R2.unmap.fastq
drwxr-xr-x. 2 melanie.melendrez nfsnobody     4096 Jan 29 09:05 logs
drwxr-xr-x. 2 melanie.melendrez nfsnobody     4096 Jan 29 09:05 logs_assembly
-rw-r--r--. 1 melanie.melendrez nfsnobody        1 Jan 29 09:05 out.cap.fa
-rw-r--r--. 1 melanie.melendrez nfsnobody        1 Jan 29 09:05 out.ray.fa
-rw-r--r--. 1 melanie.melendrez nfsnobody        8 Jan 29 09:05 out.ray.fa.cap.ace
-rw-r--r--. 1 melanie.melendrez nfsnobody        0 Jan 29 09:05 out.ray.fa.cap.concat
-rw-r--r--. 1 melanie.melendrez nfsnobody        0 Jan 29 09:05 out.ray.fa.cap.contigs
-rw-r--r--. 1 melanie.melendrez nfsnobody        0 Jan 29 09:05 out.ray.fa.cap.contigs.links
-rw-r--r--. 1 melanie.melendrez nfsnobody        0 Jan 29 09:05 out.ray.fa.cap.contigs.qual
-rw-r--r--. 1 melanie.melendrez nfsnobody      263 Jan 29 09:05 out.ray.fa.cap.info
-rw-r--r--. 1 melanie.melendrez nfsnobody        0 Jan 29 09:05 out.ray.fa.cap.singlets
-rw-r--r--. 1 melanie.melendrez nfsnobody        8 Jan 29 09:05 R1.count
-rw-r--r--. 1 melanie.melendrez nfsnobody 17520906 Jan 29 09:05 R1.paired.fastq
-rw-r--r--. 1 melanie.melendrez nfsnobody  6379176 Jan 29 09:05 R1.single.fastq
-rw-r--r--. 1 melanie.melendrez nfsnobody        8 Jan 29 09:05 R2.count
-rw-r--r--. 1 melanie.melendrez nfsnobody 17318586 Jan 29 09:05 R2.paired.fastq
-rw-r--r--. 1 melanie.melendrez nfsnobody  4695470 Jan 29 09:05 R2.single.fastq
lrwxrwxrwx. 1 melanie.melendrez nfsnobody       10 Jan 29 09:05 ray2_assembly_1.fasta -> out.cap.fa
demis001 commented 9 years ago

Check the log files under this folder for possible errors; this step did not complete. "contig_numreads.txt" shouldn't be zero size.
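To see at a glance which files in ray2_assembly_1 came out empty, a small directory scan like the one below can help. It is a sketch assuming the layout in the listing above; which files may legitimately be empty is not documented in this thread, so the output is a starting point rather than a verdict.

```python
import sys
from pathlib import Path


def empty_outputs(directory):
    """Return the names of 0-byte regular files in directory, sorted.

    is_file() and stat() follow symlinks, so a link to an empty file is
    reported too; broken links and subdirectories are skipped.
    """
    empty = []
    for entry in sorted(directory.iterdir()):
        if entry.is_file() and entry.stat().st_size == 0:
            empty.append(entry.name)
    return empty


if __name__ == "__main__":
    # Hypothetical default, relative to the Pathogen_Discovery directory.
    default = "USAMRIID_40/results/ray2_assembly_1"
    directory = Path(sys.argv[1] if len(sys.argv) > 1 else default)
    if not directory.is_dir():
        print(f"{directory} is not a directory")
    else:
        for name in empty_outputs(directory):
            print(f"empty: {name}")
```

On the listing above this would report contig_numreads.txt, both head.*.unmap.fastq files and the empty out.ray.fa.cap.* files, but not the 1-byte out.cap.fa.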

demis001 commented 9 years ago

It seems "out.cap.fa" is empty.

mmelendrez commented 9 years ago

The first error I see in analysis.log is in the checkerror section (module?), near the bottom of the file, right after the read counts module:

                            /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/host_map_1/logs/USAMRIID_40.20150129-08.42-out.e
[error] error file non-zero: USAMRIID_40.20150129-08.42-out.e

The same error appears for iterative_blast_phylo_1 and 2, orf_filter, quality_filter and ray2_assembly_1, and then the output cuts off at step1.

Here's the full log:

-|-------pathogen pipeline-------|-
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/pathogen.pl --sample USAMRIID_40 --command step1 host_map quality_filter ray2_assembly iterative_blast_phylo orf_filter iterative_blast_phylo_2 --paramfile USAMRIID_40/input/param.txt --outputdir USAMRIID_40/results --R1 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/input/F.fastq --R2 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/input/R.fastq | tee -a USAMRIID_40/results/analysis.log

#########################################################################################
############################## PATHOGEN DISCOVERY PIPELINE ##############################
#########################################################################################

[START] 20150129-08.42
[echo] create project logs directory
[cmd] mkdir -p /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/logs

#########################################################################################
############################## PATHOGEN DISCOVERY PIPELINE ##############################
#########################################################################################

[START] 20150129-08.42
[echo] create project logs directory
[cmd] mkdir -p /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/logs

[module] step1 
[cmd] mkdir -p /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/step1/logs

[module] step1 
[cmd] mkdir -p /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/step1/logs
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/step1/step1.pl --sample USAMRIID_40 --paramfile USAMRIID_40/input/param.txt --outputdir /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/step1 --logs /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/step1/logs --timestamp 20150129-08.42 --R1 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/input/F.fastq --R2 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/input/R.fastq
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/step1/step1.pl --sample USAMRIID_40 --paramfile USAMRIID_40/input/param.txt --outputdir /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/step1 --logs /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/step1/logs --timestamp 20150129-08.42 --R1 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/input/F.fastq --R2 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/input/R.fastq
[deltat] 9

[module] host_map
[iteration] 1
[cmd] mkdir -p /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/host_map_1/logs
[deltat] 9

[module] host_map
[iteration] 1
[cmd] mkdir -p /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/host_map_1/logs
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/host_map/host_map.pl --sample USAMRIID_40 --paramfile USAMRIID_40/input/param.txt --outputdir /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/host_map_1 --logs /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/host_map_1/logs --timestamp 20150129-08.42 --R1 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/step1/step1.R1 --R2 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/step1/step1.R2 --fastafile no --wellpaired 1 --run_iteration 1
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/host_map/host_map.pl --sample USAMRIID_40 --paramfile USAMRIID_40/input/param.txt --outputdir /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/host_map_1 --logs /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/host_map_1/logs --timestamp 20150129-08.42 --R1 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/step1/step1.R1 --R2 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/step1/step1.R2 --fastafile no --wellpaired 1 --run_iteration 1
[deltat] 1299

[module] quality_filter 
[cmd] mkdir -p /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/quality_filter/logs
[deltat] 1299

[module] quality_filter 
[cmd] mkdir -p /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/quality_filter/logs
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/quality_filter/quality_filter.pl --sample USAMRIID_40 --paramfile USAMRIID_40/input/param.txt --outputdir /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/quality_filter --logs /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/quality_filter/logs --timestamp 20150129-08.42 --R1 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/host_map_1/host_map_1.R1 --R2 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/host_map_1/host_map_1.R2
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/quality_filter/quality_filter.pl --sample USAMRIID_40 --paramfile USAMRIID_40/input/param.txt --outputdir /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/quality_filter --logs /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/quality_filter/logs --timestamp 20150129-08.42 --R1 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/host_map_1/host_map_1.R1 --R2 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/host_map_1/host_map_1.R2
[deltat] 54

[module] ray2_assembly
[iteration] 1
[cmd] mkdir -p /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/ray2_assembly_1/logs
[deltat] 54

[module] ray2_assembly
[iteration] 1
[cmd] mkdir -p /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/ray2_assembly_1/logs
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/ray2_assembly/ray2_assembly.pl --sample USAMRIID_40 --paramfile USAMRIID_40/input/param.txt --outputdir /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/ray2_assembly_1 --logs /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/ray2_assembly_1/logs --timestamp 20150129-08.42 --R1 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/quality_filter/quality_filter.R1 --R2 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/quality_filter/quality_filter.R2 --fastafile no --run_iteration 1
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/ray2_assembly/ray2_assembly.pl --sample USAMRIID_40 --paramfile USAMRIID_40/input/param.txt --outputdir /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/ray2_assembly_1 --logs /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/ray2_assembly_1/logs --timestamp 20150129-08.42 --R1 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/quality_filter/quality_filter.R1 --R2 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/quality_filter/quality_filter.R2 --fastafile no --run_iteration 1
[deltat] 5

[module] iterative_blast_phylo
[iteration] 1
[cmd] mkdir -p /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/iterative_blast_phylo_1/logs
[deltat] 5

[module] iterative_blast_phylo
[iteration] 1
[cmd] mkdir -p /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/iterative_blast_phylo_1/logs
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/iterative_blast_phylo/iterative_blast_phylo.pl --sample USAMRIID_40 --paramfile USAMRIID_40/input/param.txt --outputdir /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/iterative_blast_phylo_1 --logs /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/iterative_blast_phylo_1/logs --timestamp 20150129-08.42 --R1 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/ray2_assembly_1/ray2_assembly_1.fasta --R2 none --fastafile yes --run_iteration 1 --contig 1
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/iterative_blast_phylo/iterative_blast_phylo.pl --sample USAMRIID_40 --paramfile USAMRIID_40/input/param.txt --outputdir /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/iterative_blast_phylo_1 --logs /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/iterative_blast_phylo_1/logs --timestamp 20150129-08.42 --R1 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/ray2_assembly_1/ray2_assembly_1.fasta --R2 none --fastafile yes --run_iteration 1 --contig 1
[deltat] 1

[module] orf_filter 
[cmd] mkdir -p /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/orf_filter/logs
[deltat] 1

[module] orf_filter 
[cmd] mkdir -p /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/orf_filter/logs
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/orf_filter/orf_filter.pl --sample USAMRIID_40 --paramfile USAMRIID_40/input/param.txt --outputdir /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/orf_filter --logs /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/orf_filter/logs --timestamp 20150129-08.42 --R1 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/ray2_assembly_1/ray2_assembly_1.fasta --R2 none --fastafile yes
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/orf_filter/orf_filter.pl --sample USAMRIID_40 --paramfile USAMRIID_40/input/param.txt --outputdir /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/orf_filter --logs /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/orf_filter/logs --timestamp 20150129-08.42 --R1 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/ray2_assembly_1/ray2_assembly_1.fasta --R2 none --fastafile yes
[deltat] 0

[module] iterative_blast_phylo
[iteration] 2
[cmd] mkdir -p /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/iterative_blast_phylo_2/logs
[deltat] 0

[module] iterative_blast_phylo
[iteration] 2
[cmd] mkdir -p /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/iterative_blast_phylo_2/logs
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/iterative_blast_phylo/iterative_blast_phylo.pl --sample USAMRIID_40 --paramfile USAMRIID_40/input/param.txt --outputdir /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/iterative_blast_phylo_2 --logs /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/iterative_blast_phylo_2/logs --timestamp 20150129-08.42 --R1 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/orf_filter/orf_filter.R1 --R2 none --fastafile yes --run_iteration 2 --contig 1
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/iterative_blast_phylo/iterative_blast_phylo.pl --sample USAMRIID_40 --paramfile USAMRIID_40/input/param.txt --outputdir /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/iterative_blast_phylo_2 --logs /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/iterative_blast_phylo_2/logs --timestamp 20150129-08.42 --R1 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/orf_filter/orf_filter.R1 --R2 none --fastafile yes --run_iteration 2 --contig 1
[deltat] 0

[END] 20150129-09.05
[DELTAT] 1368
[deltat] 0

[END] 20150129-09.05
[DELTAT] 1368

-|-------unassembled reads-------|-
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/pathogen.pl --sample USAMRIID_40 --command iterative_blast_phylo_2 --paramfile param.txt --outputdir USAMRIID_40/results --R1 USAMRIID_40/results/ray2_assembly_1/head.1.R1.unmap.fastq --R2 USAMRIID_40/results/ray2_assembly_1/head.1.R2.unmap.fastq

#########################################################################################
############################## PATHOGEN DISCOVERY PIPELINE ##############################
#########################################################################################

[START] 20150129-09.05
[echo] create project logs directory
[cmd] mkdir -p /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/logs

-|-------read counts-------|-
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/scripts/readcount.pl --sample USAMRIID_40 --outputdir USAMRIID_40/results/output --projdir USAMRIID_40/results --dirlist "step1,quality_filter,host_map_1,ray2_assembly_1,iterative_blast_phylo_1,iterative_blast_phylo_2" --trackread
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/scripts/process_counts.pl --sample USAMRIID_40 --outputdir USAMRIID_40/results/output > USAMRIID_40/results/output/stats.txt
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/scripts/augment_report.sh USAMRIID_40/results USAMRIID_40
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/scripts/join_smallreport.pl --outputdir USAMRIID_40/results/iterative_blast_phylo_2/reports --prefix USAMRIID_40 --R1report USAMRIID_40/results/iterative_blast_phylo_2/reports/R1.USAMRIID_40.top.smallreport.txt --R2report USAMRIID_40/results/iterative_blast_phylo_2/reports/R2.USAMRIID_40.top.smallreport.txt --R1qualdiscard USAMRIID_40/results/quality_filter/R1.discard --R1hostdiscard USAMRIID_40/results/host_map_1/R1.discard --R2qualdiscard USAMRIID_40/results/quality_filter/R2.discard --R2hostdiscard USAMRIID_40/results/host_map_1/R2.discard

-|-------checkerror-------|-
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/pathogen.pl --checkerror --outputdir USAMRIID_40/results
***host_map_1***
.
                            /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/host_map_1/logs/USAMRIID_40.20150129-08.42-out.e
[error] error file non-zero: USAMRIID_40.20150129-08.42-out.e
.
***iterative_blast_phylo_1***
.
                            /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/iterative_blast_phylo_1/logs/USAMRIID_40.20150129-08.42-out.e
[error] error file non-zero: USAMRIID_40.20150129-08.42-out.e
.
***iterative_blast_phylo_2***
..
***orf_filter***
.
                            /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/orf_filter/logs/USAMRIID_40.20150129-08.42-out.e
[error] error file non-zero: USAMRIID_40.20150129-08.42-out.e
.
***quality_filter***
.
                            /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/quality_filter/logs/USAMRIID_40.20150129-08.42-out.e
[error] error file non-zero: USAMRIID_40.20150129-08.42-out.e
.
***ray2_assembly_1***
.
                            /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/ray2_assembly_1/logs_assembly/assembly.e
[error] error file non-zero: assembly.e
..
                            /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/ray2_assembly_1/logs/USAMRIID_40.20150129-08.42-out.e
[error] error file non-zero: USAMRIID_40.20150129-08.42-out.e
.
***step1***
..

-|-------cleanup-------|-
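The checkerror section flags every module whose error file (.e) under logs/ or logs_assembly/ is non-empty, but it does not print what those files say. Reading them directly is the next step, and host_map_1, the first module flagged, is a natural place to start. A minimal sketch, assuming the results/<module>/logs*/<name>.e layout visible in the paths above:

```python
import sys
from pathlib import Path


def nonempty_error_logs(results):
    """Find non-empty *.e files in every <module>/logs*/ directory under results."""
    hits = []
    for err in sorted(results.glob("*/logs*/*.e")):
        if err.is_file() and err.stat().st_size > 0:
            hits.append(err)
    return hits


def tail(path, n=5):
    """Return the last n lines of a text file, tolerating odd bytes."""
    with path.open(errors="replace") as handle:
        return handle.read().splitlines()[-n:]


if __name__ == "__main__":
    results = Path(sys.argv[1] if len(sys.argv) > 1 else "USAMRIID_40/results")
    for err in nonempty_error_logs(results):
        print(f"== {err} ==")
        for line in tail(err):
            print("   " + line)
```

The glob pattern logs* deliberately matches both logs and logs_assembly, since ray2_assembly_1 writes assembly.e to the latter.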
demis001 commented 9 years ago

Check your param.txt under input, specifically the bowtie2 and database settings.

mmelendrez commented 9 years ago

What am I looking for? Or would you like me to post it?

demis001 commented 9 years ago

I don't think the application is properly installed. Open the "param.txt" file and check whether the database locations are specified correctly.
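A mapper_db_list entry in param.txt is the prefix of an aligner index, not a file, so a wrong or moved path only surfaces when bowtie2 fails at run time. One way to check a prefix up front is to confirm that all six bowtie2 index files exist. This is a sketch: the .bt2 / .bt2l suffixes follow standard bowtie2 index naming, and the default prefixes are the ones from the param.txt posted in this thread.

```python
import sys
from pathlib import Path

# The six parts of a bowtie2 index; large indexes use .bt2l instead of .bt2.
BT2_PARTS = ["1", "2", "3", "4", "rev.1", "rev.2"]


def missing_bowtie2_files(prefix):
    """Return the index files missing for a bowtie2 prefix ([] means complete).

    The index counts as complete if all six parts exist with either the .bt2
    (small index) or .bt2l (large index) extension.
    """
    report = None
    for ext in ("bt2", "bt2l"):
        names = [f"{prefix}.{part}.{ext}" for part in BT2_PARTS]
        missing = [name for name in names if not Path(name).is_file()]
        if not missing:
            return []
        if report is None:
            report = missing  # report against .bt2 naming if neither is complete
    return report


if __name__ == "__main__":
    prefixes = sys.argv[1:] or [
        "/media/VD_Research/databases/humandna/human_dna",
        "/media/VD_Research/databases/humanrna/h_sapiens_rna",
    ]
    for prefix in prefixes:
        missing = missing_bowtie2_files(prefix)
        print(prefix, "OK" if not missing else f"missing {len(missing)} file(s)")
```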

mmelendrez commented 9 years ago

Hmmm... I ran this pipeline on 6 datasets between yesterday and the day before; I would think it would have failed on all of them then... but here's what I see in param.txt.

I do know that the current location of the databases I'm using is

/media/VD_Research/databases

Here are all the places where the param.txt under the input directory refers to a database. It looks fine to me, but have a look.

# -------------------
command host_map
# for the "_list" settings, use a comma-delimited list with no spaces

mapper_program_list     bowtie2,bowtie2                                         # choices are: bwa, bowtie2
mapper_db_list          /media/VD_Research/databases/humandna/human_dna,/media/VD_Research/databases/humanrna/h_sapiens_rna
mapper_name_list        bowtie2_genome_local,bowtie2_transcript_local                           # names that will appear on a graph
mapper_options_list     --local,--local                                         # flags for aligner

# explanation: perform a chain of alignments.
# -------------------
command host_map_2
# this is the same as host_map --- it just allows you to do it an n-th time

mapper_program_list         bowtie2                                         # choices are: bwa, bowtie2
mapper_db_list              /media/VD_Research/databases/humandna/human_dna     # prefix of aligner indexed database
mapper_name_list            bowtie2_genome                                      # names that will appear on a graph
mapper_options_list                                                     # flags for aligner

# -----------------
# -------------------
# command host_map_2
# this is the same as host_map --- it just allows you to do it an n-th time

# mapper_program_list           bwa                                         # choices are: bwa, bowtie2
# mapper_db_list            /media/VD_Research/databases/humandna/human_dna         # prefix of aligner indexed database
# mapper_name_list          bwa_genome                                      # names that will appear on a graph
# mapper_options_list                                                       # flags for aligner

# -------------------
# -------------------
command iterative_blast_phylo

# for the "_list" settings, use a comma-delimited list with no spaces

blast_db_list               /media/VD_Research/databases/ncbi/blast/nt/nt,/media/VD_Research/databases/ncbi/blast/nt/nt # blast db prefix
blast_task_list             megablast,dc-megablast                              # options are: megablast dc-megablast blastn, blastx
# blast_options_list            -evalue 1e-4 -word_size 28,-evalue 1e-4 -word_size 12,-evalue 1e-4 -word_size 7     # blast options (except for: -task -query -db -out -outfmt -num_descriptions; these are
blast_options_list          -evalue 1e-4 -word_size 28,-evalue 1e-4 -word_size 12,-evalue 1e-4          # blast options (except for: -task -query -db -out -outfmt -num_descriptions; these are hardwired)
ninst_list              10,10                                       # the input file will be broken into chunks and blasted in parallel - this parameter is the number of instances of BLAST you want to run in parallel
taxonomy_names              /media/VD_Research/databases/ncbi/taxonomy/names.dmp                    # NCBI taxonomy names dump file
taxonomy_nodes              /media/VD_Research/databases/ncbi/taxonomy/nodes.dmp                    # NCBI taxonomy nodes dump file

# explanation: perform a chain of blasts. the blast is performed in chunks to speed up the process. note: if you use blastx, make sure the chunks are really small. otherwise, it takes a long time.

# -------------------
# -------------------
command iterative_blast_phylo_2
# this is the same as iterative_blast_phylo --- it just allows you to do it an n-th time

# for the "_list" settings, use a comma-delimited list with no spaces

blast_db_list               /media/VD_Research/databases/ncbi/blast/nt/nt,/media/VD_Research/databases/ncbi/blast/nt/nt # blast db prefix
blast_task_list             megablast,dc-megablast                              # options are: megablast dc-megablast blastn, blastx
# blast_options_list            -evalue 1e-4 -word_size 28,-evalue 1e-4 -word_size 12,-evalue 1e-4 -word_size 7     # blast options (except for: -task -query -db -out -outfmt -num_descriptions; these are
blast_options_list          -evalue 1e-4 -word_size 28,-evalue 1e-4 -word_size 12,-evalue 1e-4          # blast options (except for: -task -query -db -out -outfmt -num_descriptions; these are hardwired)
ninst_list              10,10                       # the input file will be broken into chunks and blasted in parallel - this parameter is the number of instances of BLAST you want to run in parallel
taxonomy_names              /media/VD_Research/databases/ncbi/taxonomy/names.dmp                    # NCBI taxonomy names dump file
taxonomy_nodes              /media/VD_Research/databases/ncbi/taxonomy/nodes.dmp                    # NCBI taxonomy nodes dump file

# explanation: run another instance of the iterative_blast_phylo module. you can run more instances by adding entries in this parameter file for "iterative_blast_3", "iterative_blast_4", etc

# -------------------
# -------------------
command nohost_blast

ncbi_nt_db              /media/VD_Research/databases/ncbi/blast/nt/nt # full path to NCBI nt database prefix
gi2taxid                /media/VD_Research/databases/ncbi/blast/nt/nt/gi2taxid.txt                      # file with col1=gi number, col2=taxid (to make this file: blastdbcmd -db /data/db/nt -entry all -outfmt '%g    %T' > gi2taxid.txt) eg. blastdbcmd -db /media/VD_Research/People/Dereje.Jima/databases/ncbi/blast/nt/nt -entry all -outfmt '%g    %T' > gi2taxid.txt
num_subset_seq              200                                         # the number of sequences in the small initial file to be blasted
blast_type              blastn                                          # options are: blastn blastx
blast_task              megablast                                       # options are: megablast dc-megablast blastn
blast_options               -evalue 1e-4 -word_size 28                              # blast options (except for: -task -query -db -out -outfmt -num_descriptions; these are hardwired)
ninst                   10                                          # the input file will be broken into chunks and blasted in parallel - this parameter is the number of instances of BLAST you want to run in parallel

# explanation: nohost_blast will blast a subset (determined by "num_subset_seq") of your initial file to nt. it will get the taxid from the hits to nt and use these tax id to make a new database, which it will then blast the full initial file against

# -------------------
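The `nohost_blast` explanation above describes a two-stage search. First, a small subset (`num_subset_seq`) is blasted against nt. Next, the hit GIs are mapped to taxids using the `gi2taxid` file (col1 = gi, col2 = taxid). Finally, a restricted database is built from those taxids. The Python sketch below covers only the middle step. It assumes the hit GIs have already been pulled out of the subset BLAST output. `load_gi2taxid` and `taxids_for_hits` are hypothetical names, not the pipeline's own code.

```python
import io
from typing import Dict, Iterable, Set, TextIO


def load_gi2taxid(handle: TextIO) -> Dict[str, str]:
    """Read a two-column gi2taxid table (col1 = gi, col2 = taxid).

    Columns may be separated by a tab or by runs of spaces, so split()
    on arbitrary whitespace. Blank or one-column lines are skipped.
    """
    mapping: Dict[str, str] = {}
    for line in handle:
        fields = line.split()
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


def taxids_for_hits(hit_gis: Iterable[str], gi2taxid: Dict[str, str]) -> Set[str]:
    """Collect the distinct taxids hit by the subset search.

    GIs that are missing from the table are ignored rather than raising.
    """
    return {gi2taxid[gi] for gi in hit_gis if gi in gi2taxid}


if __name__ == "__main__":
    # Tiny in-memory table standing in for gi2taxid.txt.
    table = io.StringIO("100\t9606\n200    10090\n300\t9606\n")
    mapping = load_gi2taxid(table)
    print(sorted(taxids_for_hits(["100", "300", "999"], mapping)))
```

The resulting taxid set is what the restricted database would be built from. That database-building step is not shown here.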
demis001 commented 9 years ago

We talked about this last time, and we clearly wrote how to set up "databases": you have to make a symbolic link under your $HOME directory. We may change this later, but I designed the application this way. Here is the line from the man page:

ln -s /path/to/databases $HOME/databases

Dereje

mmelendrez commented 9 years ago

OK, I will do this, but I've run 6 datasets this way and it worked perfectly, so I don't think this gets at the original problem. Let me make the change and I'll rerun.

mmelendrez commented 9 years ago

softlink made:

(usamriidPathDiscov)[melanie.melendrez@amedwrair15380 ~]$ ls -l $HOME
total 20256
drwx------.  4 melanie.melendrez domain users    4096 Jun  4  2014 a5_miseq_linux_20140604
drwxrwxr-x.  2 melanie.melendrez domain users    4096 Dec 17 16:44 bin
-rw-------.  1 melanie.melendrez domain users 6732683 Sep  9 13:41 cStringIO
lrwxrwxrwx.  1 melanie.melendrez domain users      29 Jan 29 13:55 databases -> /media/VD_Research/databases/
drwx------.  9 melanie.melendrez domain users   12288 Jan 28 14:14 Desktop
drwx------.  2 melanie.melendrez domain users    4096 Sep 29 08:00 dist
drwxr-xr-x.  2 melanie.melendrez domain users    4096 Oct 25 12:37 doc
drwxr-xr-x.  2 melanie.melendrez domain users    4096 Oct 28  2013 Documents
drwx------.  5 melanie.melendrez domain users    4096 Aug 13 12:39 dodhpc
drwxr-xr-x.  2 melanie.melendrez domain users    4096 Jan 29 08:31 Downloads
drwxr-xr-x.  8 melanie.melendrez domain users    4096 Dec  2 12:58 FastQC
drwx------.  3 melanie.melendrez domain users    4096 Oct 14 16:23 home
drwx------.  5 melanie.melendrez domain users    4096 Feb  5  2014 igv
drwxr-xr-x.  4 melanie.melendrez domain users    4096 Sep 15 13:56 include
drwxr-xr-x.  4 melanie.melendrez domain users    4096 Oct 25 12:32 lib
drwx------.  2 melanie.melendrez domain users    4096 Sep 15 13:14 man
-rw-rw-r--.  1 melanie.melendrez domain users  394681 Oct  1  2013 math
drwxr-x---. 34 melanie.melendrez domain users    4096 May  1  2014 melanie
drwxr-xr-x.  2 melanie.melendrez domain users    4096 Sep 13  2013 Music
drwxr-xr-x.  6 melanie.melendrez domain users    4096 Oct 25 13:09 myhome
drwxr-xr-x. 32 melanie.melendrez domain users    4096 Jan 29 10:44 ngs_mapper
-rw-------.  1 melanie.melendrez domain users 6732676 Sep  9 13:41 os
drwx------.  5 melanie.melendrez domain users    4096 Aug 22 14:58 pbs_scripts
drwxr-xr-x.  8 melanie.melendrez domain users    4096 Mar 20  2014 phylosift_v1.0.1
drwxr-xr-x.  2 melanie.melendrez domain users    4096 Sep 13  2013 Pictures
drwxr-xr-x.  3 melanie.melendrez domain users    4096 Dec 23 17:27 pipelinedump
drwxr-xr-x.  2 melanie.melendrez domain users    4096 Sep 13  2013 Public
drwx------.  8 melanie.melendrez domain users    4096 Sep 29 08:00 redsample
drwx------.  2 melanie.melendrez domain users    4096 Sep 29 08:00 redsample.egg-info
drwxr-xr-x.  4 melanie.melendrez domain users    4096 Nov 28 14:08 share
drwx------. 13 melanie.melendrez domain users    4096 Dec 17 16:44 src
drwxr-xr-x.  3 melanie.melendrez domain users    4096 Oct  7 16:58 sshtunnel
-rw-rw-r--.  1 melanie.melendrez domain users 6732677 Sep  9 13:41 sys
drwxr-xr-x.  2 melanie.melendrez domain users    4096 Sep 13  2013 Templates
-rw-------.  1 melanie.melendrez domain users     355 Oct  7 17:45 tunnel.log
drwxr-xr-x. 13 melanie.melendrez domain users    4096 Dec 31 10:59 usamriidPathDiscov
drwxr-xr-x.  2 melanie.melendrez domain users    4096 Sep 13  2013 Videos
drwxr-xr-x.  4 melanie.melendrez domain users    4096 Dec 14 01:57 ViQuaS1.3
demis001 commented 9 years ago

I don't know why the application failed if you didn't make any changes. I installed it and ran 80 samples for Jun, and it worked every time I ran it.

mmelendrez commented 9 years ago

Just because you've run hundreds of samples for someone else doesn't mean it'll work perfectly on every dataset. Every user and every dataset is different. So let me just rerun and let's see what happens. We'll see if softlinking fixes the issue.

stay tuned.

mmelendrez commented 9 years ago

I am now editing my config.yaml to make sure it has

databases: ~/databases
mmelendrez commented 9 years ago

and running... I'll update when it finishes so we know whether the problem is fixed. Last time it ended at around 23 minutes when I expected it to run for at least an hour, so we'll see.

demis001 commented 9 years ago

Your symbolic link has a "/" at the end. It should look like this:

"databases -> /media/VD_Research/databases" not "databases -> /media/VD_Research/databases/"

mmelendrez commented 9 years ago

this is the command I ran:

(usamriidPathDiscov)[melanie.melendrez@amedwrair15380 ~]$ ln -s /media/VD_Research/databases/ $HOME/databases

incorrect?

mmelendrez commented 9 years ago

so I should get rid of the / after databases on both?

mmelendrez commented 9 years ago

the link is blue - so it looks like it's active...but I can redo the link
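A trailing slash changes only the text stored in the link, not where the link resolves. The two `ls -l` listings in this thread reflect this. The link size is 29 bytes with the slash and 28 without, which matches the lengths of `/media/VD_Research/databases/` and `/media/VD_Research/databases`. The small Python sketch below shows the same thing inside a throwaway temporary directory. The directory names are placeholders.

```python
import os
import tempfile


def compare_links(target_dir: str, workdir: str) -> dict:
    """Create two symlinks to target_dir, one with and one without a
    trailing slash, and report what each stores and where each resolves."""
    with_slash = os.path.join(workdir, "databases_slash")
    without_slash = os.path.join(workdir, "databases_noslash")
    os.symlink(target_dir.rstrip("/") + "/", with_slash)
    os.symlink(target_dir.rstrip("/"), without_slash)
    return {
        # os.readlink returns the raw target string stored in the link;
        # its length is the size that `ls -l` shows for a symlink.
        "raw_with_slash": os.readlink(with_slash),
        "raw_without_slash": os.readlink(without_slash),
        # realpath follows the link and normalizes the resulting path.
        "resolved_with_slash": os.path.realpath(with_slash),
        "resolved_without_slash": os.path.realpath(without_slash),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "databases")
        os.mkdir(target)
        info = compare_links(target, tmp)
        print(len(info["raw_with_slash"]) - len(info["raw_without_slash"]))
        print(info["resolved_with_slash"] == info["resolved_without_slash"])
```

This fits the conclusion reached at the end of the thread, that the database location was not what caused the warning.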

mmelendrez commented 9 years ago

cancelling run...redoing link

demis001 commented 9 years ago

Try this:

pushd $HOME
unlink databases
ln -s /media/VD_Research/databases databases
popd
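For anyone scripting the setup, here is the same unlink-and-relink sequence as a Python sketch. It assumes `$HOME/databases` is either absent or already a symlink, and it refuses to touch a real directory or file at that path. `relink` is a hypothetical helper, not part of the pipeline.

```python
import os
import tempfile


def relink(target: str, link: str) -> str:
    """Point `link` at `target`, mirroring `unlink databases; ln -s target databases`.

    Raises FileExistsError if `link` exists and is not a symlink, so a real
    directory is never removed by accident. Returns the stored link target.
    """
    target = target.rstrip("/") or "/"  # store the target without a trailing slash
    if os.path.islink(link):
        os.unlink(link)  # like `unlink databases`; the target itself is untouched
    elif os.path.lexists(link):
        raise FileExistsError(f"{link} exists and is not a symlink")
    os.symlink(target, link)  # like `ln -s target databases`
    return os.readlink(link)


if __name__ == "__main__":
    # Demonstration in a throwaway directory. For real use you would call e.g.
    # relink("/media/VD_Research/databases", os.path.expanduser("~/databases"))
    with tempfile.TemporaryDirectory() as tmp:
        store = os.path.join(tmp, "databases_store")
        os.mkdir(store)
        link = os.path.join(tmp, "databases")
        relink(store + "/", link)  # first link; the trailing slash is stripped
        print(relink(store, link))  # replaced in place; prints the stored target
```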

mmelendrez commented 9 years ago

just a sec... gotta kill some processes to make sure the pipeline is stopped

demis001 commented 9 years ago

Can you also paste the line you execute to run this task?

mmelendrez commented 9 years ago

k, ran your command above; now my home directory looks like this:

Yes?

(usamriidPathDiscov)[melanie.melendrez@amedwrair15380 Pathogen_Discovery]$ ll ~/
total 20256
drwx------.  4 melanie.melendrez domain users    4096 Jun  4  2014 a5_miseq_linux_20140604
drwxrwxr-x.  2 melanie.melendrez domain users    4096 Dec 17 16:44 bin
-rw-------.  1 melanie.melendrez domain users 6732683 Sep  9 13:41 cStringIO
lrwxrwxrwx.  1 melanie.melendrez domain users      28 Jan 29 14:09 databases -> /media/VD_Research/databases
drwx------.  9 melanie.melendrez domain users   12288 Jan 28 14:14 Desktop
drwx------.  2 melanie.melendrez domain users    4096 Sep 29 08:00 dist
drwxr-xr-x.  2 melanie.melendrez domain users    4096 Oct 25 12:37 doc
drwxr-xr-x.  2 melanie.melendrez domain users    4096 Oct 28  2013 Documents
drwx------.  5 melanie.melendrez domain users    4096 Aug 13 12:39 dodhpc
drwxr-xr-x.  2 melanie.melendrez domain users    4096 Jan 29 08:31 Downloads
drwxr-xr-x.  8 melanie.melendrez domain users    4096 Dec  2 12:58 FastQC
drwx------.  3 melanie.melendrez domain users    4096 Oct 14 16:23 home
drwx------.  5 melanie.melendrez domain users    4096 Feb  5  2014 igv
drwxr-xr-x.  4 melanie.melendrez domain users    4096 Sep 15 13:56 include
drwxr-xr-x.  4 melanie.melendrez domain users    4096 Oct 25 12:32 lib
drwx------.  2 melanie.melendrez domain users    4096 Sep 15 13:14 man
-rw-rw-r--.  1 melanie.melendrez domain users  394681 Oct  1  2013 math
drwxr-x---. 34 melanie.melendrez domain users    4096 May  1  2014 melanie
drwxr-xr-x.  2 melanie.melendrez domain users    4096 Sep 13  2013 Music
drwxr-xr-x.  6 melanie.melendrez domain users    4096 Oct 25 13:09 myhome
drwxr-xr-x. 32 melanie.melendrez domain users    4096 Jan 29 10:44 ngs_mapper
-rw-------.  1 melanie.melendrez domain users 6732676 Sep  9 13:41 os
drwx------.  5 melanie.melendrez domain users    4096 Aug 22 14:58 pbs_scripts
drwxr-xr-x.  8 melanie.melendrez domain users    4096 Mar 20  2014 phylosift_v1.0.1
drwxr-xr-x.  2 melanie.melendrez domain users    4096 Sep 13  2013 Pictures
drwxr-xr-x.  3 melanie.melendrez domain users    4096 Dec 23 17:27 pipelinedump
drwxr-xr-x.  2 melanie.melendrez domain users    4096 Sep 13  2013 Public
drwx------.  8 melanie.melendrez domain users    4096 Sep 29 08:00 redsample
drwx------.  2 melanie.melendrez domain users    4096 Sep 29 08:00 redsample.egg-info
drwxr-xr-x.  4 melanie.melendrez domain users    4096 Nov 28 14:08 share
drwx------. 13 melanie.melendrez domain users    4096 Dec 17 16:44 src
drwxr-xr-x.  3 melanie.melendrez domain users    4096 Oct  7 16:58 sshtunnel
-rw-rw-r--.  1 melanie.melendrez domain users 6732677 Sep  9 13:41 sys
drwxr-xr-x.  2 melanie.melendrez domain users    4096 Sep 13  2013 Templates
-rw-------.  1 melanie.melendrez domain users     355 Oct  7 17:45 tunnel.log
drwxr-xr-x. 13 melanie.melendrez domain users    4096 Dec 31 10:59 usamriidPathDiscov
drwxr-xr-x.  2 melanie.melendrez domain users    4096 Sep 13  2013 Videos
drwxr-xr-x.  4 melanie.melendrez domain users    4096 Dec 14 01:57 ViQuaS1.3
mmelendrez commented 9 years ago

alright - rerunning

demis001 commented 9 years ago

Can you paste the command line you ran?

mmelendrez commented 9 years ago
(usamriidPathDiscov)[melanie.melendrez@amedwrair15380 Pathogen_Discovery]$ usamriidPathDiscov_cli -R1 ../40_S7_L001_R1_001.fastq -R2 ../40_S7_L001_R2_001.fastq --outdir USAMRIID-2_40

Fine?

mmelendrez commented 9 years ago

same command I used for the other datasets.

demis001 commented 9 years ago

Can you paste a few lines from "USAMRIID-2_40/input/param.txt"? I want to see the database lines.

mmelendrez commented 9 years ago

I didn't alter anything in param.txt but just a sec...let me bring it up

mmelendrez commented 9 years ago

oh wait, that file isn't generated until after I start the run, right?

mmelendrez commented 9 years ago

k lemme start it

mmelendrez commented 9 years ago

here's the param.txt

# This is the parameter file for the pathogen discovery pipeline.
# Every command (or "module", if you like) has its own settings. The settings for a particular module must follow its "command module" line. Otherwise, the order doesn't matter.

# How to use this parameter file:
# for boolean options, use "yes" or "1" for assent OR "no" or "0" or "-" for dissent (or simply comment out or omit the line).
# the default settings are generally "no" unless otherwise specified

# note: make sure nt and nr are up-to-date!

# -------------------
command step1

seq_platform                illumina                                        # choices are: illumina or 454

# explanation: must be run first --- hence titled step1. this module maps the fastq IDs into simple numerical IDs and processes .sff file if 454

# -------------------
command quality_filter

cutadapt_options_R1         -a GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -g GCCGGAGCTCTGCAGATATC -a GATATCTGCAGAGCTCCGGC -m 50 --match-read-wildcards
cutadapt_options_R2         -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -g GCCGGAGCTCTGCAGATATC -a GATATCTGCAGAGCTCCGGC -m 50 --match-read-wildcards

# cutadapt_options          -g GCCGGAGCTCTGCAGATATC -m 20 --match-read-wildcards                                # this should have every flag except the input and output ones
# cutadapt_options2         -g CGCCGTTTCCCAGTAGGTCTC -m 20 --match-read-wildcards                               # this should have every flag except the input and output ones
# prinseq_options           -log -verbose -min_len 50 -ns_max_p 10 -derep 12345                                 # this should have every flag except the input and output ones (i.e., don't specify the "-out_good" "-out_bad" or "fastq" flags)
prinseq_options             -min_len 50 -derep 14 -lc_method dust -lc_threshold 3 -trim_ns_left 1 -trim_ns_right 1 -trim_qual_right 15  # this should have every flag except the input and output ones (i.e., don't specify the "-out_good" "-out_bad" or "fastq" flags)

# explanation: run up to 2 iterations of cutadapt, if specified, and prinseq, if specified

# -------------------
command host_map
# for the "_list" settings, use a comma-delimited list with no spaces

mapper_program_list     bowtie2,bowtie2                                         # choices are: bwa, bowtie2
mapper_db_list          /media/VD_Research/databases/humandna/human_dna,/media/VD_Research/databases/humanrna/h_sapiens_rna
mapper_name_list        bowtie2_genome_local,bowtie2_transcript_local                           # names that will appear on a graph
mapper_options_list     --local,--local                                         # flags for aligner

# explanation: perform a chain of alignments.

# -------------------
command host_map_2
# this is the same as host_map --- it just allows you to do it an n-th time

mapper_program_list         bowtie2                                         # choices are: bwa, bowtie2
mapper_db_list              /media/VD_Research/databases/humandna/human_dna     # prefix of aligner indexed database
mapper_name_list            bowtie2_genome                                      # names that will appear on a graph
mapper_options_list                                                     # flags for aligner

# -------------------
# command host_map_2
# this is the same as host_map --- it just allows you to do it an n-th time

# mapper_program_list           bwa                                         # choices are: bwa, bowtie2
# mapper_db_list            /media/VD_Research/databases/humandna/human_dna         # prefix of aligner indexed database
# mapper_name_list          bwa_genome                                      # names that will appear on a graph
# mapper_options_list                                                       # flags for aligner

# -------------------
command iterative_blast_phylo

# for the "_list" settings, use a comma-delimited list with no spaces

blast_db_list               /media/VD_Research/databases/ncbi/blast/nt/nt,/media/VD_Research/databases/ncbi/blast/nt/nt # blast db prefix
blast_task_list             megablast,dc-megablast                              # options are: megablast dc-megablast blastn, blastx
# blast_options_list            -evalue 1e-4 -word_size 28,-evalue 1e-4 -word_size 12,-evalue 1e-4 -word_size 7     # blast options (except for: -task -query -db -out -outfmt -num_descriptions; these are
blast_options_list          -evalue 1e-4 -word_size 28,-evalue 1e-4 -word_size 12,-evalue 1e-4          # blast options (except for: -task -query -db -out -outfmt -num_descriptions; these are hardwired)
ninst_list              10,10                                       # the input file will be broken into chunks and blasted in parallel - this parameter is the number of instances of BLAST you want to run in parallel
taxonomy_names              /media/VD_Research/databases/ncbi/taxonomy/names.dmp                    # NCBI taxonomy names dump file
taxonomy_nodes              /media/VD_Research/databases/ncbi/taxonomy/nodes.dmp                    # NCBI taxonomy nodes dump file

# explanation: perform a chain of blasts. the blast is performed in chunks to speed up the process. note: if you use blastx, make sure the chunks are really small. otherwise, it takes a long time.

# -------------------
command iterative_blast_phylo_2
# this is the same as iterative_blast_phylo --- it just allows you to do it an n-th time

# for the "_list" settings, use a comma-delimited list with no spaces

blast_db_list               /media/VD_Research/databases/ncbi/blast/nt/nt,/media/VD_Research/databases/ncbi/blast/nt/nt # blast db prefix
blast_task_list             megablast,dc-megablast                              # options are: megablast dc-megablast blastn, blastx
# blast_options_list            -evalue 1e-4 -word_size 28,-evalue 1e-4 -word_size 12,-evalue 1e-4 -word_size 7     # blast options (except for: -task -query -db -out -outfmt -num_descriptions; these are
blast_options_list          -evalue 1e-4 -word_size 28,-evalue 1e-4 -word_size 12,-evalue 1e-4          # blast options (except for: -task -query -db -out -outfmt -num_descriptions; these are hardwired)
ninst_list              10,10                       # the input file will be broken into chunks and blasted in parallel - this parameter is the number of instances of BLAST you want to run in parallel
taxonomy_names              /media/VD_Research/databases/ncbi/taxonomy/names.dmp                    # NCBI taxonomy names dump file
taxonomy_nodes              /media/VD_Research/databases/ncbi/taxonomy/nodes.dmp                    # NCBI taxonomy nodes dump file

# explanation: run another instance of the iterative_blast_phylo module. you can run more instances by adding entries in this parameter file for "iterative_blast_3", "iterative_blast_4", etc

# -------------------
command nohost_blast

ncbi_nt_db              /media/VD_Research/databases/ncbi/blast/nt/nt # full path to NCBI nt database prefix
gi2taxid                /media/VD_Research/databases/ncbi/blast/nt/nt/gi2taxid.txt                      # file with col1=gi number, col2=taxid (to make this file: blastdbcmd -db /data/db/nt -entry all -outfmt '%g    %T' > gi2taxid.txt) eg. blastdbcmd -db /media/VD_Research/People/Dereje.Jima/databases/ncbi/blast/nt/nt -entry all -outfmt '%g    %T' > gi2taxid.txt
num_subset_seq              200                                         # the number of sequences in the small initial file to be blasted
blast_type              blastn                                          # options are: blastn blastx
blast_task              megablast                                       # options are: megablast dc-megablast blastn
blast_options               -evalue 1e-4 -word_size 28                              # blast options (except for: -task -query -db -out -outfmt -num_descriptions; these are hardwired)
ninst                   10                                          # the input file will be broken into chunks and blasted in parallel - this parameter is the number of instances of BLAST you want to run in parallel

# explanation: nohost_blast will blast a subset (determined by "num_subset_seq") of your initial file to nt. it will get the taxid from the hits to nt and use these tax id to make a new database, which it will then blast the full initial file against

# -------------------
command ray2_assembly

kmer                    25                                          # assembler k-mer
ninst                   10                                          # number of instances for mpiexec
cap                     1                                           # use cap after ray
# cap_options                                                           # cap options
map2contigs             yes                                         # if "yes" or "1", map reads back onto assembly
bowtie2_options             --local                                         # only nec if map2contigs. Options for the mapper

# explanation: perform an assembly using Ray

# -------------------
command ray2_assembly_2
# this is the same as ray2_assembly --- it just allows you to do it an n-th time

kmer                    25                                          # assembler k-mer
ninst                   10                                          # number of instances for mpiexec
cap                     1                                           # use cap after ray
# cap_options                                                           # cap options
map2contigs             yes                                         # if "yes" or "1", map reads back onto assembly
bowtie2_options             --local                                         # only nec if map2contigs. Options for the mapper

# -------------------
command orf_filter
# filter fasta input by orf

getorf_options              -minsize 60 -find 0                                     # any options other than "-sequence" and "-outseq"
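The comments in the param.txt above describe its format. Each module's settings follow its `command module` line, and values run up to a `#` comment. `_list` settings are comma-delimited. Booleans accept `yes`/`1` for assent and `no`/`0`/`-` (or omission) for dissent. The sketch below is a minimal parser for that format, written under those stated rules; it is not the pipeline's own reader. It also flags `_list` settings of unequal length within a module. In the file above, `iterative_blast_phylo` has two entries in `blast_db_list`, `blast_task_list` and `ninst_list`, but three in `blast_options_list`. The thread doesn't say how the pipeline treats the extra entry. All function names are hypothetical.

```python
from typing import Dict, List, Optional

# module name -> {setting name -> raw value string}
Params = Dict[str, Dict[str, str]]

TRUE_VALUES = {"yes", "1"}
FALSE_VALUES = {"no", "0", "-", ""}  # "" covers an omitted or empty setting


def parse_params(text: str) -> Params:
    """Group `key value` settings under the most recent `command <module>` line."""
    modules: Params = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop inline and full-line comments
        if not line:
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if key == "command":
            current = value
            modules[current] = {}
        elif current is not None:
            modules[current][key] = value
    return modules


def as_list(value: str) -> List[str]:
    """Split a `_list` setting on commas; an empty value gives an empty list."""
    return [item.strip() for item in value.split(",")] if value else []


def as_bool(value: str) -> bool:
    """Interpret a boolean setting using the conventions stated in param.txt."""
    v = value.strip()
    if v in TRUE_VALUES:
        return True
    if v in FALSE_VALUES:
        return False
    raise ValueError(f"unrecognized boolean value: {value!r}")


def uneven_lists(settings: Dict[str, str]) -> Dict[str, int]:
    """Return the entry count of each non-empty `_list` setting if the counts differ."""
    lengths = {k: len(as_list(v)) for k, v in settings.items()
               if k.endswith("_list") and v}
    return lengths if len(set(lengths.values())) > 1 else {}
```

Typical use would be `parse_params(open("USAMRIID-2_40/input/param.txt").read())`, followed by a call to `uneven_lists` on each module. Missing boolean settings are read with `as_bool(settings.get(name, ""))`, so omission counts as "no".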
demis001 commented 9 years ago

It still hasn't changed. Did you edit "config.yaml" and then run "python setup.py install"?

mmelendrez commented 9 years ago

cancelling run again...

nope - didn't know I had to do that.

I edited config.yaml

# database location
# Update all database locations

# Point databases to the location of your databases
databases: ~/databases

# These will all have the value of databases joined to them
# when the pipeline runs
human_dna: humandna/human_dna
human_rna: humanrna/h_sapiens_rna
nt_db:  ncbi/blast/nt/nt
tax_nodes: ncbi/taxonomy/nodes.dmp
tax_names: ncbi/taxonomy/names.dmp

PHRED_OFFSET: 33
SEQUENCE_PLATFORM: illumina  #choices are: illumina,454
NODE_NUM: 10  # Number of computer nodes or CPUS
#paired end lane to run
blast_unassembled: 1000
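The config.yaml comment says each database entry has the `databases` value joined to it when the pipeline runs. The sketch below shows that joining and assumes `~` is expanded to the user's home directory. That assumption matches the `/home/AMED/melanie.melendrez/databases/...` paths that later appear in param.txt. The dictionary stands in for the parsed YAML so the example needs only the standard library. `resolve_db_paths` is a hypothetical name, not the pipeline's function.

```python
import os
from typing import Dict

# Stand-in for the parsed config.yaml shown above (no YAML library needed).
CONFIG = {
    "databases": "~/databases",
    "human_dna": "humandna/human_dna",
    "human_rna": "humanrna/h_sapiens_rna",
    "nt_db": "ncbi/blast/nt/nt",
    "tax_nodes": "ncbi/taxonomy/nodes.dmp",
    "tax_names": "ncbi/taxonomy/names.dmp",
}

DB_KEYS = ("human_dna", "human_rna", "nt_db", "tax_nodes", "tax_names")


def resolve_db_paths(config: Dict[str, str]) -> Dict[str, str]:
    """Join each database entry onto the (tilde-expanded) `databases` root."""
    root = os.path.expanduser(config["databases"])
    return {key: os.path.join(root, config[key]) for key in DB_KEYS}


if __name__ == "__main__":
    for key, path in resolve_db_paths(CONFIG).items():
        print(f"{key}: {path}")
```

With `databases: /media/VD_Research/databases` instead, `expanduser` leaves the root unchanged. The joined paths then match the ones in the first param.txt, which is consistent with both settings working later in the thread.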

didn't know I had to rerun python setup.py install... let me do that

mmelendrez commented 9 years ago

ran

python setup.py install

seemed to work fine.

mmelendrez commented 9 years ago

rerunning USAMRIID-2_40 again

demis001 commented 9 years ago

Start it now and check the "param.txt" file; it should have "$HOME/databases...".

mmelendrez commented 9 years ago
# -------------------
command host_map
# for the "_list" settings, use a comma-delimited list with no spaces

mapper_program_list     bowtie2,bowtie2                                         # choices are: bwa, bowtie2
mapper_db_list          /home/AMED/melanie.melendrez/databases/humandna/human_dna,/home/AMED/melanie.melendrez/databases/humanrna/h_sapiens_rna
mapper_name_list        bowtie2_genome_local,bowtie2_transcript_local                           # names that will appear on a graph
mapper_options_list     --local,--local                                         # flags for aligner

# explanation: perform a chain of alignments.

# ------------------

that looks right... right? It doesn't say $HOME, but $HOME/databases is the same as /home/AMED/melanie.melendrez/databases...
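Dereje's check, looking at the database lines in param.txt, can also be scripted so that each path is confirmed to exist. BLAST and bowtie2 databases are given as prefixes (e.g. `.../nt/nt`, `.../humandna/human_dna`), so the sketch below treats a prefix as present when at least one file starts with it. The set of settings to check is an assumption taken from the param.txt pasted in this thread. `missing_db_paths` is a hypothetical helper.

```python
import glob
import os
from typing import List

# Settings that hold database paths or prefixes in the param.txt shown in this
# thread (an assumed list; extend it if your param.txt has others).
DB_SETTINGS = {"mapper_db_list", "blast_db_list", "ncbi_nt_db",
               "taxonomy_names", "taxonomy_nodes"}


def missing_db_paths(param_text: str) -> List[str]:
    """Return database paths or prefixes in param.txt that match no existing file."""
    missing: List[str] = []
    for raw in param_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # ignore comments, split on whitespace
        if len(fields) < 2 or fields[0] not in DB_SETTINGS:
            continue
        for path in fields[1].split(","):
            # A prefix such as .../nt/nt counts as present if it exists
            # or if any file name starts with it.
            present = os.path.exists(path) or glob.glob(glob.escape(path) + "*")
            if not present and path not in missing:
                missing.append(path)
    return missing
```

Calling `missing_db_paths(open("USAMRIID-2_40/input/param.txt").read())` and getting an empty list would mean every listed database path resolves.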

demis001 commented 9 years ago

yes

mmelendrez commented 9 years ago

k I'll let you know when it finishes.

mmelendrez commented 9 years ago

run finished - make_summary and make_pie ran too. closing issue.

mmelendrez commented 9 years ago

As a side note, out of curiosity, last night I also reran the pipeline after replacing the softlink in config.yaml with the direct path to the databases here on site (/media/VD_Research/databases). I then reran python setup.py install, reran the pipeline, and it finished in 3:11:20.

So for future reference: both methods work, whether you use the direct path to the databases or the softlink. The error I encountered therefore wasn't caused by the database location.

I did notice my terminal was acting funny throughout the day. Occasionally I had conflicting processes running that I had to kill, and these might have interfered with the original run that errored out.

Dereje mentioned he would make the getorf warning more informative in the future, so that we can more easily pinpoint where a problem occurred when the pipeline emits warnings or errors.
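A more informative getorf message could separate three cases: "getorf is not installed or not on PATH", "getorf ran but failed", and "getorf ran but wrote nothing". The original warning did not say which of these happened. The sketch below illustrates that idea and reports stderr when getorf fails. It uses the options from the `orf_filter` section of param.txt (`-minsize 60 -find 0`), with `-sequence` and `-outseq` supplied by the caller. This is an illustrative example, not the pipeline's code; `build_getorf_cmd` and `run_getorf` are hypothetical.

```python
import os
import shlex
import shutil
import subprocess
from typing import List, Optional


def build_getorf_cmd(fasta: str, outseq: str,
                     options: str = "-minsize 60 -find 0",
                     exe: str = "getorf") -> List[str]:
    """Assemble the command; `options` holds everything except -sequence/-outseq."""
    return [exe, "-sequence", fasta, "-outseq", outseq] + shlex.split(options)


def run_getorf(fasta: str, outseq: str,
               options: str = "-minsize 60 -find 0",
               exe: str = "getorf") -> Optional[str]:
    """Run getorf; return None on success, otherwise a message saying what failed."""
    if shutil.which(exe) is None:
        return f"{exe} was not found on PATH (is EMBOSS installed and on PATH?)"
    if not os.path.exists(fasta):
        return f"getorf input {fasta} does not exist"
    cmd = build_getorf_cmd(fasta, outseq, options, exe)
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        return (f"{' '.join(cmd)} exited with status {result.returncode}: "
                f"{result.stderr.strip()}")
    if not os.path.exists(outseq):
        return f"getorf finished but wrote no output file {outseq}"
    return None
```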