Closed mmelendrez closed 9 years ago
Can you check whether "out.cap.fa" exists under the "ray2_assembly_1" directory? This error probably happens when "host_map" fails because of a database or bowtie2 error.
I found "out.cap.fa" under the output directory.
output/out.cap.fa
Now looking for the ray2_assembly_1 directory you mentioned...
Found another out.cap.fa under results/ray2_assembly_1.
Can you check the size?
1
(usamriidPathDiscov)[melanie.melendrez@amedwrair15380 Pathogen_Discovery]$ ls -l USAMRIID_40/results/ray2_assembly_1/
total 45068
lrwxrwxrwx. 1 melanie.melendrez nfsnobody 30 Jan 29 09:05 1.R1.unmap.fastq -> bowtie2_mapping/R1.unmap.fastq
lrwxrwxrwx. 1 melanie.melendrez nfsnobody 30 Jan 29 09:05 1.R2.unmap.fastq -> bowtie2_mapping/R2.unmap.fastq
-rw-r--r--. 1 melanie.melendrez nfsnobody 19 Jan 29 09:05 assembly.count
drwxr-xr-x. 2 melanie.melendrez nfsnobody 4096 Jan 29 09:05 bowtie2_index
drwxr-xr-x. 2 melanie.melendrez nfsnobody 4096 Jan 29 09:05 bowtie2_mapping
-rw-r--r--. 1 melanie.melendrez nfsnobody 221 Jan 29 09:05 cap3.out
-rw-r--r--. 1 melanie.melendrez nfsnobody 2 Jan 29 09:05 contig_len.txt
-rw-r--r--. 1 melanie.melendrez nfsnobody 0 Jan 29 09:05 contig_numreads.txt
-rw-r--r--. 1 melanie.melendrez nfsnobody 0 Jan 29 09:05 head.1.R1.unmap.fastq
-rw-r--r--. 1 melanie.melendrez nfsnobody 0 Jan 29 09:05 head.1.R2.unmap.fastq
drwxr-xr-x. 2 melanie.melendrez nfsnobody 4096 Jan 29 09:05 logs
drwxr-xr-x. 2 melanie.melendrez nfsnobody 4096 Jan 29 09:05 logs_assembly
-rw-r--r--. 1 melanie.melendrez nfsnobody 1 Jan 29 09:05 out.cap.fa
-rw-r--r--. 1 melanie.melendrez nfsnobody 1 Jan 29 09:05 out.ray.fa
-rw-r--r--. 1 melanie.melendrez nfsnobody 8 Jan 29 09:05 out.ray.fa.cap.ace
-rw-r--r--. 1 melanie.melendrez nfsnobody 0 Jan 29 09:05 out.ray.fa.cap.concat
-rw-r--r--. 1 melanie.melendrez nfsnobody 0 Jan 29 09:05 out.ray.fa.cap.contigs
-rw-r--r--. 1 melanie.melendrez nfsnobody 0 Jan 29 09:05 out.ray.fa.cap.contigs.links
-rw-r--r--. 1 melanie.melendrez nfsnobody 0 Jan 29 09:05 out.ray.fa.cap.contigs.qual
-rw-r--r--. 1 melanie.melendrez nfsnobody 263 Jan 29 09:05 out.ray.fa.cap.info
-rw-r--r--. 1 melanie.melendrez nfsnobody 0 Jan 29 09:05 out.ray.fa.cap.singlets
-rw-r--r--. 1 melanie.melendrez nfsnobody 8 Jan 29 09:05 R1.count
-rw-r--r--. 1 melanie.melendrez nfsnobody 17520906 Jan 29 09:05 R1.paired.fastq
-rw-r--r--. 1 melanie.melendrez nfsnobody 6379176 Jan 29 09:05 R1.single.fastq
-rw-r--r--. 1 melanie.melendrez nfsnobody 8 Jan 29 09:05 R2.count
-rw-r--r--. 1 melanie.melendrez nfsnobody 17318586 Jan 29 09:05 R2.paired.fastq
-rw-r--r--. 1 melanie.melendrez nfsnobody 4695470 Jan 29 09:05 R2.single.fastq
lrwxrwxrwx. 1 melanie.melendrez nfsnobody 10 Jan 29 09:05 ray2_assembly_1.fasta -> out.cap.fa
Check the log files under this folder for possible errors; the output is incomplete. "contig_numreads.txt" shouldn't be zero size.
It seems "out.cap.fa" is empty.
The first error I see in analysis.log is in the checkerror section (module?), near the bottom of the file, right after the read counts module.
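A quick way to confirm that a FASTA file such as out.cap.fa holds no contigs is to count its header lines. The following is a minimal sketch, not part of the pipeline; the function name `count_fasta_records` is hypothetical, and the demo uses a temporary stand-in file rather than the real output.

```shell
#!/bin/sh
# Count FASTA records (header lines starting with ">") in a file.
# count_fasta_records is a hypothetical helper, not a pipeline script.
count_fasta_records() {
    # grep -c prints 0 and exits non-zero when nothing matches, so mask the status.
    grep -c '^>' "$1" || true
}

# Demo on a stand-in for the 1-byte out.cap.fa in the listing (a lone newline).
demo=$(mktemp)
printf '\n' > "$demo"
echo "records: $(count_fasta_records "$demo")"   # prints "records: 0"
rm -f "$demo"
```

Calling `count_fasta_records USAMRIID_40/results/ray2_assembly_1/out.cap.fa` on the real file would show the same count if the assembly produced no contigs.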
/media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/host_map_1/logs/USAMRIID_40.20150129-08.42-out.e
[error] error file non-zero: USAMRIID_40.20150129-08.42-out.e
This happens on iterative_blast_phylo_1 and 2, orf_filter, quality_filter, and ray2_assembly_1, and then it cuts off at step1.
Here's the full log:
-|-------pathogen pipeline-------|-
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/pathogen.pl --sample USAMRIID_40 --command step1 host_map quality_filter ray2_assembly iterative_blast_phylo orf_filter iterative_blast_phylo_2 --paramfile USAMRIID_40/input/param.txt --outputdir USAMRIID_40/results --R1 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/input/F.fastq --R2 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/input/R.fastq | tee -a USAMRIID_40/results/analysis.log
#########################################################################################
############################## PATHOGEN DISCOVERY PIPELINE ##############################
#########################################################################################
[START] 20150129-08.42
[echo] create project logs directory
[cmd] mkdir -p /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/logs
[module] step1
[cmd] mkdir -p /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/step1/logs
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/step1/step1.pl --sample USAMRIID_40 --paramfile USAMRIID_40/input/param.txt --outputdir /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/step1 --logs /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/step1/logs --timestamp 20150129-08.42 --R1 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/input/F.fastq --R2 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/input/R.fastq
[deltat] 9
[module] host_map
[iteration] 1
[cmd] mkdir -p /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/host_map_1/logs
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/host_map/host_map.pl --sample USAMRIID_40 --paramfile USAMRIID_40/input/param.txt --outputdir /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/host_map_1 --logs /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/host_map_1/logs --timestamp 20150129-08.42 --R1 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/step1/step1.R1 --R2 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/step1/step1.R2 --fastafile no --wellpaired 1 --run_iteration 1
[deltat] 1299
[module] quality_filter
[cmd] mkdir -p /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/quality_filter/logs
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/quality_filter/quality_filter.pl --sample USAMRIID_40 --paramfile USAMRIID_40/input/param.txt --outputdir /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/quality_filter --logs /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/quality_filter/logs --timestamp 20150129-08.42 --R1 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/host_map_1/host_map_1.R1 --R2 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/host_map_1/host_map_1.R2
[deltat] 54
[module] ray2_assembly
[iteration] 1
[cmd] mkdir -p /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/ray2_assembly_1/logs
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/ray2_assembly/ray2_assembly.pl --sample USAMRIID_40 --paramfile USAMRIID_40/input/param.txt --outputdir /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/ray2_assembly_1 --logs /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/ray2_assembly_1/logs --timestamp 20150129-08.42 --R1 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/quality_filter/quality_filter.R1 --R2 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/quality_filter/quality_filter.R2 --fastafile no --run_iteration 1
[deltat] 5
[module] iterative_blast_phylo
[iteration] 1
[cmd] mkdir -p /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/iterative_blast_phylo_1/logs
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/iterative_blast_phylo/iterative_blast_phylo.pl --sample USAMRIID_40 --paramfile USAMRIID_40/input/param.txt --outputdir /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/iterative_blast_phylo_1 --logs /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/iterative_blast_phylo_1/logs --timestamp 20150129-08.42 --R1 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/ray2_assembly_1/ray2_assembly_1.fasta --R2 none --fastafile yes --run_iteration 1 --contig 1
[deltat] 1
[module] orf_filter
[cmd] mkdir -p /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/orf_filter/logs
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/orf_filter/orf_filter.pl --sample USAMRIID_40 --paramfile USAMRIID_40/input/param.txt --outputdir /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/orf_filter --logs /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/orf_filter/logs --timestamp 20150129-08.42 --R1 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/ray2_assembly_1/ray2_assembly_1.fasta --R2 none --fastafile yes
[deltat] 0
[module] iterative_blast_phylo
[iteration] 2
[cmd] mkdir -p /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/iterative_blast_phylo_2/logs
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/iterative_blast_phylo/iterative_blast_phylo.pl --sample USAMRIID_40 --paramfile USAMRIID_40/input/param.txt --outputdir /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/iterative_blast_phylo_2 --logs /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/iterative_blast_phylo_2/logs --timestamp 20150129-08.42 --R1 /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/orf_filter/orf_filter.R1 --R2 none --fastafile yes --run_iteration 2 --contig 1
[deltat] 0
[END] 20150129-09.05
[DELTAT] 1368
-|-------unassembled reads-------|-
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/pathogen.pl --sample USAMRIID_40 --command iterative_blast_phylo_2 --paramfile param.txt --outputdir USAMRIID_40/results --R1 USAMRIID_40/results/ray2_assembly_1/head.1.R1.unmap.fastq --R2 USAMRIID_40/results/ray2_assembly_1/head.1.R2.unmap.fastq
#########################################################################################
############################## PATHOGEN DISCOVERY PIPELINE ##############################
#########################################################################################
[START] 20150129-09.05
[echo] create project logs directory
[cmd] mkdir -p /media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/logs
-|-------read counts-------|-
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/scripts/readcount.pl --sample USAMRIID_40 --outputdir USAMRIID_40/results/output --projdir USAMRIID_40/results --dirlist "step1,quality_filter,host_map_1,ray2_assembly_1,iterative_blast_phylo_1,iterative_blast_phylo_2" --trackread
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/scripts/process_counts.pl --sample USAMRIID_40 --outputdir USAMRIID_40/results/output > USAMRIID_40/results/output/stats.txt
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/scripts/augment_report.sh USAMRIID_40/results USAMRIID_40
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/scripts/join_smallreport.pl --outputdir USAMRIID_40/results/iterative_blast_phylo_2/reports --prefix USAMRIID_40 --R1report USAMRIID_40/results/iterative_blast_phylo_2/reports/R1.USAMRIID_40.top.smallreport.txt --R2report USAMRIID_40/results/iterative_blast_phylo_2/reports/R2.USAMRIID_40.top.smallreport.txt --R1qualdiscard USAMRIID_40/results/quality_filter/R1.discard --R1hostdiscard USAMRIID_40/results/host_map_1/R1.discard --R2qualdiscard USAMRIID_40/results/quality_filter/R2.discard --R2hostdiscard USAMRIID_40/results/host_map_1/R2.discard
-|-------checkerror-------|-
[cmd] /home/AMED/melanie.melendrez/usamriidPathDiscov/usamriidPathDiscov/pathogen.pl --checkerror --outputdir USAMRIID_40/results
***host_map_1***
.
/media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/host_map_1/logs/USAMRIID_40.20150129-08.42-out.e
[error] error file non-zero: USAMRIID_40.20150129-08.42-out.e
.
***iterative_blast_phylo_1***
.
/media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/iterative_blast_phylo_1/logs/USAMRIID_40.20150129-08.42-out.e
[error] error file non-zero: USAMRIID_40.20150129-08.42-out.e
.
***iterative_blast_phylo_2***
..
***orf_filter***
.
/media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/orf_filter/logs/USAMRIID_40.20150129-08.42-out.e
[error] error file non-zero: USAMRIID_40.20150129-08.42-out.e
.
***quality_filter***
.
/media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/quality_filter/logs/USAMRIID_40.20150129-08.42-out.e
[error] error file non-zero: USAMRIID_40.20150129-08.42-out.e
.
***ray2_assembly_1***
.
/media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/ray2_assembly_1/logs_assembly/assembly.e
[error] error file non-zero: assembly.e
..
/media/VD_Research/Analysis/ProjectBased_Analysis/melanie/share/Issue_9600/Issue_9608/Pathogen_Discovery/USAMRIID_40/results/ray2_assembly_1/logs/USAMRIID_40.20150129-08.42-out.e
[error] error file non-zero: USAMRIID_40.20150129-08.42-out.e
.
***step1***
..
-|-------cleanup-------|-
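The checkerror output above only says which `.e` files are non-empty, not what they contain. A minimal sketch for printing the start of each flagged file is shown here; the function name `show_error_logs` is hypothetical, and the only assumption taken from the log is that error files end in `.e`.

```shell
#!/bin/sh
# Print the first lines of every non-empty *.e log below a results directory.
# show_error_logs is a hypothetical helper, not a pipeline script.
show_error_logs() (
    results_dir=$1
    # -size +0c keeps only files larger than zero bytes.
    find "$results_dir" -type f -name '*.e' -size +0c | sort |
        while IFS= read -r log; do
            printf '== %s ==\n' "$log"
            head -n 20 "$log"
        done
)

# Run against a directory passed on the command line, e.g. USAMRIID_40/results.
if [ -n "${1:-}" ]; then
    show_error_logs "$1"
fi
```

Reading the host_map_1 error file first is likely the most useful, since every later module consumes its output.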
Check your param.txt under input, as well as bowtie2 and the database.
What am I looking for? Or would you like me to post it?
I don't think the application is properly installed. Open the "param.txt" file and check whether the location of the databases is properly specified.
Hmmm...I ran this pipeline on 6 datasets between yesterday and the day before, so I would think it would've failed on all of them then...but here's what I see in the param.txt.
I do know that the current location of the databases that I'm using is
/media/VD_Research/databases
Here are all the places I see db in the param.txt under the input directory. It looks fine to me, but have a look.
# -------------------
command host_map
# for the "_list" settings, use a comma-delimited list with no spaces
mapper_program_list bowtie2,bowtie2 # choices are: bwa, bowtie2
mapper_db_list /media/VD_Research/databases/humandna/human_dna,/media/VD_Research/databases/humanrna/h_sapiens_rna
mapper_name_list bowtie2_genome_local,bowtie2_transcript_local # names that will appear on a graph
mapper_options_list --local,--local # flags for aligner
# explanation: perform a chain of alignments.
# -------------------
command host_map_2
# this is the same as host_map --- it just allows you to do it an n-th time
mapper_program_list bowtie2 # choices are: bwa, bowtie2
mapper_db_list /media/VD_Research/databases/humandna/human_dna # prefix of aligner indexed database
mapper_name_list bowtie2_genome # names that will appear on a graph
mapper_options_list # flags for aligner
# -----------------
# -------------------
# command host_map_2
# this is the same as host_map --- it just allows you to do it an n-th time
# mapper_program_list bwa # choices are: bwa, bowtie2
# mapper_db_list /media/VD_Research/databases/humandna/human_dna # prefix of aligner indexed database
# mapper_name_list bwa_genome # names that will appear on a graph
# mapper_options_list # flags for aligner
# -------------------
# -------------------
command iterative_blast_phylo
# for the "_list" settings, use a comma-delimited list with no spaces
blast_db_list /media/VD_Research/databases/ncbi/blast/nt/nt,/media/VD_Research/databases/ncbi/blast/nt/nt # blast db prefix
blast_task_list megablast,dc-megablast # options are: megablast dc-megablast blastn, blastx
# blast_options_list -evalue 1e-4 -word_size 28,-evalue 1e-4 -word_size 12,-evalue 1e-4 -word_size 7 # blast options (except for: -task -query -db -out -outfmt -num_descriptions; these are
blast_options_list -evalue 1e-4 -word_size 28,-evalue 1e-4 -word_size 12,-evalue 1e-4 # blast options (except for: -task -query -db -out -outfmt -num_descriptions; these are hardwired)
ninst_list 10,10 # the input file will be broken into chunks and blasted in parallel - this parameter is the number of instances of BLAST you want to run in parallel
taxonomy_names /media/VD_Research/databases/ncbi/taxonomy/names.dmp # NCBI taxonomy names dump file
taxonomy_nodes /media/VD_Research/databases/ncbi/taxonomy/nodes.dmp # NCBI taxonomy nodes dump file
# explanation: perform a chain of blasts. the blast is performed in chunks to speed up the process. note: if you use blastx, make sure the chunks are really small. otherwise, it takes a long time.
# -------------------
# -------------------
command iterative_blast_phylo_2
# this is the same as iterative_blast_phylo --- it just allows you to do it an n-th time
# for the "_list" settings, use a comma-delimited list with no spaces
blast_db_list /media/VD_Research/databases/ncbi/blast/nt/nt,/media/VD_Research/databases/ncbi/blast/nt/nt # blast db prefix
blast_task_list megablast,dc-megablast # options are: megablast dc-megablast blastn, blastx
# blast_options_list -evalue 1e-4 -word_size 28,-evalue 1e-4 -word_size 12,-evalue 1e-4 -word_size 7 # blast options (except for: -task -query -db -out -outfmt -num_descriptions; these are
blast_options_list -evalue 1e-4 -word_size 28,-evalue 1e-4 -word_size 12,-evalue 1e-4 # blast options (except for: -task -query -db -out -outfmt -num_descriptions; these are hardwired)
ninst_list 10,10 # the input file will be broken into chunks and blasted in parallel - this parameter is the number of instances of BLAST you want to run in parallel
taxonomy_names /media/VD_Research/databases/ncbi/taxonomy/names.dmp # NCBI taxonomy names dump file
taxonomy_nodes /media/VD_Research/databases/ncbi/taxonomy/nodes.dmp # NCBI taxonomy nodes dump file
# explanation: run another instance of the iterative_blast_phylo module. you can run more instances by adding entries in this parameter file for "iterative_blast_3", "iterative_blast_4", etc
# -------------------
# -------------------
command nohost_blast
ncbi_nt_db /media/VD_Research/databases/ncbi/blast/nt/nt # full path to NCBI nt database prefix
gi2taxid /media/VD_Research/databases/ncbi/blast/nt/nt/gi2taxid.txt # file with col1=gi number, col2=taxid (to make this file: blastdbcmd -db /data/db/nt -entry all -outfmt '%g %T' > gi2taxid.txt) eg. blastdbcmd -db /media/VD_Research/People/Dereje.Jima/databases/ncbi/blast/nt/nt -entry all -outfmt '%g %T' > gi2taxid.txt
num_subset_seq 200 # the number of sequences in the small initial file to be blasted
blast_type blastn # options are: blastn blastx
blast_task megablast # options are: megablast dc-megablast blastn
blast_options -evalue 1e-4 -word_size 28 # blast options (except for: -task -query -db -out -outfmt -num_descriptions; these are hardwired)
ninst 10 # the input file will be broken into chunks and blasted in parallel - this parameter is the number of instances of BLAST you want to run in parallel
# explanation: nohost_blast will blast a subset (determined by "num_subset_seq") of your initial file to nt. it will get the taxid from the hits to nt and use these tax id to make a new database, which it will then blast the full initial file against
# -------------------
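One way to check the database locations in a param.txt like the one above is to pull out each prefix and test whether any file on disk starts with it. This is a rough sketch under stated assumptions: the function name `missing_db_prefixes` is hypothetical, it only reads the `mapper_db_list`, `blast_db_list`, and `ncbi_nt_db` keys, and it treats a prefix as present if at least one matching file exists, which does not prove the index is complete.

```shell
#!/bin/sh
# List database prefixes from a param.txt that match no file on disk.
# missing_db_prefixes is a hypothetical helper, not a pipeline script.
missing_db_prefixes() (
    paramfile=$1
    # Commented-out settings have "#" as their first field, so they are skipped.
    awk '$1 == "mapper_db_list" || $1 == "blast_db_list" || $1 == "ncbi_nt_db" { print $2 }' "$paramfile" |
        tr ',' '\n' | sort -u |
        while IFS= read -r prefix; do
            [ -n "$prefix" ] || continue
            # ls fails when the glob matches nothing and is passed through literally.
            ls -d "$prefix"* > /dev/null 2>&1 || printf 'missing: %s\n' "$prefix"
        done
)

# Run against a file passed on the command line, e.g. USAMRIID_40/input/param.txt.
if [ -n "${1:-}" ]; then
    missing_db_prefixes "$1"
fi
```

No output means every listed prefix matched at least one file.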
We talked about this last time, and we clearly wrote how to set up "databases": you have to make a symbolic link under your $HOME directory. We may change this later, but I designed the application this way. Here is the line from the man page:
ln -s /path/to/databases $HOME/databases
Dereje
Ok I will do this - but for 6 datasets I have run it this way and it's worked perfectly. So I don't think this gets at the original problem. But let me make the change and I'll rerun.
softlink made:
(usamriidPathDiscov)[melanie.melendrez@amedwrair15380 ~]$ ls -l $HOME
total 20256
drwx------. 4 melanie.melendrez domain users 4096 Jun 4 2014 a5_miseq_linux_20140604
drwxrwxr-x. 2 melanie.melendrez domain users 4096 Dec 17 16:44 bin
-rw-------. 1 melanie.melendrez domain users 6732683 Sep 9 13:41 cStringIO
lrwxrwxrwx. 1 melanie.melendrez domain users 29 Jan 29 13:55 databases -> /media/VD_Research/databases/
drwx------. 9 melanie.melendrez domain users 12288 Jan 28 14:14 Desktop
drwx------. 2 melanie.melendrez domain users 4096 Sep 29 08:00 dist
drwxr-xr-x. 2 melanie.melendrez domain users 4096 Oct 25 12:37 doc
drwxr-xr-x. 2 melanie.melendrez domain users 4096 Oct 28 2013 Documents
drwx------. 5 melanie.melendrez domain users 4096 Aug 13 12:39 dodhpc
drwxr-xr-x. 2 melanie.melendrez domain users 4096 Jan 29 08:31 Downloads
drwxr-xr-x. 8 melanie.melendrez domain users 4096 Dec 2 12:58 FastQC
drwx------. 3 melanie.melendrez domain users 4096 Oct 14 16:23 home
drwx------. 5 melanie.melendrez domain users 4096 Feb 5 2014 igv
drwxr-xr-x. 4 melanie.melendrez domain users 4096 Sep 15 13:56 include
drwxr-xr-x. 4 melanie.melendrez domain users 4096 Oct 25 12:32 lib
drwx------. 2 melanie.melendrez domain users 4096 Sep 15 13:14 man
-rw-rw-r--. 1 melanie.melendrez domain users 394681 Oct 1 2013 math
drwxr-x---. 34 melanie.melendrez domain users 4096 May 1 2014 melanie
drwxr-xr-x. 2 melanie.melendrez domain users 4096 Sep 13 2013 Music
drwxr-xr-x. 6 melanie.melendrez domain users 4096 Oct 25 13:09 myhome
drwxr-xr-x. 32 melanie.melendrez domain users 4096 Jan 29 10:44 ngs_mapper
-rw-------. 1 melanie.melendrez domain users 6732676 Sep 9 13:41 os
drwx------. 5 melanie.melendrez domain users 4096 Aug 22 14:58 pbs_scripts
drwxr-xr-x. 8 melanie.melendrez domain users 4096 Mar 20 2014 phylosift_v1.0.1
drwxr-xr-x. 2 melanie.melendrez domain users 4096 Sep 13 2013 Pictures
drwxr-xr-x. 3 melanie.melendrez domain users 4096 Dec 23 17:27 pipelinedump
drwxr-xr-x. 2 melanie.melendrez domain users 4096 Sep 13 2013 Public
drwx------. 8 melanie.melendrez domain users 4096 Sep 29 08:00 redsample
drwx------. 2 melanie.melendrez domain users 4096 Sep 29 08:00 redsample.egg-info
drwxr-xr-x. 4 melanie.melendrez domain users 4096 Nov 28 14:08 share
drwx------. 13 melanie.melendrez domain users 4096 Dec 17 16:44 src
drwxr-xr-x. 3 melanie.melendrez domain users 4096 Oct 7 16:58 sshtunnel
-rw-rw-r--. 1 melanie.melendrez domain users 6732677 Sep 9 13:41 sys
drwxr-xr-x. 2 melanie.melendrez domain users 4096 Sep 13 2013 Templates
-rw-------. 1 melanie.melendrez domain users 355 Oct 7 17:45 tunnel.log
drwxr-xr-x. 13 melanie.melendrez domain users 4096 Dec 31 10:59 usamriidPathDiscov
drwxr-xr-x. 2 melanie.melendrez domain users 4096 Sep 13 2013 Videos
drwxr-xr-x. 4 melanie.melendrez domain users 4096 Dec 14 01:57 ViQuaS1.3
I don't know why the application failed if you didn't make any changes. I installed it and ran 80 samples for Jun; every time I ran it, it worked.
Just because you ran hundreds of samples for someone else doesn't mean it'll work perfectly on every dataset. Every user and every dataset is different. So let me just rerun and let's see what happens. We'll see if softlinking fixes the issue.
stay tuned.
I am now editing my config.yaml to make sure it has
databases: ~/databases
and running...will update when it finishes so we know if the problem is fixed. Last time it ended around 23 minutes when I expected it to run at least an hour - so we'll see.
Your symbolic link has a "/" at the end. It should look like this:
"databases -> /media/VD_Research/databases", not "databases -> /media/VD_Research/databases/"
this is the command I ran:
(usamriidPathDiscov)[melanie.melendrez@amedwrair15380 ~]$ ln -s /media/VD_Research/databases/ $HOME/databases
incorrect?
so I should get rid of the / after databases on both?
the link is blue - so it looks like it's active...but I can redo the link
cancelling run...redoing link
Try this:
pushd $HOME
unlink databases
ln -s /media/VD_Research/databases databases
popd
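Before rerunning, it can help to confirm where the link actually resolves. Both forms of the target (with and without the trailing "/") normally resolve to the same directory. The following is a minimal sketch, assuming `readlink -f` from GNU coreutils is available. The helper name `relink` is hypothetical, and the demo uses a throwaway temp directory as a stand-in for the real database path:

```shell
# Hypothetical helper: replace a symlink and print where it now resolves.
relink() {
    target="${1%/}"   # strip one trailing slash from the target, if present
    link="$2"
    # -s symbolic, -f replace an existing link, -n do not descend into an
    # existing link that points at a directory
    ln -sfn "$target" "$link"
    readlink -f "$link"
}

# Demo in a temp dir (stand-ins for /media/VD_Research/databases and $HOME)
demo=$(mktemp -d)
mkdir "$demo/databases_real"
relink "$demo/databases_real/" "$demo/databases"
```

With the real paths this would be `relink /media/VD_Research/databases "$HOME/databases"`; the printed path should be the database directory itself.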
just a sec...gotta kill some processes to make sure the pipeline is stopped
Can you also paste the line you execute to run this task?
k ran your command above: now my files look like this:
Yes?
(usamriidPathDiscov)[melanie.melendrez@amedwrair15380 Pathogen_Discovery]$ ll ~/
total 20256
drwx------. 4 melanie.melendrez domain users 4096 Jun 4 2014 a5_miseq_linux_20140604
drwxrwxr-x. 2 melanie.melendrez domain users 4096 Dec 17 16:44 bin
-rw-------. 1 melanie.melendrez domain users 6732683 Sep 9 13:41 cStringIO
lrwxrwxrwx. 1 melanie.melendrez domain users 28 Jan 29 14:09 databases -> /media/VD_Research/databases
drwx------. 9 melanie.melendrez domain users 12288 Jan 28 14:14 Desktop
drwx------. 2 melanie.melendrez domain users 4096 Sep 29 08:00 dist
drwxr-xr-x. 2 melanie.melendrez domain users 4096 Oct 25 12:37 doc
drwxr-xr-x. 2 melanie.melendrez domain users 4096 Oct 28 2013 Documents
drwx------. 5 melanie.melendrez domain users 4096 Aug 13 12:39 dodhpc
drwxr-xr-x. 2 melanie.melendrez domain users 4096 Jan 29 08:31 Downloads
drwxr-xr-x. 8 melanie.melendrez domain users 4096 Dec 2 12:58 FastQC
drwx------. 3 melanie.melendrez domain users 4096 Oct 14 16:23 home
drwx------. 5 melanie.melendrez domain users 4096 Feb 5 2014 igv
drwxr-xr-x. 4 melanie.melendrez domain users 4096 Sep 15 13:56 include
drwxr-xr-x. 4 melanie.melendrez domain users 4096 Oct 25 12:32 lib
drwx------. 2 melanie.melendrez domain users 4096 Sep 15 13:14 man
-rw-rw-r--. 1 melanie.melendrez domain users 394681 Oct 1 2013 math
drwxr-x---. 34 melanie.melendrez domain users 4096 May 1 2014 melanie
drwxr-xr-x. 2 melanie.melendrez domain users 4096 Sep 13 2013 Music
drwxr-xr-x. 6 melanie.melendrez domain users 4096 Oct 25 13:09 myhome
drwxr-xr-x. 32 melanie.melendrez domain users 4096 Jan 29 10:44 ngs_mapper
-rw-------. 1 melanie.melendrez domain users 6732676 Sep 9 13:41 os
drwx------. 5 melanie.melendrez domain users 4096 Aug 22 14:58 pbs_scripts
drwxr-xr-x. 8 melanie.melendrez domain users 4096 Mar 20 2014 phylosift_v1.0.1
drwxr-xr-x. 2 melanie.melendrez domain users 4096 Sep 13 2013 Pictures
drwxr-xr-x. 3 melanie.melendrez domain users 4096 Dec 23 17:27 pipelinedump
drwxr-xr-x. 2 melanie.melendrez domain users 4096 Sep 13 2013 Public
drwx------. 8 melanie.melendrez domain users 4096 Sep 29 08:00 redsample
drwx------. 2 melanie.melendrez domain users 4096 Sep 29 08:00 redsample.egg-info
drwxr-xr-x. 4 melanie.melendrez domain users 4096 Nov 28 14:08 share
drwx------. 13 melanie.melendrez domain users 4096 Dec 17 16:44 src
drwxr-xr-x. 3 melanie.melendrez domain users 4096 Oct 7 16:58 sshtunnel
-rw-rw-r--. 1 melanie.melendrez domain users 6732677 Sep 9 13:41 sys
drwxr-xr-x. 2 melanie.melendrez domain users 4096 Sep 13 2013 Templates
-rw-------. 1 melanie.melendrez domain users 355 Oct 7 17:45 tunnel.log
drwxr-xr-x. 13 melanie.melendrez domain users 4096 Dec 31 10:59 usamriidPathDiscov
drwxr-xr-x. 2 melanie.melendrez domain users 4096 Sep 13 2013 Videos
drwxr-xr-x. 4 melanie.melendrez domain users 4096 Dec 14 01:57 ViQuaS1.3
alright - rerunning
Can you paste the command line you ran?
(usamriidPathDiscov)[melanie.melendrez@amedwrair15380 Pathogen_Discovery]$ usamriidPathDiscov_cli -R1 ../40_S7_L001_R1_001.fastq -R2 ../40_S7_L001_R2_001.fastq --outdir USAMRIID-2_40
Fine?
same command I used for the other datasets.
Can you paste a few lines from "USAMRIID-2_40/input/param.txt"? I want to see the database lines.
I didn't alter anything in param.txt but just a sec...let me bring it up
oh wait that doesn't generate til after I start the run right?
k lemme start it
here's the param.txt
# This is the parameter file for the pathogen discovery pipeline.
# Every command (or "module", if you like) has its own settings. The settings for a particular module must follow its "command module" line. Otherwise, the order doesn't matter.
# How to use this parameter file:
# for boolean options, use "yes" or "1" for assent OR "no" or "0" or "-" for dissent (or simply comment out or omit the line).
# the default settings are generally "no" unless otherwise specified
# note: make sure nt and nr are up-to-date!
# -------------------
command step1
seq_platform illumina # choices are: illumina or 454
# explanation: must be run first --- hence titled step1. this module maps the fastq IDs into simple numerical IDs and processes .sff file if 454
# -------------------
command quality_filter
cutadapt_options_R1 -a GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -g GCCGGAGCTCTGCAGATATC -a GATATCTGCAGAGCTCCGGC -m 50 --match-read-wildcards
cutadapt_options_R2 -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -g GCCGGAGCTCTGCAGATATC -a GATATCTGCAGAGCTCCGGC -m 50 --match-read-wildcards
# cutadapt_options -g GCCGGAGCTCTGCAGATATC -m 20 --match-read-wildcards # this should have every flag except the input and output ones
# cutadapt_options2 -g CGCCGTTTCCCAGTAGGTCTC -m 20 --match-read-wildcards # this should have every flag except the input and output ones
# prinseq_options -log -verbose -min_len 50 -ns_max_p 10 -derep 12345 # this should have every flag except the input and output ones (i.e., don't specify the "-out_good" "-out_bad" or "fastq" flags)
prinseq_options -min_len 50 -derep 14 -lc_method dust -lc_threshold 3 -trim_ns_left 1 -trim_ns_right 1 -trim_qual_right 15 # this should have every flag except the input and output ones (i.e., don't specify the "-out_good" "-out_bad" or "fastq" flags)
# explanation: run up to 2 iterations of cutadapt, if specified, and prinseq, if specified
# -------------------
command host_map
# for the "_list" settings, use a comma-delimited list with no spaces
mapper_program_list bowtie2,bowtie2 # choices are: bwa, bowtie2
mapper_db_list /media/VD_Research/databases/humandna/human_dna,/media/VD_Research/databases/humanrna/h_sapiens_rna
mapper_name_list bowtie2_genome_local,bowtie2_transcript_local # names that will appear on a graph
mapper_options_list --local,--local # flags for aligner
# explanation: perform a chain of alignments.
# -------------------
command host_map_2
# this is the same as host_map --- it just allows you to do it an n-th time
mapper_program_list bowtie2 # choices are: bwa, bowtie2
mapper_db_list /media/VD_Research/databases/humandna/human_dna # prefix of aligner indexed database
mapper_name_list bowtie2_genome # names that will appear on a graph
mapper_options_list # flags for aligner
# -------------------
# command host_map_2
# this is the same as host_map --- it just allows you to do it an n-th time
# mapper_program_list bwa # choices are: bwa, bowtie2
# mapper_db_list /media/VD_Research/databases/humandna/human_dna # prefix of aligner indexed database
# mapper_name_list bwa_genome # names that will appear on a graph
# mapper_options_list # flags for aligner
# -------------------
command iterative_blast_phylo
# for the "_list" settings, use a comma-delimited list with no spaces
blast_db_list /media/VD_Research/databases/ncbi/blast/nt/nt,/media/VD_Research/databases/ncbi/blast/nt/nt # blast db prefix
blast_task_list megablast,dc-megablast # options are: megablast dc-megablast blastn, blastx
# blast_options_list -evalue 1e-4 -word_size 28,-evalue 1e-4 -word_size 12,-evalue 1e-4 -word_size 7 # blast options (except for: -task -query -db -out -outfmt -num_descriptions; these are
blast_options_list -evalue 1e-4 -word_size 28,-evalue 1e-4 -word_size 12,-evalue 1e-4 # blast options (except for: -task -query -db -out -outfmt -num_descriptions; these are hardwired)
ninst_list 10,10 # the input file will be broken into chunks and blasted in parallel - this parameter is the number of instances of BLAST you want to run in parallel
taxonomy_names /media/VD_Research/databases/ncbi/taxonomy/names.dmp # NCBI taxonomy names dump file
taxonomy_nodes /media/VD_Research/databases/ncbi/taxonomy/nodes.dmp # NCBI taxonomy nodes dump file
# explanation: perform a chain of blasts. the blast is performed in chunks to speed up the process. note: if you use blastx, make sure the chunks are really small. otherwise, it takes a long time.
# -------------------
command iterative_blast_phylo_2
# this is the same as iterative_blast_phylo --- it just allows you to do it an n-th time
# for the "_list" settings, use a comma-delimited list with no spaces
blast_db_list /media/VD_Research/databases/ncbi/blast/nt/nt,/media/VD_Research/databases/ncbi/blast/nt/nt # blast db prefix
blast_task_list megablast,dc-megablast # options are: megablast dc-megablast blastn, blastx
# blast_options_list -evalue 1e-4 -word_size 28,-evalue 1e-4 -word_size 12,-evalue 1e-4 -word_size 7 # blast options (except for: -task -query -db -out -outfmt -num_descriptions; these are
blast_options_list -evalue 1e-4 -word_size 28,-evalue 1e-4 -word_size 12,-evalue 1e-4 # blast options (except for: -task -query -db -out -outfmt -num_descriptions; these are hardwired)
ninst_list 10,10 # the input file will be broken into chunks and blasted in parallel - this parameter is the number of instances of BLAST you want to run in parallel
taxonomy_names /media/VD_Research/databases/ncbi/taxonomy/names.dmp # NCBI taxonomy names dump file
taxonomy_nodes /media/VD_Research/databases/ncbi/taxonomy/nodes.dmp # NCBI taxonomy nodes dump file
# explanation: run another instance of the iterative_blast_phylo module. you can run more instances by adding entries in this parameter file for "iterative_blast_3", "iterative_blast_4", etc
# -------------------
command nohost_blast
ncbi_nt_db /media/VD_Research/databases/ncbi/blast/nt/nt # full path to NCBI nt database prefix
gi2taxid /media/VD_Research/databases/ncbi/blast/nt/nt/gi2taxid.txt # file with col1=gi number, col2=taxid (to make this file: blastdbcmd -db /data/db/nt -entry all -outfmt '%g %T' > gi2taxid.txt) eg. blastdbcmd -db /media/VD_Research/People/Dereje.Jima/databases/ncbi/blast/nt/nt -entry all -outfmt '%g %T' > gi2taxid.txt
num_subset_seq 200 # the number of sequences in the small initial file to be blasted
blast_type blastn # options are: blastn blastx
blast_task megablast # options are: megablast dc-megablast blastn
blast_options -evalue 1e-4 -word_size 28 # blast options (except for: -task -query -db -out -outfmt -num_descriptions; these are hardwired)
ninst 10 # the input file will be broken into chunks and blasted in parallel - this parameter is the number of instances of BLAST you want to run in parallel
# explanation: nohost_blast will blast a subset (determined by "num_subset_seq") of your initial file to nt. it will get the taxid from the hits to nt and use these tax id to make a new database, which it will then blast the full initial file against
# -------------------
command ray2_assembly
kmer 25 # assembler k-mer
ninst 10 # number of instances for mpiexec
cap 1 # use cap after ray
# cap_options # cap options
map2contigs yes # if "yes" or "1", map reads back onto assembly
bowtie2_options --local # only nec if map2contigs. Options for the mapper
# explanation: perform an assembly using Ray
# -------------------
command ray2_assembly_2
# this is the same as ray2_assembly --- it just allows you to do it an n-th time
kmer 25 # assembler k-mer
ninst 10 # number of instances for mpiexec
cap 1 # use cap after ray
# cap_options # cap options
map2contigs yes # if "yes" or "1", map reads back onto assembly
bowtie2_options --local # only nec if map2contigs. Options for the mapper
# -------------------
command orf_filter
# filter fasta input by orf
getorf_options -minsize 60 -find 0 # any options other than "-sequence" and "-outseq"
Still not changed. Did you edit "config.yaml" and then run "python setup.py install"?
cancelling run again...
nope - didn't know I had to do that.
I edited config.yaml
# database location
# Upadate all database locations
# Point databases to the location of your databases
databases: ~/databases
# These will all have the value of databases joined to them
# when the pipeline runs
human_dna: humandna/human_dna
human_rna: humanrna/h_sapiens_rna
nt_db: ncbi/blast/nt/nt
tax_nodes: ncbi/taxonomy/nodes.dmp
tax_names: ncbi/taxonomy/names.dmp
PHRED_OFFSET: 33
SEQUENCE_PLATFORM: illumina #choices are: illumina,454
NODE_NUM: 10 # Number of computer nodes or CPUS
#paired end lane to run
blast_unassembled: 1000
didn't know I had to rerun python setup.py install... let me do that
ran
python setup.py install
seemed to work fine.
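The config.yaml comments above say the relative entries get "the value of databases joined to them". As an illustration of what that join should produce (this is not the pipeline's own code), here is a small sketch. The function name `resolve_db` is hypothetical:

```shell
# Hypothetical illustration: expand a leading "~" in the base directory and
# join a relative database entry onto it.
resolve_db() {
    base="$1"
    rel="$2"
    case "$base" in
        "~")   base="$HOME" ;;
        "~/"*) base="$HOME/${base#??}" ;;   # drop the leading "~/"
    esac
    # Strip one trailing slash from the base so the join has a single "/"
    printf '%s/%s\n' "${base%/}" "$rel"
}

resolve_db '~/databases' 'humandna/human_dna'
resolve_db '~/databases' 'ncbi/taxonomy/nodes.dmp'
```

The printed paths are what one would expect to see in the regenerated param.txt once the new config.yaml has been installed.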
rerunning USAMRIID-2_40 again
Start it now and check the "param.txt" file, it should have "$HOME/databases...."
# -------------------
command host_map
# for the "_list" settings, use a comma-delimited list with no spaces
mapper_program_list bowtie2,bowtie2 # choices are: bwa, bowtie2
mapper_db_list /home/AMED/melanie.melendrez/databases/humandna/human_dna,/home/AMED/melanie.melendrez/databases/humanrna/h_sapiens_rna
mapper_name_list bowtie2_genome_local,bowtie2_transcript_local # names that will appear on a graph
mapper_options_list --local,--local # flags for aligner
# explanation: perform a chain of alignments.
# ------------------
that looks right...right? It doesn't say $HOME but $HOME/databases is the same as /home/AMED/melanie.melendrez/databases...
yes
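The same check can be done without reading param.txt by eye. Here is a minimal sketch that pulls the active `mapper_db_list` prefixes out of a param.txt and reports whether any file on disk starts with each prefix (for a bowtie2 index that would be files such as `human_dna.1.bt2`). The function name `check_mapper_dbs` is hypothetical and is not part of the pipeline:

```shell
# Hypothetical helper: report whether each mapper_db_list prefix in a
# param.txt matches at least one file on disk.
check_mapper_dbs() {
    param_file="$1"
    # Keep only active (uncommented) mapper_db_list lines, drop trailing
    # comments, take the value field, and split the comma-delimited list.
    grep '^mapper_db_list' "$param_file" \
        | sed 's/#.*//' \
        | awk '{print $2}' \
        | tr ',' '\n' \
        | sort -u \
        | while read -r prefix; do
            [ -n "$prefix" ] || continue
            if ls "$prefix"* >/dev/null 2>&1; then
                printf '%-8s %s\n' "OK" "$prefix"
            else
                printf '%-8s %s\n' "MISSING" "$prefix"
            fi
        done
}

# Usage (path taken from the run above):
#   check_mapper_dbs USAMRIID-2_40/input/param.txt
```

A "MISSING" line would point at a database path problem, while all "OK" lines suggest looking elsewhere for the cause of a short run.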
k I'll let you know when it finishes.
run finished - make_summary and make_pie ran too. closing issue.
Just as a side note: out of curiosity, last night I also reran the pipeline after replacing the softlink in config.yaml with the direct path to the databases onsite here, /media/VD_Research/databases. I then reran python setup.py install and reran the pipeline, and it finished in 3:11:20.
So for future reference - both methods work, whether you use the direct path to the databases or you use the softlink. So the error I encountered didn't have to do with the databases.
I did notice my terminal was acting funny throughout the day, and occasionally I had some conflicting processes going on that I had to kill. They might have interfered with that original run that errored out.
Dereje mentioned he would make the getorf Warning more informative in the future so we can pinpoint a little easier where the problem occurred when we get warnings or errors from the pipeline.
I got the above Warning and the pipeline only ran in 23 minutes which is unusual given the other datasets I've run. Suggestions?