bpfeffer closed this issue 9 years ago.
Did you set up your config file to run diamond?
Since diamond is essentially a drop-in replacement for blastx, not a ton changes, but there are 3 places you can check to verify that it actually ran:
input/param.txt
Make sure you see diamond in the blast_task_list. Example:
blast_task_list megablast,diamond
You can look in iterative_blast_phylo/logs/*.o and search through the files for diamond blastx, where you should see a few lines showing it was run.
iterative_blast_phylo_1/tmp_contig_X and/or iterative_blast_phylo_2/tmp_[R1|R2]_X
In the above example you would expect the diamond temporary results to be in
tmp_[R1|R2]_2 and/or tmp_contig_2
Note: It is in _2 because diamond is the 2nd of 2 steps. If you have other steps it may not be 2
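The three checks above can be sketched as a small script. This is just an illustration run against a mock directory layout; the real files live under your pathdiscov --outdir, and the log contents here are fabricated placeholders, not actual pipeline output.

```shell
#!/bin/sh
# Sketch: verify diamond was configured and invoked, against a mock layout.
set -e
tmp=$(mktemp -d)
mkdir -p "$tmp/input" "$tmp/iterative_blast_phylo_1/logs"
printf 'blast_task_list megablast,diamond\n' > "$tmp/input/param.txt"
printf 'cmd: diamond blastx -q tmpsplit000 -o blastout000\n' \
  > "$tmp/iterative_blast_phylo_1/logs/step.o"
# 1) diamond listed in blast_task_list?
grep -q 'diamond' "$tmp/input/param.txt" && echo 'param OK'
# 2) diamond blastx invocation visible in the step logs?
grep -q 'diamond blastx' "$tmp"/iterative_blast_phylo_1/logs/*.o && echo 'logs OK'
rm -rf "$tmp"
```

Run the same two greps against your real --outdir paths to confirm step 1 and step 2; step 3 is just an ls of the tmp_* directories.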
So I tried running:
command iterative_blast_phylo
# for the "_list" settings, use a comma-delimited list with no spaces
blast_db_list /data/db/nt,/data/db/nt #,/data/Scripts/ref/diamond/diamondnr blast db prefix
blast_task_list megablast,dc-megablast,diamond # options are: megablast dc-megablast blastn, diamond
And used --noparam after setting that; eventually the pipeline changed the file to remove the diamond entry that I added. Do I also need to adjust the db path so it's not commented out?
Ya, you will have to remove the # from the blast_db_list line and place a # before the comment at the end, or just remove that comment.
End result should look like
command iterative_blast_phylo
# for the "_list" settings, use a comma-delimited list with no spaces
blast_db_list /data/db/nt,/data/db/nt,/data/Scripts/ref/diamond/diamondnr #blast db prefix
blast_task_list megablast,dc-megablast,diamond # options are: megablast dc-megablast blastn, diamond
Ok, made those adjustments and it finished, but the iterative_blast_phylo_1 folder is empty (except for an empty logs folder) and the symbolic linking step failed because of that. The overall output folder has only: sample.contig.count.txt, sample.R1.count.txt, sample.R2.count.txt, qstats.txt, reads.txt, stats.txt
Seems something definitely is wrong.
After you modified the param.txt, did you remove all the results directories that may have been created from the previous run (if there were any)?
I'm trying to think of why the folder would be completely empty. Usually if there is an issue at least the log files are generated, which makes me think that the command didn't run at all.
Can you also look in the analysis.log and see if there is anything to indicate an error in there?
Right, I had an empty folder except for the param.txt that I edited, and I used --noparam.
Here is the part that seems relevant in analysis.log:
[module] iterative_blast_phylo
[iteration] 1
[cmd] mkdir -p /data/bpfeffer/Contam/6_1_15_wrairtest/624_S1_diamond/results/iterative_blast_phylo_1/logs
[cmd] /data/Scripts/pathdiscov/pathdiscov/iterative_blast_phylo/iterative_blast_phylo.pl --sample 624_S1_diamond --paramfile 6_1_15_wrairtest/624_S1_diamond/input/param.txt --outputdir /data/bpfeffer/Contam/6_1_15_wrairtest/624_S1_diamond/results/iterative_blast_phylo_1 --logs /data/bpfeffer/Contam/6_1_15_wrairtest/624_S1_diamond/results/iterative_blast_phylo_1/logs --timestamp 20150602-11.56 --R1 /data/bpfeffer/Contam/6_1_15_wrairtest/624_S1_diamond/results/ray2_assembly_1/ray2_assembly_1.fasta --fastafile yes --run_iteration 1 --contig 1
[error] blast param lists different sizes at /data/Scripts/pathdiscov/pathdiscov/iterative_blast_phylo/iterative_blast_phylo.pl line 124.
[deltat] 0
[iterative_blast_phylo_1] Output: out_r1: /data/bpfeffer/Contam/6_1_15_wrairtest/624_S1_diamond/results/ray2_assembly_1/1.R1.unmap.fastq -- out_r2: /data/bpfeffer/Contam/6_1_15_wrairtest/624_S1_diamond/results/ray2_assembly_1/1.R2.unmap.fastq -- out_contig: /data/bpfeffer/Contam/6_1_15_wrairtest/624_S1_diamond/results/ray2_assembly_1/ray2_assembly_1.fasta
[END] 20150602-13.27
[DELTAT] 5453
Using top 4000 from unmapped read files because --blast_unassembled set
[cmd] cat 6_1_15_wrairtest/624_S1_diamond/results/ray2_assembly_1/1.R1.unmap.fastq | head -4000 > 6_1_15_wrairtest/624_S1_diamond/results/ray2_assembly_1/head.1.R1.unmap.fastq
cat: write error: Broken pipe
[deltat] 0
[cmd] cat 6_1_15_wrairtest/624_S1_diamond/results/ray2_assembly_1/1.R2.unmap.fastq | head -4000 > 6_1_15_wrairtest/624_S1_diamond/results/ray2_assembly_1/head.1.R2.unmap.fastq
cat: write error: Broken pipe
[deltat] 0
[error] blast param lists different sizes at /data/Scripts/pathdiscov/pathdiscov/iterative_blast_phylo/iterative_blast_phylo.pl line 124.
All the items in your param.txt under iterative_blast_phylo have to have the same number of comma-separated items. It is probably the blast_options_list and ninst_list that are missing that 3rd option.
Are those options comparable to the ones listed in the first two lists? Both only have 2 entries right now, so I need to figure out what to put for the third, and whether it's application specific or just an internal setting for pathdiscov.
The blast_options_list is a comma-separated list of options that are passed directly to blast/diamond.
If you used pathdiscov_cli --param to generate the param.txt, there should be a default diamond option line after the # comment. If not, you can use
--compress 0 -p 0 -v -k 10 -w 28 --id 0.7 -c 4 -t TEMPDIR
replacing TEMPDIR with the path to your temporary directory for diamond to use. If you have a ton of RAM, you can use /dev/shm, which is essentially your RAM.
For ninst_list, just use the same number 3 times, such as:
ninst_list 5,5,5
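The "same number of items" rule can be sanity-checked before a run. This is a hypothetical standalone check, not the pipeline's actual code (the real check lives in iterative_blast_phylo.pl); it only handles simple comma lists like the ones shown, and the mock param.txt below reuses values from this thread.

```shell
#!/bin/sh
# Sketch: count the comma-separated items in each "_list" setting.
# All counts must match, or the pipeline dies with
# "blast param lists different sizes".
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
blast_db_list /data/db/nt,/data/db/nt,/data/Scripts/ref/diamond/diamondnr
blast_task_list megablast,dc-megablast,diamond
ninst_list 5,5,5
EOF
# print each _list key with its item count
awk '$1 ~ /_list$/ { print $1, split($2, a, ",") }' "$tmp"
rm -f "$tmp"
```

Note this simple awk would miscount blast_options_list, whose values themselves contain spaces and commas; for that line, count the items between the commas that separate whole option strings by eye.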
I made those changes (including using /dev/shm as a tempdir, since we have 1TB on that server) and ran the pipeline; it eventually modified param.txt, reverting my changes to those 4 lines back to just 2 blast searches.
So the pipeline changed your param.txt even though you did --noparam?
Right, here was my command:
nohup pathdiscov_cli --noparam --outdir 624_S1_diamond --R1 06-24_S1_L001_R1_001.fastq.gz --R2 06-24_S1_L001_R2_001.fastq.gz -c 60
I'm going to run the test dataset here and see what happens
$> pathdiscov_cli --outdir 624_S1_diamond --R1 testData/F.fastq.gz --R2 testData/R.fastq.gz -c 4 --param
$> wget https://gist.githubusercontent.com/necrolyte2/64f3ba20645f589ee2aa/raw/1e1f60415ad5fd1dcf9270f6e9553d291976be11/param.txt-diamond -O 624_S1_diamond/input/param.txt
$> pathdiscov_cli --noparam --outdir 624_S1_diamond --R1 testData/F.fastq.gz --R2 testData/R.fastq.gz -c 4 --noparam
It didn't change the param.txt on my end when I ran it as above.
Is it maybe the placement of --noparam since you put it at the start and the end?
In theory it should not matter, as we are using the argparse module.
What you should be able to do is redo the run with a different --outdir, and after you start the pipeline open a new terminal and check whether it changed param.txt; if it is going to change it, that should happen within the first minute of running the pipeline.
Were you able to run the testData set yet?
Both the testdata sample and my local sample did finish and ran diamond properly this time, but I notice the phylo_1 error log has repeated "libgomp: Thread creation failed: Resource temporarily unavailable" messages, so that's new. Quick research seems to point to stack size; does that seem right? Ours is currently set to "stack size (kbytes, -s) 10240".
What I think is happening is that since you are specifying -c 60, too many additional threads are being spawned.
Essentially -c 60 says: you can divide certain aspects into 60 parts, so when it gets to iterative_blast_phylo_1 it will split the input files for each blast step into 60 chunks and then run that blast step on them in parallel.
The issue most likely is that --compress 0 -p 0 -v -k 10 -w 28 --id 0.7 -c 4 -t ~/tmp is the default in our config for diamond. The -p 0 says: automatically determine how many CPUs to use, which is generally whatever cat /proc/cpuinfo | grep 'processor' | wc -l would return.
Also, Linux treats Intel's hyperthreading (and AMD's equivalent) as additional CPUs, so for example on our i7 4-core CPUs that command returns 8 instead of 4. So don't be surprised if it returns 120 when you have 60 CPUs.
So let's say that command returns 60. What you end up with is iterative_blast_phylo_* spawning 60*60 threads, or 3600 threads.
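The thread math above, spelled out. The numbers are the example values from this thread, not measured from any machine; on a real system the cpus value is whatever diamond's -p 0 auto-detects.

```shell
#!/bin/sh
# Sketch: worst-case thread count when -c chunking meets diamond's -p 0.
chunks=60                 # pathdiscov -c: input split into this many chunks
cpus=60                   # what -p 0 auto-detects per chunk (example value)
total=$((chunks * cpus))  # every chunk runs diamond with $cpus threads
echo "$total threads competing at once"
```

With -p 1 instead, the total collapses back to roughly the -c value, which is why pinning -p is the fix discussed below.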
Poking around on my system I got my system limits:
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 126685
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Here I see my max user processes is set to 1024 (not sure if that was imposed by DISA STIG or if that is the default). If that is less than 3600 on your system, it is most likely the cause of those errors.
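The comparison being made is simply "requested threads vs. ulimit -u". A minimal sketch using this thread's numbers (1024 and 3600 are the values quoted above; substitute your own `ulimit -u` output):

```shell
#!/bin/sh
# Sketch: will thread creation hit the per-user process limit?
maxproc=1024   # example `ulimit -u` value (max user processes)
threads=3600   # 60 chunks x 60 CPUs, from the -p 0 math above
if [ "$threads" -gt "$maxproc" ]; then
  echo "thread creation likely to fail: $threads > $maxproc"
fi
```

libgomp's "Thread creation failed: Resource temporarily unavailable" is the EAGAIN you get when pthread_create runs into that limit.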
I suspect that the analysis worked fine and just did not spawn as many threads as it wanted to, but kept on going.
The solution is for us to fix the sample.param file that is shipped and set -p 1 for diamond. I'd be open to a small debate on whether we should recommend people set -p to 1 or some other number, or even just have the pipeline enforce -p 1 even if the user were to supply -p 0.
Can you rerun that analysis but set -p 1 in your param.txt, just so we can verify? It might even run faster, since the system isn't overloaded with so many threads.
Ah, I can see why that would be a problem: the system has 160 threads (80 cores at 2 threads per core), so that would certainly have caused an issue. Rerunning it now with -p set to 1.
I ran 3 samples at the same time; the only differences were that one sample was in another folder and ran with more cores (60 instead of 10 for each of the other two).
The single 60-thread phylo_blast_1 error log was full of these:
Error: the argument ('value') for option '--evalue' is invalid
cat: blastout*: No such file or directory
The two other samples had:
Error: function Input_stream::Input_stream(const string&, bool) line 74. Error opening file /data/pathdiscov_tmp/diamond_1068963805_0.tmp
Error: Error reading buffer file /data/pathdiscov_tmp/diamond_1068963805_0.tmp (1/0)
Error: Error reading buffer file /data/pathdiscov_tmp/diamond_1068963805_0.tmp (2/0)
Error: Error reading buffer file /data/pathdiscov_tmp/diamond_1068963805_0.tmp (1/0)
Error: Error reading buffer file /data/pathdiscov_tmp/diamond_1068963805_0.tmp (1/0)
Error: Error reading buffer file /data/pathdiscov_tmp/diamond_1068963805_1.tmp (16/0)
Error: function Input_stream::Input_stream(const string&, bool) line 74. Error opening file /data/pathdiscov_tmp/diamond_1068963805_0.tmp
Error: function Input_stream::Input_stream(const string&, bool) line 74. Error opening file /data/pathdiscov_tmp/diamond_1068963805_0.tmp
Error: function Input_stream::Input_stream(const string&, bool) line 74. Error opening file /data/pathdiscov_tmp/diamond_1068963805_0.tmp
Error: Entry not found in BLAST database
Error: Entry not found in BLAST database
Error: Entry not found in BLAST database
and
Error: Error reading buffer file /data/pathdiscov_tmp/diamond_1708646615_0.tmp (101/0)
Error: Error reading buffer file /data/pathdiscov_tmp/diamond_1708646615_0.tmp (42/0)
Error: Error reading buffer file /data/pathdiscov_tmp/diamond_1708646615_0.tmp (26/0)
Error: Error reading buffer file /data/pathdiscov_tmp/diamond_1708646615_0.tmp (25/0)
Error: Error reading buffer file /data/pathdiscov_tmp/diamond_1708646615_2.tmp (2/0)
Error: function Input_stream::Input_stream(const string&, bool) line 74. Error opening file /data/pathdiscov_tmp/diamond_1708646615_0.tmp
Error: function Input_stream::Input_stream(const string&, bool) line 74. Error opening file /data/pathdiscov_tmp/diamond_1708646615_0.tmp
Error: function Input_stream::Input_stream(const string&, bool) line 74. Error opening file /data/pathdiscov_tmp/diamond_1708646615_0.tmp
Error: function Input_stream::Input_stream(const string&, bool) line 74. Error opening file /data/pathdiscov_tmp/diamond_1708646615_0.tmp
Error: function Input_stream::Input_stream(const string&, bool) line 74. Error opening file /data/pathdiscov_tmp/diamond_1708646615_0.tmp
Error: the argument ('value') for option '--evalue' is invalid
The evalue option for blastn and diamond is pulled from param.txt. That makes it seem like maybe you had a misconfiguration in your param.txt for that sample, which is why blast/diamond did not run to create the blastout files.
I'm not sure why it was decided to have the pipeline split standard error and standard output into two log files like this. It makes it difficult to figure out where the issues come from. I can only assume the buffer errors are being emitted by diamond.
Can you give any insight about /data/pathdiscov_tmp? That filesystem didn't fill up, did it? Are there other diamond files in there, or is it void of `diamond*.tmp`?
You could try removing the -t /data/pathdiscov_tmp from your param.txt. This will tell diamond to just use RAM for its temp file work. Since you have 1TB of RAM, it should not be an issue.
I found the error in the first sample's param.txt, so that's fine. I reran the second sample using -p 1 and without the -t folder so it would use RAM; it still took a while (8+ hours). I then adjusted the getorf (at Mike's recommendation) from 60 to 300 and got it down to a much faster time (80 minutes).
The question now is: are any of the results from diamond reflected in the output folder? I wasn't able to find anything different there compared to the run without diamond; the only changes were some extra files and a folder in the blast_phylo_1 folder.
You should not see any difference in the output files, as diamond is made to be an almost 100% drop-in replacement. The only way to get a good gut feeling that it actually ran is by inspecting iterative_blast_phylo_*/logs/*.o; in there you should see that it ran diamond.
Example from one of our test runs:
diamond blastx --compress 0 -p 0 -v -k 10 -w 28 --id 0.7 -c 4 -t ~/tmp -q tmpsplit000 -d /home/AMED/tyghe.vallard/databases/diamond/diamondnr -o blastout000
Then, since I know from our param.txt that diamond was the 3rd iteration
blast_task_list megablast,dc-megablast,diamond
I can look in tmp_contig_3 and find blastout000, where diamond dumped the output.
@bpfeffer were you ever able to verify that diamond ran?
I realize now that you should also find a directory called orf_filter inside the tmp_contig_3 directory.
In this directory you should see
I did verify that diamond ran, but it seems as though setting the getorf to 300 caused it to not produce anything useful. The previous run with the default 60 did have some results, and the aux report for 60 included results that came from diamond.
I'm not seeing the output from diamond, but I could be overlooking it. What should I be looking for to ensure it went through that step properly?