BioInf-Wuerzburg / proovread

PacBio hybrid error correction through iterative short read consensus
MIT License

the result problem of proovread (the size of generated file is different) #152

Closed Hulanyue closed 4 years ago

Hulanyue commented 4 years ago

Hi @thackl, I ran into some problems and hope to get some advice!

1. I ran proovread -l pb-1.fq -s s_6_1.fastq [-u unitigs.fa] --pre pb-1 two times, and six files were generated; however, the file sizes differ between the two runs:

The first time: LR: 221.3MB, SR: 3.3G, pb-1.trimmed.fq: 2.2MB, pb-1.trimmed.fa: 1.1MB

The second time: LR: 221.3MB, SR: 3.3G, pb-2.trimmed.fq: 527.8kB, pb-1.trimmed.fa: 267.8kB

I don't know why the sizes differ, or why the generated files are so small. The masking results from the second run are as follows:

[Sun Nov 17 20:19:01 2019] Running mode: sr
[Sun Nov 17 20:27:19 2019] Running task bwa-sr-1
[Sun Nov 17 21:17:03 2019] Masked : 69.1%
[Sun Nov 17 21:17:03 2019] Running task bwa-sr-2
[Sun Nov 17 21:51:21 2019] Masked : 82.4%
[Sun Nov 17 21:51:21 2019] Running task bwa-sr-3
[Sun Nov 17 22:32:14 2019] Masked : 85.5%
[Sun Nov 17 22:32:14 2019] Running task bwa-sr-4
[Sun Nov 17 23:12:03 2019] Masked : 94.9%
[Sun Nov 17 23:12:04 2019] Running task bwa-sr-finish
[Sun Nov 17 23:54:45 2019] Masked : 86.3%
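File size alone is a coarse way to compare the two runs; counting reads and bases in each trimmed FASTQ shows how much output was actually lost. A minimal Python sketch, assuming 4-line FASTQ records (no wrapped sequence lines). `fastq_stats` is a hypothetical helper, not part of proovread:

```python
import sys


def fastq_stats(path):
    """Return (number_of_reads, total_bases) for a 4-line-per-record FASTQ file."""
    reads = 0
    bases = 0
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the second line of each record holds the sequence
                reads += 1
                bases += len(line.strip())
    return reads, bases


if __name__ == "__main__":
    # Usage (paths are examples): python fastq_stats.py pb-1/pb-1.trimmed.fq pb-2/pb-2.trimmed.fq
    for path in sys.argv[1:]:
        reads, bases = fastq_stats(path)
        print(f"{path}\treads={reads}\tbases={bases}")
```

Running it on the trimmed output of both runs gives directly comparable read and base counts.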

2. According to

proovread: large-scale high accuracy PacBio correction through iterative short read consensus. Hackl, T.; Hedrich, R.; Schultz, J.; Foerster, F. (2014).

I downloaded the E. coli K-12 MG1655 Resequencing data from PacBio DevNet and used ~/data/filtered_subreads.fastq as LR.

I downloaded ERR008613 (SRA ERX002508) from NCBI. There are two .fastq files (s_6_1.fastq, s_6_2.fastq), and I used s_6_1.fastq as SR.

I downloaded NC_000913.3 (Assembly GCA_000005845.1) as the reference.

I am not sure whether this data is the same as yours. If it is different, could you share your experimental data?

Thank you very much !!!

Hulanyue commented 4 years ago

Hi, I think I now know why the files are so small. Please see the generated .log file; it contains an error:

[Sun Nov 17 23:54:45 2019] Quality trimming and siamaera filtering raw output
#------------------------------------------------------------------------------#
perl -I/home/workstation/biosoft/proovread/bin/../lib/ /home/workstation/biosof\
t/proovread/bin/SeqFilter --in pb-2/pb-2.untrimmed.fq --min-length 500 --phred-\
offset 33 --substr pb-2/pb-2.chim.tsv --trim-win 12,5 --out - | perl -I/home/wo\
rkstation/biosoft/proovread/bin/../lib/ /home/workstation/biosoft/proovread/bin\
/siamaera > pb-2/pb-2.trimmed.fq
[23:54:45] /home/workstation/biosoft/proovread/bin/SeqFilter-1.06
[23:54:45] --in: pb-2/pb-2.untrimmed.fq
[23:54:45] Detected FASTQ format, phred-offset 33
[23:54:45] --substr-file: pb-2/pb-2.chim.tsv
    0  [                                                          ]BLAST engine error: Warning: Sequence contains no data 
[19-11-17 23:54:47] [siamaera] Blast exited with error: 768 

I then ran the same command manually:

workstation@workstation:~/biosoft/proovread$ perl -I/home/workstation/biosoft/proovread/bin/../lib/ /home/workstation/biosoft/proovread/bin/SeqFilter --in pb-2/pb-2.untrimmed.fq --min-length 500 --phred-offset 33 --substr pb-2/pb-2.chim.tsv --trim-win 12,5 --out - | perl -I/home/workstation/biosoft/proovread/bin/../lib/ /home/workstation/biosoft/proovread/bin/siamaera > pb-2/pb-2.trimmed.fq

It showed this error:

[20:22:42] /home/workstation/biosoft/proovread/bin/SeqFilter-1.06
[20:22:42] --in: pb-2/pb-2.untrimmed.fq
[20:22:42] Detected FASTQ format, phred-offset 33
[20:22:42] --substr-file: pb-2/pb-2.chim.tsv
 3.08M [=>                                                        ] ETA 00:09:20Warning: [blastn] Subject_1 m120131_103014_sidney_c100278822550000001523007907041295_s1_p0/1536/0_4016: Subject sequence contians no data
 24.5M [========>                                                 ] ETA 00:07:27BLAST engine error: Warning: Sequence contains no data 
[19-11-18 20:24:01] [siamaera] Blast exited with error: 768 

When I ran it again, it failed at a different point:

[20:25:46] /home/workstation/biosoft/proovread/bin/SeqFilter-1.06
[20:25:46] --in: pb-2/pb-2.untrimmed.fq
[20:25:46] Detected FASTQ format, phred-offset 33
[20:25:46] --substr-file: pb-2/pb-2.chim.tsv
 18.4M [======>                                                   ] ETA 00:07:47BLAST engine error: Warning: Sequence contains no data 
[19-11-18 20:26:46] [siamaera] Blast exited with error: 768 
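Since BLAST complains that a sequence contains no data, one thing that can be ruled out is a record with an empty sequence line in the FASTQ that is fed to siamaera. This would not by itself explain why the failure point moves between runs. A minimal Python sketch, assuming 4-line FASTQ records and that the SeqFilter output has been saved to a file first; `empty_records` is a hypothetical helper, not part of proovread:

```python
import sys


def empty_records(path):
    """Yield the IDs of FASTQ records whose sequence line is empty."""
    with open(path) as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # skip stray blank lines between records
            seq = handle.readline()
            handle.readline()  # '+' separator line
            handle.readline()  # quality line
            if not seq.strip():
                yield header[1:].strip()  # drop the leading '@'


if __name__ == "__main__":
    # Usage (file name is an example): python empty_records.py filtered.fq
    for path in sys.argv[1:]:
        for read_id in empty_records(path):
            print(read_id)
```

If this prints nothing, the input records themselves are not empty and the problem lies further down the pipeline.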

Can you give me some advice? Thank you!

thackl commented 4 years ago

Hi, sorry about the very late reply. And I don't even have good news. As you suspected, something went wrong with the BLAST run. A few other users have reported a similar issue. However, I was never able to reproduce it or figure out what causes it. As your case illustrates quite nicely, the issue appears only sometimes, even with identical data. And I don't know how to fix it other than rerunning proovread.
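Because the failure is intermittent, rerunning can be automated. A minimal Python sketch of a retry wrapper, assuming bash is available and that the failing step exits with a non-zero status when BLAST fails (not confirmed here); `run_with_retries` is a hypothetical helper, not part of proovread:

```python
import subprocess


def run_with_retries(command, max_attempts=3):
    """Run a shell command line until it exits with status 0.

    Returns the number of the attempt that succeeded, or raises
    RuntimeError once max_attempts have all failed.
    """
    for attempt in range(1, max_attempts + 1):
        # pipefail makes a pipeline fail if any of its stages fails,
        # not only the last one
        result = subprocess.run(["bash", "-o", "pipefail", "-c", command])
        if result.returncode == 0:
            return attempt
        print(f"attempt {attempt} failed with exit status {result.returncode}")
    raise RuntimeError(f"command still failing after {max_attempts} attempts")
```

Pass it the full command line to repeat, for example the proovread call or the SeqFilter | siamaera pipeline from the log. It is still worth checking the size of the trimmed output after a successful attempt.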