PacificBiosciences / FALCON

FALCON: experimental PacBio diploid assembler -- Out-of-date -- Please use a binary release: https://github.com/PacificBiosciences/FALCON_unzip/wiki/Binaries

too long time running run_filter_stage2 #372

Open tangerzhang opened 8 years ago

tangerzhang commented 8 years ago

Hello, I am working on a PacBio assembly of a plant genome, and I have 52X coverage of corrected reads. When I fed these preads to the FALCON assembly, run_filter_stage2 ran for more than two days and still has not finished. I checked the las.fofn file, which contains 323036 lines. I assume the long running time is caused by the large number of las files. Is that normal? Any suggestions? Thanks a lot!

My configuration file looks like:
[General]
input_fofn = preads.fofn
input_type = preads
length_cutoff = 10000
length_cutoff_pr = 9000 
sge_option_da = -pe orte 8 -q all.q
sge_option_la = -pe orte 8 -q all.q
sge_option_pda = -pe orte 8 -q all.q
sge_option_pla = -pe orte 8 -q all.q
sge_option_fc = -pe orte 8 -q all.q
sge_option_cns = -pe orte 8 -q all.q
pa_concurrent_jobs = 60
cns_concurrent_jobs = 60
ovlp_concurrent_jobs = 60
pa_HPCdaligner_option =  -v -dal4 -t16 -e.70 -l1000 -s1000  
ovlp_HPCdaligner_option = -l4800 -k18 -h480 -w8 -H15000 -M32
pa_DBsplit_option = -x200 -s50
ovlp_DBsplit_option = -x200 -s50
falcon_sense_option = --output_multi --min_idt 0.70 --min_cov 3  --max_n_read 200 --n_core 6 
overlap_filtering_setting = --max_diff 100 --max_cov 80 --min_cov 2 --bestn 10 --n_core 24

pb-jchin commented 8 years ago

Yes. You need the -dal option in the ovlp_HPCdaligner_option parameter. You have way too many small las files for the filter to go through, and the excessive number of shell processes is probably the cause of the slowness. Try "-dal128" (in newer versions, "-B128") to reduce the number of merged files in the final overlapping stage. I typically check how many merge jobs there will be by examining 1-preads_ovl/run_jobs.sh.
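One way to do the check described above is to count the job lines in run_jobs.sh before launching anything. The following is only a rough sketch: it assumes each job is a single line that contains the tool name (daligner or LAmerge) as its own token, and the sample script text is invented for illustration, not taken from a real run.

```python
def count_jobs(script_text):
    """Count lines that invoke daligner or LAmerge in a run_jobs.sh-style script.

    Assumption: one job per line, tool name appears as a separate token,
    and lines starting with '#' are comments.
    """
    counts = {"daligner": 0, "LAmerge": 0}
    for line in script_text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0].startswith("#"):
            continue
        for tool in counts:
            if tool in tokens:
                counts[tool] += 1
    return counts


# Hypothetical excerpt; a real file would be read with
# open("1-preads_ovl/run_jobs.sh").read()
sample = """# Daligner jobs (2)
daligner -v -t16 preads.1 preads.1
daligner -v -t16 preads.2 preads.1 preads.2
# Level 1 jobs (2)
LAmerge -v preads.1 L1.1.1 L1.1.2
LAmerge -v preads.2 L1.2.1 L1.2.2
"""

print(count_jobs(sample))
```

If the LAmerge count runs into the hundreds of thousands, that is a hint that the block comparisons are being split too finely.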

pb-jchin commented 8 years ago

Another note: if you already have many small las files, you could merge them manually and have fc_ovlp_filter.py take the merged las files as input. However, you have to make sure there are no redundant entries in the merged files.

tangerzhang commented 8 years ago

Thanks, Jason. I have resubmitted the job with -dal128. I will see the results. That would take too long.


tangerzhang commented 8 years ago

Hi Jason, I tried -B128 but still have the same problem. I think it might be a bug that appeared after I updated to the latest FALCON release. My previous run (the successful case), in which I used FALCON v0.4, generated a las.fofn file containing only preads.*.las. The content of that las.fofn is shown below:

/home/zhangxt/project/AP85/1-preads_ovl/las_files/preads.62.las
/home/zhangxt/project/AP85/1-preads_ovl/las_files/preads.73.las
/home/zhangxt/project/AP85/1-preads_ovl/las_files/preads.104.las
/home/zhangxt/project/AP85/1-preads_ovl/las_files/preads.63.las
/home/zhangxt/project/AP85/1-preads_ovl/las_files/preads.132.las

However, the failed run (latest FALCON release) generated a las.fofn file that contains all las files, including L1.*.las, L2.*.las and preads.*.las. Part of the file is shown below:

/home/zhangxt/project/LgSXasm/try_corOutCoverage80/falcon_t1/1-preads_ovl/m_00001/L1.1.114.las
/home/zhangxt/project/LgSXasm/try_corOutCoverage80/falcon_t1/1-preads_ovl/m_00001/L1.1.207.las
...

Is this a bug, or did I do something wrong? For now I can only use preads.*.las, but I would like to know what causes this problem so that I can avoid it in the future. Thanks!
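For anyone who needs the same workaround, the filtering of las.fofn down to the final preads.*.las files could look roughly like this. It is a sketch under assumptions: the final merged files are taken to be named preads.<number>.las, as in the successful run above, and the function name is made up for illustration.

```python
import posixpath
import re

# Assumption: final merged files are named preads.<number>.las,
# while intermediate files are named L1.*.las, L2.*.las, and so on.
FINAL_LAS = re.compile(r"^preads\.\d+\.las$")


def keep_final_las(paths):
    """Return only the paths whose file name looks like preads.<number>.las."""
    kept = []
    for path in paths:
        path = path.strip()
        if path and FINAL_LAS.match(posixpath.basename(path)):
            kept.append(path)
    return kept


# Entries copied from the two las.fofn listings in this thread.
entries = [
    "/home/zhangxt/project/AP85/1-preads_ovl/las_files/preads.62.las",
    "/home/zhangxt/project/LgSXasm/try_corOutCoverage80/falcon_t1/1-preads_ovl/m_00001/L1.1.114.las",
    "/home/zhangxt/project/LgSXasm/try_corOutCoverage80/falcon_t1/1-preads_ovl/m_00001/L1.1.207.las",
]

for path in keep_final_las(entries):
    print(path)
```

The filtered list can then be written to a new fofn file, one path per line, and used in place of the original.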

pb-jchin commented 8 years ago

Yes, it is a bug. I have already submitted a PR; see https://github.com/PacificBiosciences/FALCON/pull/367

pb-cdunn commented 8 years ago

Could you tell us what commit you are using (git rev-parse HEAD)? Did you simply download the latest release? I am about to issue a new release with the fix.

The good news is that you will not need to re-run everything. After updating FALCON (the tip of master is fine), simply:

rm -rf 2-*/
rm -rf 1-*/

And restart. Stage-0 should be fine.
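Since the two rm -rf commands are destructive, a cautious variant is to list what would be deleted first. This is just a sketch of the same cleanup in Python, not part of FALCON: the function name and the dry_run flag are invented here, and it assumes the stage directories sit directly under the run directory with names starting with 1- and 2-.

```python
import glob
import os
import shutil


def clean_stages(run_dir, dry_run=True):
    """Find (and optionally delete) the 1-*/ and 2-*/ stage directories.

    With dry_run=True nothing is deleted; the matching directories are
    only returned so they can be inspected first.
    """
    targets = []
    for pattern in ("1-*", "2-*"):
        for path in glob.glob(os.path.join(run_dir, pattern)):
            if os.path.isdir(path):
                targets.append(path)
    targets.sort()
    if not dry_run:
        for path in targets:
            shutil.rmtree(path)
    return targets
```

Calling clean_stages(".") from the run directory returns the directories that would be removed; calling it again with dry_run=False deletes them and leaves the stage-0 directory alone.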