PacificBiosciences / FALCON

FALCON: experimental PacBio diploid assembler -- Out-of-date -- Please use a binary release: https://github.com/PacificBiosciences/FALCON_unzip/wiki/Binaries

question about fc_ovlp_filter #357

Open lijingjing1 opened 8 years ago

lijingjing1 commented 8 years ago

I ran fc_ovlp_filter on our cluster. It has been running for a very long time and the log has not updated for more than 24 hours. I have 214 corrected .las files in 1-preads_ovl/las_files. Checking the ovlp-filter log, I see starting run_filter_stage1 214 times, but finished run_filter_stage1 appears only 148 times, ending at finished run_filter_stage1('1-preads_ovl/las_files/preads.163.las', 50, 70, 1, 6000). I tried re-running run_falcon_asm.sh and hit the same problem.

pb-jchin commented 8 years ago

Can you try running LA4Falcon -mo preads.db preads.1.las | less in 1-preads_ovl to see whether you get any results?

lijingjing1 commented 8 years ago

I just tried LA4Falcon -mo preads.db preads.72.las | less in 1-preads_ovl and it gives me nothing; LAshow gives nothing either. But 1-preads_ovl/m_00072/preads.72.las is not empty, and the m_00072_done sentinel is present.

pb-jchin commented 8 years ago

I think you might have corrupted *.las files. There is a command in the daligner package called LAcheck, though I have not used it myself. You might want to check, and then re-run the las-merge step to ensure the files are not corrupted.
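Perhaps something along these lines, run from 1-preads_ovl (untested, and the flags may differ in your daligner version):

# LAcheck should report any merged .las that fails its integrity checks against the preads DB
LAcheck -v preads.db las_files/preads.*.las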

lijingjing1 commented 8 years ago

I see. You mean my .las files may have been corrupted when LAmerge combined the sub .las files. So what I need to do is use LAcheck to find which preads.*.las is corrupted and re-run those jobs to regenerate the sub .las files.

lijingjing1 commented 8 years ago

Hi jchin, this time I re-ran all the 1-preads_ovl jobs and got the same result: fc_ovlp_filter waits a very long time and the log does not update. This time I checked the preads.*.las files with LA4Falcon -mo; each file shows the alignment records (overlap or contains) and ends cleanly. Do you have any other suggestions?

pb-jchin commented 8 years ago

I don't have enough information. Did you check whether the LA4Falcon processes are actually launched and still running after fc_ovlp_filter starts?
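Something like this on the compute node would tell you (standard pgrep/ps, nothing FALCON-specific):

# are any LA4Falcon workers alive?
pgrep -fl LA4Falcon
# what state is fc_ovlp_filter in? (it may show up under python, depending on how it was installed)
pgrep -fl fc_ovlp_filter
ps -C LA4Falcon -o pid,stat,rss,etime,args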

lijingjing1 commented 8 years ago

Yes. Right after I qsub run_falcon_asm.sh, the LA4Falcon processes are launched when fc_ovlp_filter starts. But after about 6 hours, when I log in to the node, only fc_ovlp_filter is left, in a sleeping state, and no LA4Falcon is running. I also found something interesting: no matter how many times I re-run run_falcon_asm.sh, the log always stops at finished run_filter_stage1('1-preads_ovl/preads.db', '1-preads_ovl/las_files/preads.163.las', 50, 70, 1, 6000); each time it stops right after finishing preads.163.las.

pb-cdunn commented 8 years ago

It could be virtual-memory swapping, but let's try to figure out exactly where it stalls. You don't see preparing filter_stage2? What is maxrss immediately before finished run_filter_stage1? How many times do you see finished run_filter_stage1? Which .las files are finished, or yet to be processed? (In other words, are there any starting without corresponding finished? That could be where it has stalled.)
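For example, something like this against the fc_ovlp_filter log (the log filename below is only a placeholder; point it at wherever your output actually goes):

# count started vs. finished stage-1 calls
grep -c "starting run_filter_stage1" ovlp_filter.log
grep -c "finished run_filter_stage1" ovlp_filter.log
# list the .las files that started but never finished
grep "starting run_filter_stage1" ovlp_filter.log | grep -o "preads\.[0-9]*\.las" | sort > started.txt
grep "finished run_filter_stage1" ovlp_filter.log | grep -o "preads\.[0-9]*\.las" | sort > finished.txt
comm -23 started.txt finished.txt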

What are the sizes of your *.las? (Min, max, approx. average)

What is the size of 1-preads_ovl/.preads.bps?

How many processors on your machine? How much system RAM? What is --n_core for fc_ovlp_filter? You could try re-running with n_core=0, which runs serially and without extra processes so might be easier to debug.
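For the serial run, something like this from 2-asm-falcon/, keeping whatever filter arguments your run_falcon_asm.sh already uses and changing only --n_core (the bracketed placeholder is not a literal flag):

# same command as in run_falcon_asm.sh, but serial
time fc_ovlp_filter --db ../1-preads_ovl/preads.db --fofn las.fofn \
    [your existing max_diff/max_cov/min_cov/bestn/min_len arguments] \
    --n_core 0 >| preads.ovl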

lijingjing1 commented 8 years ago

Yes, I don't see preparing filter_stage2. I have 214 preads.*.las files and see starting run_filter_stage1 214 times, but only 148/157/154 finished run_filter_stage1 across the three times I re-ran it. The last finished run_filter_stage1 file is always preads.163.las, so 214-148/157/154 files never finish.

The preads.*.las files are about 1.1 GB min, 1.4 GB max, roughly 1.2 GB on average, and .preads.bps is about 14 GB.

I have 24 CPUs and 64 GB RAM on my machine. I set --n_core 24 to make full use of the CPUs for fc_ovlp_filter.

I will now try --n_core 0 to run the preads.*.las files one by one and find which preads.*.las file it stops on.

I also tested LA4Falcon -mo preads.db preads.*.las > preads.*.las.txt to look for a bad .las, and every preads.*.las.txt contains daligner records (overlap or contains) at about 1.5 GB each, so every .las file seems to be OK.

pb-cdunn commented 8 years ago

If it finishes -- eventually -- with --n_core=0, but it never finishes with --n_core=24, then try --n_core=12. I don't know what's going on with your system, but it sounds like virtual-memory swapping. There must be a degeneracy in one of the slow .las files, which for some reason uses lots of memory. You could run top on your Linux machine to get some idea if that's the case.
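Two standard ways to watch for that while it runs (nothing FALCON-specific here):

# nonzero si/so columns mean pages are being swapped in and out
vmstat 5
# or run top and sort by memory (press M, or `top -o %MEM` on newer procps top)
top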

lijingjing1 commented 8 years ago

OK, I am going to run top to see which .las files are degenerate or use lots of memory. Thank you very much, cdunn.

lijingjing1 commented 8 years ago

Hi cdunn, I am happy to tell you that fc_ovlp_filter has almost finished with --n_core=0. Thank you for the suggestion.

It is now in run_filter_stage3, which I had never seen before. With 214 .las files there are 3*214=642 stage finishes in total; 580 have finished, so only 642-580=62 remain, and the program is still running.

The problem may be the CPU count or RAM limit with this much data; I will try --n_core=12 later and check.

lijingjing1 commented 8 years ago

Hi cdunn, I have an idea about the .las files in las.fofn when running fc_ovlp_filter.

If I have a very large genome, for example about 20 Gb, then the count and total size of my preads.*.las files will also be very large. Could I split the preads.*.las entries in las.fofn into sub las.fofn files, run fc_ovlp_filter in parallel on each sub las.fofn to generate a partial preads.ovl, then cat the partial preads.ovl files into the final preads.ovl and run the OLC steps to assemble the genome? I tried a quick test, but the final result was somewhat different.

pb-cdunn commented 8 years ago

Ok. Let's be clear about this. We use something like 2-asm-falcon/run_falcon_asm.sub.sh to do all of stage-2 on a single multi-core machine. The first line is something like

# Given, las.fofn,
# write preads.ovl:
time fc_ovlp_filter --db ../1-preads_ovl/preads.db --fofn las.fofn --max_diff 1000 --max_cov 100000 --min_cov 0 --bestn 1000 --n_core 4 --min_len 50 >| preads.ovl

You're suggesting that we perform that on multiple machines when the number of las files is large. The number of las files is the number of blocks in stage-1, where we find overlaps of the corrected reads.

Yes, we could sub-divide and parallelize that line across multiple machines. We could concatenate the resulting preads.ovl. Assuming our greedy filter reduces the data dramatically, the following steps could still be on a single machine, which is lucky because the rest would be very difficult to parallelize.
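As a rough sketch of what that could look like (chunk size and file names are illustrative, and the current scripts do not do this for you):

# split las.fofn into chunks of ~50 .las files each
split -l 50 las.fofn sub_fofn.
# run one fc_ovlp_filter per chunk (each on its own machine, or back to back)
for f in sub_fofn.*; do
    fc_ovlp_filter --db ../1-preads_ovl/preads.db --fofn $f --max_diff 1000 --max_cov 100000 --min_cov 0 --bestn 1000 --n_core 4 --min_len 50 >| preads.$f.ovl
done
# then concatenate the partial results
cat preads.sub_fofn.*.ovl > preads.ovl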

We could do that without much effort. But we need to choose the number of machines based on the current environment. Here is yours:

- sizes of las files
  1.4 GB max
- number of las files (number of blocks in stage-1)
  214
- available memory
  64GB
- number of processors
  24

But you neglected to answer my earlier question, which is critical. Actually, let's look at a few more bits:

We have been concentrating on Human-scale, which is an order of magnitude smaller, and we use beefy hardware. You really need more RAM. Still, we might be able to help with this. The issue here is that when LA4Falcon processes a given las-file, it will access many parts of the DB, since the overlapping reads will in general be widely separated in the input fasta. Hmmm... Actually, for stage-1 each block has some data locality because its corrected reads were written to its own out.*.fasta. So when we operate on multiple blocks, we are pulling proportionally more data into memory. We need to know how many blocks at a time can be run through fc_ovlp_filter. You're saying that 24 is probably too many. (There is probably disk-swapping, which is what we need to avoid.) What about 12? And how many hours did it take with n_core=0? (That's really 1, but without the overhead of the multiprocessing module.) It should scale almost linearly, so if 1 takes 24 hours, then 12 should take about 2 hours, as long as there is no disk-swapping. In that case, you don't urgently need a multi-machine solution.
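A very rough back-of-envelope with your numbers (the per-worker working set here is only a guess, not a measurement):

# 24 workers * ~1.2 GB per .las + ~14 GB of .preads.bps in the page cache ≈ 43 GB
#   before any Python/daligner overhead -> little headroom out of 64 GB, so swapping is plausible
# 12 workers * ~1.2 GB per .las + ~14 GB ≈ 28 GB -> comfortable
# scaling: if --n_core=0 takes T hours for all 214 files, --n_core=12 should take roughly T/12 hours, provided nothing swaps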