PacificBiosciences / FALCON

FALCON: experimental PacBio diploid assembler -- Out-of-date -- Please use a binary release: https://github.com/PacificBiosciences/FALCON_unzip/wiki/Binaries

falcon liver cancer, #26

Closed AnWD closed 9 years ago

AnWD commented 9 years ago

Hi jchin, when I run FALCON on a liver cancer dataset, sge_log/falcon-793b5798.o12743 shows: "UnboundLocalError: local variable 'overlap_data' referenced before assignment". Is there anything wrong with my fc_run.cfg? Or should I install HBAR-DTK for large genome assembly? Thank you.

fc_run.cfg:
[General]
# list of files of the initial bas.h5 files
input_fofn = input.fofn
#input_fofn = preads.fofn

input_type = raw
#input_type = preads

# The length cutoff used for seed reads used for initial mapping
length_cutoff = 15000

# The length cutoff used for seed reads used for pre-assembly
length_cutoff_pr = 15000

sge_option_da = -pe openmp 8 -q all.q
sge_option_la = -pe openmp 2 -q all.q
# 6 seems too small... 8 might be better for Dmel
sge_option_pda = -pe openmp 8 -q all.q
sge_option_pla = -pe openmp 2 -q all.q
sge_option_fc = -pe openmp 23 -q all.q
sge_option_cns = -pe openmp 8 -q all.q

pa_concurrent_jobs = 32
ovlp_concurrent_jobs = 32

pa_HPCdaligner_option =  -v -dal128 -t16 -e.70 -l1000 -s1000
ovlp_HPCdaligner_option = -v -dal128 -t32 -h60 -e.96 -l500 -s1000

pa_DBsplit_option = -x500 -s400
ovlp_DBsplit_option = -x500 -s400

falcon_sense_option = --output_multi --min_idt 0.70 --min_cov 4 --local_match_count_threshold 2 --max_n_read 200 --n_core 6

overlap_filtering_setting = --max_diff 100 --max_cov 100 --min_cov 1 --bestn 10

pb-jchin commented 9 years ago

Please review some basic theory: http://en.wikipedia.org/wiki/DNA_sequencing_theory If you only have 3x human data, I don't think any code can magically find good enough overlaps to create an assembly. By the way, the theory only provides a lower bound on the requirement. The real world is more complicated than the simple theory.

Depending on your biological question, you may or may not need an assembly. There is still a lot of useful information in low-coverage data, but that is not what this code is designed for. You will need to be creative, do some research, and develop some other bioinformatics processes to get useful information out.
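
As a rough illustration of the theory linked above, the sketch below applies the idealized Lander-Waterman model to 3x coverage. The genome size, read length, and function name are assumptions chosen for illustration, not values from this thread; the 1000 bp minimum overlap mirrors the -l1000 setting in the posted pa_HPCdaligner_option. The model assumes uniform sampling and error-free overlap detection, so real data will do worse than these numbers suggest.

```python
import math


def lander_waterman(genome_size, read_length, n_reads, min_overlap):
    """Idealized Lander-Waterman estimates (hypothetical helper, not part of FALCON).

    Returns (coverage, expected fraction of genome covered, expected contig count).
    """
    coverage = n_reads * read_length / genome_size
    # sigma is the fraction of a read that is NOT needed to detect an overlap
    sigma = 1.0 - min_overlap / read_length
    fraction_covered = 1.0 - math.exp(-coverage)
    # Expected number of islands (contigs, including single-read islands)
    expected_contigs = n_reads * math.exp(-coverage * sigma)
    return coverage, fraction_covered, expected_contigs


# Assumed values for illustration only: ~3 Gb genome, 10 kb reads, 3x coverage
GENOME_SIZE = 3_000_000_000
READ_LENGTH = 10_000
N_READS = 900_000
MIN_OVERLAP = 1_000

coverage, fraction_covered, expected_contigs = lander_waterman(
    GENOME_SIZE, READ_LENGTH, N_READS, MIN_OVERLAP
)
print(f"coverage:          {coverage:.1f}x")
print(f"fraction covered:  {fraction_covered:.3f}")
print(f"expected contigs:  {expected_contigs:,.0f}")
```

Even under these optimistic assumptions the model predicts tens of thousands of islands at 3x, and the posted config additionally restricts seed reads to 15 kb or longer and asks falcon_sense for --min_cov 4, so the depth available for error correction is likely lower still.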

This is not a software problem, although it would be better if the code handled some of these exceptions more gracefully.
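
For readers unfamiliar with this error, the following is a minimal sketch of how an UnboundLocalError of this kind typically arises in Python and how it could be guarded against. It is not FALCON's actual code: the function names and the input format are hypothetical, and only the variable name overlap_data is taken from the reported message. The assumption is that a variable is assigned only inside a loop over overlap records, so it is never bound when the input contains no records.

```python
def last_overlap_fragile(lines):
    """Hypothetical example: overlap_data is only bound inside the loop."""
    for line in lines:
        overlap_data = line.split()
    # With empty input the loop body never runs, so this raises UnboundLocalError
    return overlap_data


def last_overlap_guarded(lines):
    """Same logic, but with an explicit check and a clearer error message."""
    overlap_data = None
    for line in lines:
        overlap_data = line.split()
    if overlap_data is None:
        raise ValueError("no overlap records found; check input coverage and cutoffs")
    return overlap_data


try:
    last_overlap_fragile([])
except UnboundLocalError as err:
    print("fragile version failed:", err)

try:
    last_overlap_guarded([])
except ValueError as err:
    print("guarded version failed:", err)
```

The guarded version fails as well, but with a message that points at the data rather than at the code.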

pb-jchin commented 9 years ago

You might want to check out an example cancer sequencing project from the last talk in this video https://www.youtube.com/watch?v=hYiyPsHuADQ