Open yilunhuangyue opened 8 years ago
There are no universal parameters for all genomes (yet?). Anyway, depending on the read length distribution and the complexity of the repeats in the genome, different choices will be optimal for the assembly. If you have >20x coverage from reads >10 kb, I would suggest using a length cutoff of ~10 kb. If not, you will need to include more reads (i.e., use a lower cutoff).
Here is a parameter set I use to assemble a 1 Gb plant genome, for your reference. It won't work for you if you just copy and paste it, but I hope it gets you started.
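The coverage check described above can be sketched in a few lines of Python. This is only an illustration, not part of FALCON: the function name `coverage_above_cutoff` and the toy read lengths are made up, and reads exactly at the cutoff are counted as passing, which is an assumption.

```python
def coverage_above_cutoff(read_lengths, genome_size, cutoff):
    """Fold coverage contributed by reads of length >= cutoff.

    Hypothetical helper for illustration; not a FALCON function.
    """
    total_bases = sum(n for n in read_lengths if n >= cutoff)
    return total_bases / genome_size


# Toy numbers, purely illustrative (a real genome is far larger).
read_lengths = [12000] * 30 + [8000] * 50 + [3000] * 100
genome_size = 15000

cov = coverage_above_cutoff(read_lengths, genome_size, cutoff=10000)
if cov > 20:
    print(f"{cov:.1f}x from reads >= 10 kb: a ~10 kb cutoff looks reasonable")
else:
    print(f"only {cov:.1f}x from reads >= 10 kb: consider a lower cutoff")
```

In practice the read lengths would come from your own data (for example, from the FASTA files listed in the fofn), and the 20x threshold is just the rule of thumb given above.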
[General]
# list of files of the initial bas.h5 files
input_fofn = input.fofn
#input_fofn = preads.fofn
input_type = raw
#input_type = preads
# The length cutoff used for seed reads used for initial mapping
length_cutoff = 10000
# The length cutoff used for seed reads used for pre-assembly
length_cutoff_pr = 10000
# You need to change these distributed-computation parameters to fit your compute cluster configuration
sge_option_da = -pe smp 4 -q bigmem
sge_option_la = -pe smp 20 -q bigmem
sge_option_pda = -pe smp 6 -q bigmem
sge_option_pla = -pe smp 16 -q bigmem
sge_option_fc = -pe smp 24 -q bigmem
sge_option_cns = -pe smp 8 -q bigmem
pa_concurrent_jobs = 192
cns_concurrent_jobs = 192
ovlp_concurrent_jobs = 192
# Here is a set of parameters that allows faster computation but is less sensitive for read overlaps
pa_HPCdaligner_option = -v -dal128 -e0.75 -M24 -l2500 -k18 -h1250 -w8 -s100
ovlp_HPCdaligner_option = -v -dal128 -M24 -k24 -h1250 -e.96 -l1500 -s100
pa_DBsplit_option = -a -x500 -s200
ovlp_DBsplit_option = -s200
falcon_sense_option = --output_multi --output_dformat --min_idt 0.70 --min_cov 4 --max_n_read 400 --n_core 8
falcon_sense_skip_contained = False
overlap_filtering_setting = --max_diff 120 --max_cov 120 --min_cov 4 --n_core 12
Thanks a lot for your quick reply! It helps.
Jason, FALCON now supports auto-calculation of length_cutoff, like this:
length_cutoff = -1
seed_coverage = 20
genome_size = 1000000000
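To give a feel for what such an automatic cutoff could mean, here is a rough Python sketch: pick the largest read length such that reads at least that long add up to `seed_coverage` times `genome_size` bases. This is an assumption about the general idea, not FALCON's actual implementation; the function name `pick_length_cutoff` and the toy data are hypothetical.

```python
def pick_length_cutoff(read_lengths, genome_size, seed_coverage):
    """Largest cutoff whose reads (length >= cutoff) reach the target coverage.

    Hypothetical sketch of the idea; FALCON's own calculation may differ.
    """
    target_bases = genome_size * seed_coverage
    total = 0
    # Walk from the longest read down, accumulating bases until the
    # requested seed coverage is reached.
    for length in sorted(read_lengths, reverse=True):
        total += length
        if total >= target_bases:
            return length
    raise ValueError("not enough bases to reach the requested seed coverage")


# Toy numbers, purely illustrative.
read_lengths = [12000] * 30 + [8000] * 50 + [3000] * 100
cutoff_20x = pick_length_cutoff(read_lengths, genome_size=15000, seed_coverage=20)
cutoff_30x = pick_length_cutoff(read_lengths, genome_size=15000, seed_coverage=30)
print(cutoff_20x, cutoff_30x)
```

With this toy data, asking for more seed coverage forces a lower cutoff, which matches the earlier advice to include more reads when long-read coverage is limited.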
@pb-cdunn thanks for pointing it out. I think @yilunhuangyue needs to use the master branch for that.
Hello, I have installed FALCON and tried the example from https://github.com/PacificBiosciences/FALCON/wiki/Setup%3A-Running
The program ended without an error message, but the log file is
The 2-asm-falcon directory contains files such as a_ctg_all.fa and p_ctg.fa, but I am not sure whether these are the assembly result.
Besides, I want to assemble a plant genome with a genome size of about 400 Mb. Which parameters should be changed in the cfg file?
Thanks a lot for any help!