I have been trying to run Falcon on a single machine but the config file is somewhat complicated to understand and after 8 days (200h) running, the process is still working on read error correction. For comparison, Canu takes less than 2 days to fully complete this assembly.
What would be the best config settings for assembling the Drosophila genome (~180Mb) in a 48-cores 500GB RAM machine?
[General]
job_type = local
# list of fasta files
input_fofn = input.fofn
# input type, raw or pre-assembled reads (preads, error corrected reads)
input_type = raw
#input_type = preads
# The length cutoff used for seed reads used for error correction.
# "-1" indicates FALCON should calculate the cutoff using
# the user-defined genome length and coverage cut off
# otherwise, user can specify length cut off in bp (e.g. 2000)
length_cutoff = -1
genome_size = 180000000
seed_coverage = 30
# The length cutoff used for overalpping the preassembled reads
length_cutoff_pr = 12000
## resource usage ##
# need this line even if running local
jobqueue = your_queue
# grid settings for...
# daligner step of raw reads
sge_option_da = -pe smp 5 -q %(jobqueue)s
# las-merging of raw reads
sge_option_la = -pe smp 20 -q %(jobqueue)s
# consensus calling for preads
sge_option_cns = -pe smp 12 -q %(jobqueue)s
# daligner on preads
sge_option_pda = -pe smp 6 -q %(jobqueue)s
# las-merging on preads
sge_option_pla = -pe smp 16 -q %(jobqueue)s
# final overlap/assembly
sge_option_fc = -pe smp 24 -q %(jobqueue)s
# job concurrency settings for...
# all jobs
default_concurrent_jobs = 30
# preassembly
pa_concurrent_jobs = 30
# consensus calling of preads
cns_concurrent_jobs = 30
# overlap detection
ovlp_concurrent_jobs = 30
# daligner parameter options for...
# https://dazzlerblog.wordpress.com/command-guides/daligner-command-reference-guide/
# initial overlap of raw reads
pa_HPCdaligner_option = -v -B4 -t16 -e.70 -l1000 -s1000
# overlap of preads
ovlp_HPCdaligner_option = -v -B4 -t32 -h60 -e.96 -l500 -s1000
# parameters for creation of dazzler database of...
# https://dazzlerblog.wordpress.com/command-guides/dazz_db-command-guide/
# raw reads
pa_DBsplit_option = -x500 -s50
# preads
ovlp_DBsplit_option = -x500 -s50
# settings for consensus calling for preads
falcon_sense_option = --output_multi --min_idt 0.70 --min_cov 4 --max_n_read 200 --n_core 6
# setting for filtering of final overlap of preads
overlap_filtering_setting = --max_diff 100 --max_cov 100 --min_cov 20 --bestn 10 --n_core 24
Hey guys,
I have been trying to run Falcon on a single machine but the config file is somewhat complicated to understand and after 8 days (200h) running, the process is still working on read error correction. For comparison, Canu takes less than 2 days to fully complete this assembly.
What would be the best config settings for assembling the Drosophila genome (~180Mb) in a 48-cores 500GB RAM machine?