Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads
GNU General Public License v3.0

compare v2.4 with v2.2-beta.0 #100

Open HanKMU opened 3 years ago

HanKMU commented 3 years ago

Dear authors, I used only about 30X of Nanopore data to assemble a 1G genome with v2.2-beta.0 and got an N50 of 140536. However, after updating to v2.4 and rerunning with the same dataset, I only get an N50 of 89931. Do you have any suggestions? Thank you!
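For readers comparing the two numbers, here is a minimal sketch of how N50 is usually computed from a list of contig lengths. The function name `n50` is only illustrative and is not part of NextDenovo; the lengths would have to be extracted from the assembly FASTA by some other means.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


if __name__ == "__main__":
    # Toy lengths, not taken from the assemblies discussed in this thread.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Because N50 depends on the total assembly size as well as on contig lengths, it helps to compare total length and contig count alongside it when judging two assemblies.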

Here is the run.cfg of v2.2-beta.0.

[General]
job_type = local
job_prefix = EEG_nextDenovo
task = all 
rewrite = yes 
deltmp = yes
rerun = 3
parallel_jobs = 20
input_type = raw
input_fofn = input.fofn
workdir = 01_rundir

[correct_option]
read_cutoff = 1k
seed_cutoff = 1193
blocksize = 2g
pa_correction = 20
seed_cutfiles = 20
sort_options = -m 20g -t 10 -k 30
minimap2_options_raw = -x ava-ont -t 8  --minlen 1000
correction_options = -b

[assemble_option]
random_round = 20
minimap2_options_cns = -x ava-ont -t 8 -k17 -w17
nextgraph_options = -a 1

Here is the run.cfg of v2.4.

[General]
job_type = local # local, slurm, sge, pbs, lsf
job_prefix = nextDenovo
task = all # all, correct, assemble
rewrite = yes # yes/no
deltmp = yes 
parallel_jobs = 20 # number of tasks used to run in parallel
input_type = raw # raw, corrected
read_type = ont # clr, ont, hifi
input_fofn = ./20210112input.fofn
workdir = 01_rundir

[correct_option]
read_cutoff = 1k
genome_size = 1G # estimated genome size
seed_depth = 31
sort_options = -m 20g -t 25 -k 30
minimap2_options_raw = -t 25 --minlen 1000
pa_correction = 3 # number of corrected tasks used to run in parallel, each corrected task requires ~TOTAL_INPUT_BASES/4 bytes of memory usage.
correction_options = -b

[assemble_option]
minimap2_options_cns = -t 20 -k17 -w17
minimap2_options_map = -t 20
nextgraph_options = -a 1

moold commented 3 years ago

Are the seed_cutoff values of different versions the same?

moold commented 3 years ago

The new version has updated some default parameters and algorithms.

HanKMU commented 3 years ago

Are the seed_cutoff values of different versions the same?

No. In v2.2-beta.0, seed_cutoff was calculated by seq_stat and was 1193. In v2.4, according to log.info, seed_cutoff was 3177. Do you have any recommendations for improving the assembly with v2.4? The difference between the two N50 values is quite large. Thank you very much.
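To make the relationship between genome_size, seed_depth and a seed length cutoff more concrete, here is a rough sketch of one plausible way such a cutoff could be derived: take reads from longest to shortest until they add up to genome_size × seed_depth bases, and use the length of the last read taken. This is an assumption for illustration only; it is not confirmed to be what seq_stat or v2.4 actually does, and `parse_size` and `estimate_seed_cutoff` are hypothetical names.

```python
def parse_size(text):
    """Convert a size such as '1G' or '1k' into an integer number of bases."""
    units = {"k": 10**3, "m": 10**6, "g": 10**9}
    text = text.strip().lower()
    if text and text[-1] in units:
        return int(float(text[:-1]) * units[text[-1]])
    return int(text)


def estimate_seed_cutoff(read_lengths, genome_size, seed_depth, read_cutoff=0):
    """Estimate a seed length cutoff (illustrative assumption, not NextDenovo code).

    Reads shorter than read_cutoff are ignored. The remaining reads are
    accumulated from longest to shortest until they cover
    genome_size * seed_depth bases; the length of the last read added
    is returned as the cutoff.
    """
    target = genome_size * seed_depth
    total = 0
    kept = (length for length in read_lengths if length >= read_cutoff)
    for length in sorted(kept, reverse=True):
        total += length
        if total >= target:
            return length
    # Not enough data to reach the target depth: every retained read is a seed.
    return read_cutoff


if __name__ == "__main__":
    # Toy numbers only; real input would be the read lengths from input.fofn.
    toy_reads = [10, 8, 6, 4, 2]
    print(estimate_seed_cutoff(toy_reads, genome_size=10, seed_depth=2))
```

Under this kind of scheme, a different depth target or genome size estimate shifts the cutoff, which is one way two versions could arrive at different seed_cutoff values from the same reads.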

moold commented 3 years ago

That may be the reason. You can try different seed_cutoff values and compare the assembly results. I have no special suggestions: if there were a better parameter set, I would already have made it the default.

HanKMU commented 3 years ago

Thanks! Can I add the seed_cutoff option manually to the run.cfg of v2.4, as in the following example?

[correct_option]
read_cutoff = 1k
genome_size = 1G # estimated genome size
seed_cutoff = 1193
seed_depth = 31
sort_options = -m 20g -t 25 -k 30
minimap2_options_raw = -t 25 --minlen 1000
pa_correction = 3 # number of corrected tasks used to run in parallel, each corrected task requires ~TOTAL_INPUT_BASES/4 bytes of memory usage.
correction_options = -b

moold commented 3 years ago

Yes.
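For anyone who wants to try several seed_cutoff values as suggested in this thread, here is a small sketch that produces one run.cfg variant per value by editing the config text. It sets seed_cutoff in [correct_option] and gives each variant its own workdir so runs do not overwrite each other. The helper names, output file names and workdir naming are hypothetical choices, not NextDenovo conventions.

```python
import re


def set_option(cfg_text, section, key, value):
    """Return cfg_text with `key = value` set inside [section].

    Any existing line for the key in that section is dropped and the new
    line is written directly under the section header. Other lines,
    including inline comments, are left untouched.
    """
    out, in_section, found = [], False, False
    key_pattern = re.compile(rf"{re.escape(key)}\s*=")
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            in_section = stripped[1:-1] == section
            out.append(line)
            if in_section:
                found = True
                out.append(f"{key} = {value}")
            continue
        if in_section and key_pattern.match(stripped):
            continue  # drop the old value
        out.append(line)
    if not found:
        raise KeyError(f"section [{section}] not found")
    return "\n".join(out) + "\n"


def make_variants(cfg_text, cutoffs):
    """Map a hypothetical file name to config text for each seed_cutoff value."""
    variants = {}
    for cutoff in cutoffs:
        text = set_option(cfg_text, "correct_option", "seed_cutoff", cutoff)
        text = set_option(text, "General", "workdir", f"01_rundir_seed{cutoff}")
        variants[f"run_seed{cutoff}.cfg"] = text
    return variants


if __name__ == "__main__":
    with open("run.cfg") as handle:  # the v2.4 config from this thread
        base = handle.read()
    for name, text in make_variants(base, [1193, 2000, 3177]).items():
        with open(name, "w") as handle:
            handle.write(text)
```

Each generated file can then be run separately and the resulting assemblies compared. The value 2000 above is just an arbitrary intermediate point between the two cutoffs reported in the thread.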