Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads
GNU General Public License v3.0

How to improve the N50 and reduce the number of contigs? #200

Open cj2jy opened 3 months ago

cj2jy commented 3 months ago

Hi, I finished an assembly and the result is:

Type    Length (bp)   Count (#)
N10     22880485      3
N20     10335838      10
N30     6877938       22
N40     5222529       39
N50     3377214       63
N60     1919981       103
N70     927783        178
N80     440773        335
N90     218142        666
Min.    28326         -
Max.    57348206      -
Ave.    742827        -
Total   992417917     1336
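For readers less familiar with the table: Nx is the length of the contig at which the running sum of contig lengths, taken from longest to shortest, first reaches x% of the total assembly size, and the count column is how many contigs that takes. The minimal Python sketch below recomputes these statistics from contig lengths. The function names are hypothetical, and it is not claimed to match NextDenovo's own stat report in every detail (for example, the table's Ave. value 742827 appears to be 992417917 / 1336 truncated to an integer, while this sketch returns the exact mean).

```python
from typing import Dict, Iterable, List, Tuple


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Sequence lengths from FASTA text (multi-line records are supported)."""
    lengths: List[int] = []
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def nx_stats(lengths: List[int],
             steps: Iterable[int] = range(10, 100, 10)) -> Dict[int, Tuple[int, int]]:
    """Map x -> (Nx length, number of longest contigs needed to reach x% of the total)."""
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    total = sum(ordered)
    targets = iter(sorted(steps))
    x = next(targets, None)
    result: Dict[int, Tuple[int, int]] = {}
    cumulative = 0
    for count, length in enumerate(ordered, start=1):
        cumulative += length
        # Integer comparison avoids float rounding; one long contig
        # can cross several thresholds at once.
        while x is not None and cumulative * 100 >= total * x:
            result[x] = (length, count)
            x = next(targets, None)
        if x is None:
            break
    return result


def summary(lengths: List[int]) -> Dict[str, float]:
    """Min/Max/Ave/Total rows plus the contig count."""
    return {
        "min": min(lengths),
        "max": max(lengths),
        "ave": sum(lengths) / len(lengths),
        "total": sum(lengths),
        "count": len(lengths),
    }
```

Applied to the assembly, e.g. with open("nd.asm.fasta") as fh: lengths = fasta_lengths(fh), the entry nx_stats(lengths)[50] should correspond to the N50 row (3377214 bp, 63 contigs), assuming the same definition is used.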

run.cfg:

[General]
job_type = slurm
submit = sbatch --cpus-per-task=20 --mem-per-cpu=4g -o {out} -e {err} {script}
job_prefix = nextDenovo
task = all # 'all', 'correct', 'assemble'
rewrite = yes # yes/no
deltmp = yes
rerun = 1
parallel_jobs = 5
input_type = raw
read_type = ont
input_fofn = ./input.fofn
workdir = ./02_rundir

[correct_option]
read_cutoff = 2k
genome_size = 850M
seed_cutoff = 25000
pa_correction = 3
sort_options = -m 20g -t 18
minimap2_options_raw = -t 18
correction_options = -p 18

[assemble_option]
random_round = 20
minimap2_options_cns = -t 18 -k 23 -w 10
nextgraph_options = -a 1 -q 10
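One sanity check on a config like this, assuming each subtask runs with the CPUs requested in the submit line, is that no -t or -p thread count in the option strings exceeds --cpus-per-task. The sketch below is a hypothetical helper (check_threads and max_thread_flag are not part of NextDenovo); it parses the file with Python's standard configparser, treating " #" as the start of an inline comment so lines such as "task = all # ..." read correctly.

```python
import configparser
import re
from typing import List


def max_thread_flag(options: str) -> int:
    """Largest value passed to -t or -p in an option string (0 if none)."""
    values = [int(v) for v in re.findall(r"(?:^|\s)-[tp]\s+(\d+)", options)]
    return max(values, default=0)


def check_threads(cfg_text: str) -> List[str]:
    """List options whose thread count exceeds --cpus-per-task in the submit line."""
    # interpolation=None keeps {out}/{err}/{script} and any '%' literal.
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",), interpolation=None)
    parser.read_string(cfg_text)
    submit = parser.get("General", "submit", fallback="")
    match = re.search(r"--cpus-per-task[= ](\d+)", submit)
    if match is None:
        return []  # no CPU request to compare against
    cpus = int(match.group(1))
    problems = []
    for section in parser.sections():
        for key, value in parser.items(section):
            threads = max_thread_flag(value)
            if threads > cpus:
                problems.append(f"[{section}] {key}: {threads} threads > {cpus} CPUs")
    return problems
```

With the values above (-t 18 and -p 18 against --cpus-per-task=20), check_threads should return an empty list.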

What can I do to increase the N50 and reduce the total number of contigs? I want a better starting assembly for 3D-DNA scaffolding. Looking forward to your reply. Thank you.

moold commented 3 months ago

It's hard to say; if I had a better solution, I would make it the default. However, I think you can try to optimize these parameters: seed_cutoff, and -k, -w and -f in minimap2_options_raw and minimap2_options_cns. BTW, make sure you are using the latest version of NextDenovo. You can also sequence more ultra-long ONT SUP reads. Finally, you can try other assemblers.
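To try these parameters systematically, one option is to generate one config per combination, each with its own workdir so the runs do not overwrite each other. This sketch uses hypothetical helpers (make_variants, set_flag), and the seed_cutoff and -k/-w values in the test are arbitrary placeholders, not recommendations. It only rewrites seed_cutoff and the -k/-w values in minimap2_options_cns; minimap2_options_raw could be handled the same way. It assumes flags are written as separate tokens ("-k 23", not "-k23").

```python
import configparser
import io
import itertools
from typing import Dict, Iterable, Tuple


def set_flag(options: str, flag: str, value: str) -> str:
    """Replace the value after `flag` in an option string, or append the pair."""
    tokens = options.split()
    if flag in tokens:
        i = tokens.index(flag)
        if i + 1 < len(tokens):
            tokens[i + 1] = value
        else:
            tokens.append(value)
    else:
        tokens += [flag, value]
    return " ".join(tokens)


def make_variants(cfg_text: str, seed_cutoffs: Iterable[int],
                  kw_pairs: Iterable[Tuple[int, int]]) -> Dict[str, str]:
    """Return {variant name: config text} for every (seed_cutoff, (k, w)) combination."""
    variants = {}
    for seed, (k, w) in itertools.product(seed_cutoffs, kw_pairs):
        parser = configparser.ConfigParser(inline_comment_prefixes=("#",), interpolation=None)
        parser.read_string(cfg_text)
        name = f"seed{seed}_k{k}_w{w}"
        parser.set("General", "workdir", f"./sweep_{name}")  # separate output per run
        parser.set("correct_option", "seed_cutoff", str(seed))
        cns = parser.get("assemble_option", "minimap2_options_cns", fallback="")
        cns = set_flag(set_flag(cns, "-k", str(k)), "-w", str(w))
        parser.set("assemble_option", "minimap2_options_cns", cns)
        buffer = io.StringIO()
        parser.write(buffer)  # comments from the original file are not preserved
        variants[name] = buffer.getvalue()
    return variants
```

Each returned text can then be written to its own file (e.g. run_seed20000_k23_w10.cfg) and submitted as a separate run; comparing the resulting stat tables shows which combination helps.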

cj2jy commented 3 months ago

Thank you, I will change those parameters and try again. But I don't know what -f means or how to optimize it. Do you have any suggestions?

moold commented 3 months ago

try -f 0.0001 or less
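For context on that reply: the minimap2 manual describes -f as filtering out the top FLOAT fraction of repetitive minimizers (upstream default 0.0002; whether NextDenovo's bundled minimap2 uses the same default is an assumption here). A smaller -f therefore raises the occurrence cutoff, so more seeds from repetitive regions are kept, likely at some cost in speed and memory. The sketch below is a simplified model of that idea only, not minimap2's actual indexing code; repeat_cutoff and dropped_minimizers are hypothetical names.

```python
from typing import List, Sequence


def repeat_cutoff(counts: Sequence[int], f: float) -> int:
    """Largest occurrence count still kept when the top f fraction is dropped.

    Simplified model: sort per-minimizer occurrence counts and drop the
    int(f * n) most frequent ones; ties at the cutoff are kept.
    """
    if not 0.0 <= f < 1.0:
        raise ValueError("f must be in [0, 1)")
    if not counts:
        raise ValueError("need at least one minimizer count")
    ordered = sorted(counts)
    n_drop = int(f * len(ordered))  # size of the "top f fraction"
    return ordered[len(ordered) - n_drop - 1]


def dropped_minimizers(counts: Sequence[int], f: float) -> List[int]:
    """Occurrence counts of the minimizers this model would ignore."""
    cutoff = repeat_cutoff(counts, f)
    return sorted(c for c in counts if c > cutoff)
```

In a toy index with 9998 unique minimizers plus two repeats seen 400 and 900 times, f = 0.0002 drops both repeats while f = 0.0001 keeps the 400-copy one, which is the direction of change the suggestion relies on.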

cj2jy commented 3 months ago

Thank you. I reran it, and it is still running. Can I use my previous assembly result, nd.asm.fasta, as input and run the assemble step again? Would that give a better result?

moold commented 3 months ago

No

DaniPaulo commented 2 months ago

Hi @cj2jy, I'm still trying to understand how to run NextDenovo using SLURM. Could you share your script.slurm.sh?

In run.cfg you set submit = sbatch --cpus-per-task=20 --mem-per-cpu=4g. Does that mean you also set #SBATCH --cpus-per-task=20 and #SBATCH --mem-per-cpu=4g in your script?
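Regarding this question: the submit line contains {out}, {err} and {script} placeholders, which suggests it is a template filled in once per subtask rather than the header of the job that runs nextDenovo itself. The sketch below assumes the placeholders are substituted the way Python's str.format would do it (not checked against NextDenovo's source), and the task script name is hypothetical. It also computes the peak request implied by the cfg if parallel_jobs = 5 such subtasks run at the same time.

```python
import re
import shlex
from typing import List, Tuple

SUBMIT = "sbatch --cpus-per-task=20 --mem-per-cpu=4g -o {out} -e {err} {script}"


def render_submit(template: str, script: str) -> List[str]:
    """Fill the placeholders for one (hypothetical) task script."""
    command = template.format(out=f"{script}.o", err=f"{script}.e", script=script)
    return shlex.split(command)


def _int_option(pattern: str, template: str) -> int:
    match = re.search(pattern, template)
    if match is None:
        raise ValueError(f"pattern {pattern!r} not found in submit template")
    return int(match.group(1))


def peak_request(template: str, parallel_jobs: int) -> Tuple[int, int]:
    """(CPUs, GB of RAM) requested if parallel_jobs subtasks run at once."""
    cpus = _int_option(r"--cpus-per-task[= ](\d+)", template)
    mem_per_cpu_gb = _int_option(r"--mem-per-cpu[= ](\d+)[gG]", template)
    return parallel_jobs * cpus, parallel_jobs * cpus * mem_per_cpu_gb
```

Under these assumptions, each subtask asks for 20 CPUs × 4 GB = 80 GB, and five concurrent subtasks ask for 100 CPUs and 400 GB in total, independently of whatever the driver job itself requests.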