Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads
GNU General Public License v3.0

The assembled genome is much smaller than predicted, pacbio reads #76

Closed: Jin-Sun-OUC closed this issue 3 years ago

Jin-Sun-OUC commented 3 years ago

Question or Expected behavior

Hi Dr Hu,

First of all, thank you so much for developing NextDenovo. It works well on my nanopore reads. Currently, I am assembling a genome with a predicted size of around 2.16 Gb and heterozygosity over 2.0%. I sequenced around 55X of PacBio reads and ran NextDenovo with the following settings.

[General]
job_type = local
job_prefix = nextDenovo
task = all # 'all', 'correct', 'assemble'
rewrite = yes # yes/no
deltmp = yes
rerun = 3
parallel_jobs = 4
input_type = raw
input_fofn = ./input.fofn
workdir = ./01_rundir

[correct_option]
read_cutoff = 1k
seed_cutoff = 6779
blocksize = 1g
pa_correction = 2
seed_cutfiles = 2
sort_options = -m 1g -t 2 -k 50
minimap2_options_raw = -x ava-pb -t 10
correction_options = -p 15

[assemble_option]
minimap2_options_cns = -x ava-pb -t 10 -k17 -w17
nextgraph_options = -a 1

However, the assembled genome size is only 5.2 Mb. Do you have any idea what is going on? I can provide more details about this project if needed. Thanks a lot!
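For scale, here is a rough back-of-the-envelope check of the figures quoted above. It is only a sketch: the variable names are illustrative and are not part of NextDenovo, and the inputs are the approximate numbers from this post.

```python
# Approximate figures taken from the post above.
genome_size_bp = 2.16e9      # predicted genome size (~2.16 Gb)
coverage = 55                # approximate sequencing depth (55X)
assembly_size_bp = 5.2e6     # reported assembly size (~5.2 Mb)

# Total raw bases implied by the depth and the predicted genome size.
total_bases = genome_size_bp * coverage

# Share of the predicted genome that the assembly actually represents.
fraction_assembled = assembly_size_bp / genome_size_bp

print(f"Approximate raw data: {total_bases / 1e9:.1f} Gb")
print(f"Assembly / predicted genome size: {fraction_assembled:.2%}")
```

An assembly that small relative to the predicted size suggests that most of the data was dropped somewhere in the pipeline, rather than the genome being modestly collapsed by heterozygosity.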

Operating system: CentOS

GCC: gcc version 8.3.0

Python: 3.7.6

NextDenovo: v2.3.0


moold commented 3 years ago

The seed length is too short. The default value of the -min_len_seed option in nextcorrect.py is 10k, so most of the corrected seeds will be filtered out. Some other options also require seed lengths >= 10k. I will provide some options for short seeds in the next release.
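To illustrate the effect described above, here is a minimal sketch of a minimum-length filter. It is not NextDenovo's actual code: `parse_size`, `surviving_seeds`, and the example seed lengths are hypothetical, and treating "k" as 1000 is an assumption.

```python
def parse_size(text: str) -> int:
    """Convert a size string such as '10k', '1g', or '6779' to an integer.

    Assumption: k/m/g are decimal multipliers (1e3, 1e6, 1e9).
    """
    units = {"k": 10**3, "m": 10**6, "g": 10**9}
    text = text.strip().lower()
    if text and text[-1] in units:
        return int(float(text[:-1]) * units[text[-1]])
    return int(text)


def surviving_seeds(seed_lengths, min_len_seed="10k"):
    """Return the seed lengths that pass a minimum-length filter."""
    threshold = parse_size(min_len_seed)
    return [n for n in seed_lengths if n >= threshold]


seed_cutoff = parse_size("6779")   # value from the config in this issue
min_len_seed = parse_size("10k")   # default mentioned in the reply above

# Hypothetical seed lengths, for illustration only.
example_lengths = [6800, 7500, 9200, 10500, 14000]
kept = surviving_seeds(example_lengths)

print(f"seed_cutoff below min_len_seed: {seed_cutoff < min_len_seed}")
print(f"seeds kept: {len(kept)} of {len(example_lengths)}")
```

With a seed_cutoff of 6779, many selected seeds fall between the cutoff and 10k, so a downstream 10k filter would discard them and leave very little for the assembly step.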

Jin-Sun-OUC commented 3 years ago

Thanks so much for the very prompt reply. However, that is what we can get for this sample... Anyway, I am looking forward to your next release. Best, J