Open TypicalSEE opened 4 years ago
Hi, 1. No. Although NextDenovo will try to correct reads longer than 1001 bp, it also filters out low-quality and low-depth reads, so the corrected output contains far fewer reads than the raw reads with length > 1001 bp.
2 & 3. Your data is not enough for assembly with the current version of NextDenovo under default options, because all defaults are optimized for 60-100x Nanopore data, so it will produce an unexpected assembly result. If you still want to use NextDenovo, you can try setting correction_options = -b, changing -k to 20 in sort_options, and then rerunning the whole pipeline, though I cannot guarantee a good result. You could also try other assemblers.
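For reference, those two changes would go into the NextDenovo run configuration file. A hypothetical fragment is shown below; the section and option names follow the suggestion above, but the exact syntax and surrounding options should be checked against the documentation for your NextDenovo version:

```
# run.cfg fragment (sketch only; verify against your version's docs)
[correct_option]
correction_options = -b        # suggested above for low-coverage input
sort_options = -m 20g -t 8 -k 20   # -k changed to 20; -m/-t values illustrative
```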
Thanks for your reply, it helps a lot. But what still confuses me is: should I set seed_cutoff as low as possible (1001, for example) when I have enough Nanopore data and enough CPUs? Will correcting as many reads as possible improve assembly quality? Thanks again.
Yes, but I recommend using bin/seq_stat to calculate the expected seed cutoff.
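As an example, seq_stat is typically run on the file-of-filenames listing the reads; the invocation below is a sketch only (the flags shown are assumptions, so consult `bin/seq_stat -h` for the actual interface):

```shell
# hypothetical invocation; flags are illustrative, check bin/seq_stat -h
bin/seq_stat -g 1g input.fofn > seq_stat.out
```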
Dear Dr. Hu,
I've recently run NextDenovo using 33x ONT reads from a 1 Gb genome. After running seq_stat, the suggested seed cutoff was 0 bp. However, as the minimum read length was 1000 bp, I set seed_cutoff to 1.1k. The results were a bit disappointing, with an N50 of 2 Mb. As a comparison, for a mammalian genome with 70x PacBio I got an N50 of 76 Mb!!!
I wonder if it is worth tweaking some of the parameters as suggested above (correction_options = -b and changing -k to 20 in sort_options), or would it be necessary to gather more data to reach 60x (which is not always possible)?
I would also like to know if there is a document with more detailed help on this assembler. Do you have any sort of manual or white paper, or do you plan to upload a manuscript to bioRxiv?
The results on the mammal are very encouraging; the program is definitely a tool to consider for achieving chromosome-scale assemblies.
Thanks, Fernando
Hi, the input data is not enough and the seed length is too short. As you can see, the default value of the option -min_len_seed in nextcorrect.py is 10k, so most of the corrected seeds will be filtered. Currently, the default options are optimized for input data >= 60x with seed lengths >= 20 kb; otherwise the results can be unexpected and the assembly quality needs careful checking. BTW, I am now preparing the manuscript of NextDenovo, and I will also provide default options for short seeds and 30x input data in the next release. But if you want a better assembly result, it is recommended to sequence >= 60x data using Nanopore ultra-long libraries.
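Conceptually, a seed cutoff picks the read length above which the longest reads still sum to some target depth, which is why a low-coverage, short-read dataset yields a cutoff at or below the minimum read length. The Python sketch below illustrates that idea only; it is not NextDenovo's actual algorithm, and the function name, target depth, and fallback behaviour are all assumptions for illustration:

```python
# Hypothetical illustration of choosing a seed-length cutoff:
# walk reads from longest to shortest and stop once they cover
# the genome to a target depth. Not NextDenovo's real code.

def seed_cutoff(read_lengths, genome_size, target_depth=45):
    """Return the smallest length L such that reads >= L provide
    at least target_depth * genome_size bases; if the data runs
    out first, the shortest read length is returned."""
    target_bases = target_depth * genome_size
    total = 0
    cutoff = 0
    for length in sorted(read_lengths, reverse=True):
        total += length
        cutoff = length
        if total >= target_bases:
            break
    return cutoff

# Toy example: 1 Mb "genome", 100 reads each of 1 kb .. 50 kb.
reads = [1000 * i for i in range(1, 51)] * 100
print(seed_cutoff(reads, genome_size=1_000_000, target_depth=45))
# -> 41000: reads of 41 kb and longer already give >= 45x
```

With too little total data (as in the 33x case above), the loop exhausts all reads and the cutoff collapses to the shortest read, mirroring the unhelpful 0 bp / minimum-length suggestion seen from seq_stat.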
Thanks,
Having more data would always be great. I would love to use ultra-long reads, but as far as I know that requires a lot more input DNA.
I look forward to reading the manuscript.
Best, Fernando
Hi Hu,
I just read that the latest release (version 2.3.1) will "use non-seed reads to correct structural & base errors if seed depth < 35". I guess those are the default options you mentioned above. Should I therefore also expect better results in cases with ONT coverage >= 30x? Did you run tests on that front?
Thanks, Fernando
NextDenovo is only assembly software, so if you need a more accurate assembly, you can try NextPolish.
Hi, OK, I see. The option only affects base-level accuracy (i.e. "use non-seed reads to correct structural & base errors if seed depth < 35"). I was thinking about getting better contiguity and assembly quality (fewer misassemblies) with less data. So v2.3.1 still requires coverage >= 60x for optimal results, right? Thanks, Fernando
Hi, Dr. Hu, thanks for your excellent work at NextOmics. I have a few questions about the "seed_cutoff" option and I would appreciate it very much if you could help me: