Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads
GNU General Public License v3.0

[4501 ERROR] 2023-06-13 22:29:40 the input data is insufficient for an assembly. #183

Closed Evansd36 closed 1 year ago

Evansd36 commented 1 year ago

Describe the bug As stated by the error message, NextDenovo does not like my input data. The input data are contigs given to me by a collaborator. They come from a PacBio CLR run but have already been partially assembled. They are NOT the raw PacBio reads; instead, the FASTA file comprises ~3,500 contigs ranging from 40 kb to 900 kb. Because of this, NextDenovo seems to be reading the seed depth as extremely low (1.13) and stopping the assembly. I know that the data in the FASTA file cover the whole genome and should be fairly high quality (some polishing and contig joining has already been done), but I am curious whether this means I cannot use NextDenovo. If so, is there another assembly program you would suggest? I am also trying minimap2 on its own, but I am still working out which filters I need. The run log, config file, and a seq_stat run are all attached. If NextDenovo can handle extremely large contigs treated as "reads", is there some parameter I am missing that would help? Thanks!!

Error message (changed all files to .txt files so I could attach them) seq_stat_g.txt pid4501.log.txt NextDonovo_config_file.txt

Genome characteristics Plant genome, 350 Mb, fairly repetitive, heterozygous individual sequenced

GCC gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.11)

Python Python 3.5.5

NextDenovo nextDenovo 2.5.2


Additional context (Optional) I am running this on a pretty old server that hasn't been updated in a while. It is totally possible that running an older version of Python or GCC could be causing issues as well.

moold commented 1 year ago

Try setting input_type = corrected. I am not sure it will work, but it is worth a try.

Evansd36 commented 1 year ago

Same error, unfortunately. Any other ideas? pid5326.log.txt

moold commented 1 year ago

Try setting genome_size = 1M. If it still reports an error, then I can't help.
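
For readers following this thread, here is a rough sketch of why the configured genome size matters. It assumes that the reported depth is approximately total input bases divided by the configured genome size; that is an assumption for illustration, not a description of NextDenovo's internal calculation. The helper names (`parse_size`, `fasta_lengths`, `estimated_depth`) are hypothetical and are not part of NextDenovo.

```python
def parse_size(text):
    """Convert a size string such as '350m' or '1M' to a number of bases."""
    units = {"k": 10 ** 3, "m": 10 ** 6, "g": 10 ** 9}
    text = text.strip().lower()
    if text and text[-1] in units:
        return int(float(text[:-1]) * units[text[-1]])
    return int(text)


def fasta_lengths(lines):
    """Yield the length of each sequence in an iterable of FASTA lines."""
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def estimated_depth(lengths, genome_size):
    """Assumed depth estimate: total bases / configured genome size."""
    return sum(lengths) / parse_size(genome_size)
```

To try it on a real file, something like `estimated_depth(fasta_lengths(open("contigs.fasta")), "350m")` would do, where `contigs.fasta` is a placeholder path. Under this assumption, an already-assembled contig set that covers the genome roughly once will always give a depth close to 1 against the true genome size, and a much smaller configured genome size inflates the estimate, which may be why that setting was suggested.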