Open LeoCao-X opened 3 years ago
sgs_options = -max_depth 100 -bwa for NextPolish?
contigs (>= 0 bp) 19621
contigs (>= 1000 bp) 19621
contigs (>= 5000 bp) 19406
contigs (>= 10000 bp) 13914
contigs (>= 25000 bp) 8058
contigs (>= 50000 bp) 3787
Total length (>= 0 bp) 841167373
Total length (>= 1000 bp) 841167373
Total length (>= 5000 bp) 840213028
Total length (>= 10000 bp) 801189177
Total length (>= 25000 bp) 706046327
Total length (>= 50000 bp) 554939331
contigs 19621
Largest contig 6804176
Total length 841167373
GC (%) 34.41
N50 88966
N90 16854
L50 1721
L90 10524
N's per 100 kbp 0.00
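For readers unfamiliar with the N50/L50 and N90/L90 rows above, here is a minimal sketch of how such values are usually computed from a list of contig lengths. The function name `nx_stats` and the toy lengths are made up for illustration; this is not the code of the tool that produced the report.

```python
def nx_stats(lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of contig lengths.

    Nx is the length of the contig at which the running sum of lengths,
    taken from longest to shortest, first reaches `fraction` of the total.
    Lx is the number of contigs needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy contig lengths (hypothetical, not taken from the assembly above).
toy = [100, 80, 60, 40, 20]
print(nx_stats(toy))        # N50 and L50
print(nx_stats(toy, 0.9))   # N90 and L90
```

With an N50 of 88966 and an L50 of 1721, half of the 841 Mb assembly sits in the 1721 longest contigs, each at least 88966 bp long.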
I used NextPolish to refine the assembly and ran BUSCO on the polished assembly.
BUSCO version is: 4.1.3
The lineage dataset is: insecta_odb10 (Creation date: 2019-11-20, number of species: 75, number of BUSCOs: 1367)
Results:
C:73.2%[S:73.2%,D:0.0%],F:12.4%,M:14.4%,n:1367
1001 Complete BUSCOs (C)
1001 Complete and single-copy BUSCOs (S)
0 Complete and duplicated BUSCOs (D)
169 Fragmented BUSCOs (F)
197 Missing BUSCOs (M)
1367 Total BUSCO groups searched
Results:
C:81.1%[S:81.1%,D:0.0%],F:10.3%,M:8.6%,n:1367
1108 Complete BUSCOs (C)
1108 Complete and single-copy BUSCOs (S)
0 Complete and duplicated BUSCOs (D)
141 Fragmented BUSCOs (F)
118 Missing BUSCOs (M)
1367 Total BUSCO groups searched
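As a quick sanity check on the two BUSCO summaries above, the percentages can be recomputed from the raw counts. This is only a sketch; the helper name `busco_summary` is made up, and the counts are copied from the two result blocks as listed.

```python
def busco_summary(single, duplicated, fragmented, missing):
    """Recompute BUSCO percentages (one decimal place) from raw counts."""
    total = single + duplicated + fragmented + missing

    def pct(count):
        return round(100 * count / total, 1)

    return {
        "C": pct(single + duplicated),
        "S": pct(single),
        "D": pct(duplicated),
        "F": pct(fragmented),
        "M": pct(missing),
        "n": total,
    }


# Counts copied from the first and second result blocks above.
first = busco_summary(single=1001, duplicated=0, fragmented=169, missing=197)
second = busco_summary(single=1108, duplicated=0, fragmented=141, missing=118)

print(first)
print(second)
# Number of additional complete BUSCOs in the second result.
print(1108 - 1001)
```

Both blocks sum to 1367 groups, and the second result has 107 more complete BUSCOs than the first.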
If you are sure your genome size is about 800M, try the following nextgraph_options:
-a 0 -A
-a 0 -n 45
-a 0 -I 0.5
-a 0 -q 5
-a 0 -N 1
-a 0 -u 1
-a 0 -k
-a 0 -I 0.1
-a 0 -G
You can cd to the directory 03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0 and rerun nextgraph manually; it should be very fast for each version.
After that, you can choose the best options, set nextgraph_options in the config file, and rerun the main task nextDenovo run.cfg. nextDenovo will back up your first assembly result and only rerun the assembly step.
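To keep track of the trials, a small bookkeeping sketch like the one below may help. It only renders each candidate as a nextgraph_options line for the config file; it does not run nextgraph, and the trial numbering is an arbitrary choice of this example. The actual nextgraph command should be taken from the script in the work directory.

```python
# Candidate option sets, copied from the list suggested above.
CANDIDATE_OPTIONS = [
    "-a 0 -A",
    "-a 0 -n 45",
    "-a 0 -I 0.5",
    "-a 0 -q 5",
    "-a 0 -N 1",
    "-a 0 -u 1",
    "-a 0 -k",
    "-a 0 -I 0.1",
    "-a 0 -G",
]


def config_lines(candidates):
    """Render each candidate as a nextgraph_options line for the config file."""
    return [f"nextgraph_options = {opts}" for opts in candidates]


# Print one numbered line per trial so results can be matched to options later.
for index, line in enumerate(config_lines(CANDIDATE_OPTIONS), start=1):
    print(f"trial {index}: {line}")
```

Once the best trial is chosen, the matching line goes into the config file before rerunning the main task.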
I am not sure about the genome size. I ran kmergenie on the NGS short reads, and it suggests that the best k is 111 and the genome size is 730M.
Question or Expected behavior
Hello, Dr. Hu, thank you for developing such a powerful genome assembler that helps us assemble efficiently. I have tried NextDenovo on my organism's genome, but the results are not as good as I hoped. Could you give me some suggestions? I used seq_stat to estimate seed_cutoff before running NextDenovo. My genome is about 700M, I filtered reads with a 1k length cutoff, my rawdata.fasta.gz is about 47G, and the expected corrected depth is 45X. The seq_stat result is 26084. The parameters I used for NextDenovo are
Operating system 4.14.65-gentoo
GCC gcc version 7.3.0 (Gentoo 7.3.0-r3 p1.4)
Python Python 3.8.3
NextDenovo Nextdenovo v2.3.1
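For context on the seed_cutoff estimate mentioned in the question, here is a rough sketch of the idea as I understand it: pick the read length at which the longest reads together reach genome size times expected depth. This is an assumption about the approach, not the actual seq_stat code, and the function name and toy read lengths are made up.

```python
def estimate_seed_cutoff(read_lengths, genome_size, depth, min_len=1000):
    """Return the read length at which the longest reads reach the target bases.

    Reads shorter than `min_len` are ignored. Returns None when the data
    cannot reach genome_size * depth bases at all.
    """
    target = genome_size * depth
    kept = sorted((n for n in read_lengths if n >= min_len), reverse=True)
    running = 0
    for length in kept:
        running += length
        if running >= target:
            return length
    return None


# Toy example with hypothetical read lengths and a 10 kb "genome" at 6X.
toy_reads = [30000, 25000, 20000, 15000, 10000, 5000, 500]
print(estimate_seed_cutoff(toy_reads, genome_size=10_000, depth=6))

# With the numbers from the question, the target would be
# 700_000_000 * 45 = 31_500_000_000 bases of seed reads.
```

If the genome is actually larger than assumed, the same cutoff yields a lower effective seed depth, which is one reason the genome size estimate matters here.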
Assembly results are
After that, I used NextPolish to refine the assembly with NGS short reads and Nanopore long reads. The parameters are
Polish results are
Finally, I used BUSCO to evaluate the polished assembly.
Obviously, the BUSCO score is really low. Could you give me some suggestions on using NextDenovo or NextPolish? My computing resources are about 300 G and 80 cores. Are there any more details I should supply?
Thanks so much!