chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

Assembly much shorter than expected #322

Open Zazzyre opened 1 year ago

Zazzyre commented 1 year ago

Hi! I'm trying to reduce the number of contigs in an assembly that was initially put together in the SMRT Analysis software, but I'm getting a much shorter assembly than expected based on both related species and the initial assembly. I'm using the raw CCS reads as the input to hifiasm.

We know the genome should be around 1.2 Gbp based on 4 other closely related birds, and the initial assembly had 3791 contigs and a length of 1305233108 bases, which agrees with that estimate.

From hifiasm I am consistently getting a length of around 3646273bp with 145 contigs.

My run command is currently:

hifiasm -o /hifiasm_out/moplhifi.asm -l0 -t 16 --hg-size 1.2g -D 10 /1803-24278.ccs.fasta.gz

What can I do to retain the length? It's not worth the contig reduction if we lose almost the whole genome.
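For anyone wanting to reproduce the contig count and total length quoted above, here is a minimal Python sketch that tallies them from a GFA file. It assumes the contigs are stored as `S` (segment) lines with either an inline sequence or an `LN:i:` length tag; the function name `gfa_contig_stats` and the file path in the usage comment are hypothetical and not part of hifiasm.

```python
import io


def gfa_contig_stats(handle):
    """Return (contig_count, total_bases) from the S lines of a GFA stream."""
    count = 0
    total = 0
    for line in handle:
        # Only segment lines describe contigs; skip links, headers, etc.
        if not line.startswith("S\t"):
            continue
        fields = line.rstrip("\n").split("\t")
        seq = fields[2]
        if seq != "*":
            length = len(seq)
        else:
            # Sequence omitted: fall back to the optional LN:i:<length> tag.
            length = 0
            for tag in fields[3:]:
                if tag.startswith("LN:i:"):
                    length = int(tag[5:])
                    break
        count += 1
        total += length
    return count, total


# Usage (the path is a placeholder, not a real output name):
# with open("assembly.gfa") as fh:
#     n_contigs, n_bases = gfa_contig_stats(fh)
#     print(n_contigs, n_bases)
```

Comparing the resulting total against the expected genome size is a quick way to see how much of the genome an assembly covers.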

chhylp123 commented 1 year ago

Could you please show the log file? Thanks a lot.

Zazzyre commented 1 year ago

Sure! Here's the log: hifiasm.txt

chhylp123 commented 1 year ago

The k-mer plot is weird. A good HiFi dataset should have a k-mer plot like the ones in issue10 or issue49. In contrast, low-quality HiFi data often lead to a weird k-mer plot like the one in issue93. For more details, please see: https://hifiasm.readthedocs.io/en/latest/faq.html#why-does-hifiasm-stuck-or-crash. Could you please double-check whether the input HiFi reads have been processed by pbccs?
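To illustrate what a k-mer plot summarizes, here is a simplified Python sketch that counts canonical k-mers in a handful of reads and reports how many distinct k-mers occur once, twice, and so on. It is not hifiasm's implementation: the function names and the small default `k` are assumptions chosen for illustration, and it is only practical on a small sample of reads. In clean data the histogram typically shows a distinct peak near the sequencing depth, while error-rich reads pile up at very low counts.

```python
from collections import Counter

# Translation table for reverse-complementing DNA strings.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    revcomp = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, revcomp)


def kmer_histogram(reads, k=21):
    """Map occurrence count -> number of distinct canonical k-mers with that count."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows containing ambiguous bases.
            if "N" in kmer:
                continue
            counts[canonical(kmer)] += 1
    histogram = Counter(counts.values())
    return dict(sorted(histogram.items()))


# Usage: pass an iterable of read sequences (strings), e.g. a small subsample.
# for occurrence, n_kmers in kmer_histogram(sample_reads).items():
#     print(occurrence, n_kmers)
```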