c-zhou / yahs

Yet another Hi-C scaffolding tool
MIT License

Does yahs collapse contigs? #37

Open badplantgeek opened 2 years ago

badplantgeek commented 2 years ago

Hi

I am assembling a genome with hifiasm and obtained a phased assembly (hap1). However, the assembly is larger than I expected, and BUSCO reports a high degree of duplication. I suspect I might have some contamination because the k-mer plot looks odd; perhaps I have DNA from two individuals.

I was wondering whether I could use yahs with Hi-C contact data to collapse the contigs that potentially come from two individuals. I am just not sure this is the right tool for the purpose.

Any suggestions would be much appreciated!

c-zhou commented 2 years ago

Hello @badplantgeek,

Unfortunately, YaHS was not designed to deal with this. It expects a single-haplotype assembly as input and will not remove any sequences.

Have you tried some haplotypic duplication purging tools such as purge_dups and purge_haplotigs?

Best, Chenxi

ardy20 commented 2 years ago

Hello

Thanks for your tool.

I used yahs on a hifiasm rice assembly and see that the top 24 scaffolds (by size) are larger than 10 Mb (chromosome level).

Does this mean that in an assembly of rice (2n=2x=24) with hifiasm (collapsed main assembly) + Hi-C, we should count the top 24 pseudomolecules as the chromosome-level assembly, or only the top 12?

With salsa2 and pin-hic I chose only the top 12 (by size), as the rest are smaller than 10 Mb.

So, can we presume that salsa2 and pin-hic collapse the two homologous chromosomes but yahs does not?

c-zhou commented 1 year ago

Hello ardy20,

I cannot really comment on this unless more assembly information is provided, such as how you ran hifiasm and what the assembly size, N50, and N90 are. What is the total size of the top 12 scaffolds generated by salsa2 or pin-hic? Is that size close to the expected genome size?

> Does this mean that in an assembly of rice (2n=2x=24) with hifiasm (collapsed main assembly) + Hi-C, we should count the top 24 pseudomolecules as the chromosome-level assembly, or only the top 12?

Sorry, I do not think I understand this question.

> salsa2 and pin-hic collapse the two homologous chromosomes but yahs does not?

This is very unlikely; none of these three tools is haplotype-aware.

Best, Chenxi
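
For anyone following up on this thread, the statistics requested above (assembly size, N50, N90, and the combined size of the top 12 scaffolds) can be computed from a scaffold FASTA with a short script. The following is only a minimal sketch, not part of YaHS, salsa2, or pin-hic: the function names `read_fasta_lengths`, `nx`, and `summarize`, and the file name `scaffolds.fa` in the usage comment, are hypothetical and chosen for illustration.

```python
from typing import Dict, Iterable, List


def read_fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence name: length} from FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def nx(lengths: List[int], percent: int) -> int:
    """Smallest length L in the sorted list such that sequences of
    length >= L cover at least `percent` percent of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer arithmetic avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length
    return 0


def summarize(lengths: List[int], top_k: int = 12) -> Dict[str, int]:
    """Assembly size, N50, N90, and combined size of the top_k longest sequences."""
    ordered = sorted(lengths, reverse=True)
    return {
        "total": sum(ordered),
        "n50": nx(ordered, 50),
        "n90": nx(ordered, 90),
        "top_k_total": sum(ordered[:top_k]),
    }


# Hypothetical usage (file name is an assumption):
# with open("scaffolds.fa") as handle:
#     print(summarize(list(read_fasta_lengths(handle).values()), top_k=12))
```

Comparing `top_k_total` for `top_k=12` and `top_k=24` against the expected genome size gives a quick indication of whether the larger scaffold set mostly duplicates the smaller one or adds sequence that the top 12 lack.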