chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

The assembled genome is twice as large as expected in HIC mode #114

Open jinxin112233 opened 3 years ago

jinxin112233 commented 3 years ago

Hi

We assembled the genome in Hi-C mode with the following command:

hifiasm -o NA12878.asm -t 40 --h1 read1.fq.gz --h2 read2.fq.gz HiFi-reads.fq.gz

Unfortunately, NA12878.asm.hic.hap1.p_ctg.gfa and NA12878.asm.hic.hap2.p_ctg.gfa are both 1.6G, while the genome size estimated by flow cytometry is about 800M. Do we need to purge duplicates, or do you have another suggestion?

Thanks JX

chhylp123 commented 3 years ago

What's the size of p_utg.gfa? And what are the following two numbers?

[M::purge_dups] purge duplication coverage threshold:
[M::stat] # heterozygous bases: 6179497799; # homozygous bases: 480824972

jinxin112233 commented 3 years ago

Hi, sorry for our slow response. We re-ran hifiasm to get the running log. Here are the two numbers:

[M::stat] # heterozygous bases: 277085802; # homozygous bases: 1528254506

The size of NA12878.asm.hic.p_ctg.gfa is also 1.6G. Here are all the file sizes:

17G   NA12878.asm.ec.bin
61M   NA12878.asm.hic.clean_d_utg.noseq.gfa
1.6G  NA12878.asm.hic.hap1.p_ctg.gfa
1.3M  NA12878.asm.hic.hap1.p_ctg.lowQ.bed
59M   NA12878.asm.hic.hap1.p_ctg.noseq.gfa
1.6G  NA12878.asm.hic.hap2.p_ctg.gfa
1.3M  NA12878.asm.hic.hap2.p_ctg.lowQ.bed
58M   NA12878.asm.hic.hap2.p_ctg.noseq.gfa
4.9G  NA12878.asm.hic.lk.bin
1.6G  NA12878.asm.hic.p_ctg.gfa
1.4M  NA12878.asm.hic.p_ctg.lowQ.bed
59M   NA12878.asm.hic.p_ctg.noseq.gfa
1.8G  NA12878.asm.hic.p_utg.gfa
2.8M  NA12878.asm.hic.p_utg.lowQ.bed
61M   NA12878.asm.hic.p_utg.noseq.gfa
1.8G  NA12878.asm.hic.r_utg.gfa
2.9M  NA12878.asm.hic.r_utg.lowQ.bed
61M   NA12878.asm.hic.r_utg.noseq.gfa
22G   NA12878.asm.hic.tlb.bin
4.3G  NA12878.asm.ovlp.reverse.bin
16G   NA12878.asm.ovlp.source.bin
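On-disk .gfa sizes are only a rough proxy for assembly length, because a GFA file holds more than contig sequence (headers, tags, and other record types). A minimal Python sketch for summing contig lengths directly is given here. It assumes the standard GFA 1 layout, where each S record carries the sequence in its third tab-separated field and may carry an LN:i: length tag. The helper name gfa_total_length is made up for illustration and is not part of hifiasm.

```python
def gfa_total_length(path):
    """Sum the lengths of all segment (S) records in a GFA file."""
    total = 0
    with open(path) as fh:
        for line in fh:
            if not line.startswith("S\t"):
                continue  # skip headers, links, and other record types
            fields = line.rstrip("\n").split("\t")
            seq = fields[2] if len(fields) > 2 else "*"
            if seq != "*":
                total += len(seq)
                continue
            # Sequence omitted (e.g. a noseq GFA): fall back to the LN:i: tag.
            for tag in fields[3:]:
                if tag.startswith("LN:i:"):
                    total += int(tag[5:])
                    break
    return total
```

Calling gfa_total_length("NA12878.asm.hic.hap1.p_ctg.gfa") would return the haplotype's total length in bases, which can be compared directly with the flow cytometry estimate.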

Thank you for your help JX

chhylp123 commented 3 years ago

Hifiasm misidentified the hom (homozygous) peak, so it treated most regions as hom. Could you please reset the hom peak with "--purge-cov"? The value should be a little larger than the hom peak.
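The diagnosis can be checked against the [M::stat] line quoted earlier in the thread. A small Python sketch that parses that line and reports the share of bases called homozygous is given here. The regular expression is written against the log line as it appears in this thread, and hom_fraction is a hypothetical helper, not a hifiasm function.

```python
import re

STAT_RE = re.compile(
    r"# heterozygous bases: (\d+); # homozygous bases: (\d+)"
)

def hom_fraction(log_line):
    """Return the homozygous share of bases reported in an [M::stat] line."""
    m = STAT_RE.search(log_line)
    if m is None:
        raise ValueError("not an [M::stat] line")
    het, hom = int(m.group(1)), int(m.group(2))
    return hom / (het + hom)

# Line reported earlier in this thread.
line = "[M::stat] # heterozygous bases: 277085802; # homozygous bases: 1528254506"
print(f"{hom_fraction(line):.1%} of bases called homozygous")
```

For the numbers in this thread the share is roughly 85%, which matches the observation that most regions were treated as hom.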

chhylp123 commented 3 years ago

Please note that you need to update the hic bin files for the new hom peak.

jinxin112233 commented 3 years ago

Hi, is this the right command to run?

hifiasm -o NA12878.asm -t40 --purge-cov 1628254506 --h1 read1.fq.gz --h2 read2.fq.gz HiFi-reads.fq.gz

I am not very familiar with hifiasm, so I don't really understand which hic bin files need to be updated. Where are those files?

Thank you for your help JX

chhylp123 commented 3 years ago

What's the coverage of the dataset? Could you please show the k-mer histogram?

jinxin112233 commented 3 years ago

Hi, the coverage of the dataset is about 40X. Here is the k-mer histogram generated with GenomeScope: [Image 1]

best JX

chhylp123 commented 3 years ago

You could probably try:

hifiasm -o NA12878.asm -t40 --purge-cov 50 --h1 read1.fq.gz --h2 read2.fq.gz HiFi-reads.fq.gz

And please delete *hic*bin before rerunning hifiasm.
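For anyone scripting the cleanup, a cautious Python sketch of the *hic*bin deletion is given here. It assumes the output prefix used in this thread (NA12878.asm) and defaults to a dry run that only lists the matches. The function name remove_hic_bins is made up for illustration.

```python
import glob
import os

def remove_hic_bins(prefix, dry_run=True):
    """List (and optionally delete) files matching <prefix>*hic*bin."""
    # glob.escape guards against special characters in the prefix itself.
    targets = sorted(glob.glob(glob.escape(prefix) + "*hic*bin"))
    for path in targets:
        if not dry_run:
            os.remove(path)
    return targets
```

With the file listing posted above, remove_hic_bins("NA12878.asm") would match NA12878.asm.hic.lk.bin and NA12878.asm.hic.tlb.bin and leave the ec and ovlp bin files in place. Passing dry_run=False performs the deletion.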

jinxin112233 commented 3 years ago

Great! Let me try it.

Best wishes, JX