chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

the difference in haplotype size is very large #148

Open ld9866 opened 3 years ago

ld9866 commented 3 years ago

Hello! I used the following command to assemble the genome, but the two haplotype assemblies differ greatly in size. What could cause this?

hifiasm -o NA12878.asm -t32 --h1 read1.fq.gz --h2 read2.fq.gz HiFi-reads.fq.gz

-rw-rw-r-- 1 7.4G 6 30 15:21 NA12878.asm.ec.bin
-rw-rw-r-- 1 1.2G 6 30 16:15 NA12878.asm.hic.hap1.p_ctg.gfa
-rw-rw-r-- 1 820K 6 30 16:16 NA12878.asm.hic.hap1.p_ctg.lowQ.bed
-rw-rw-r-- 1 46M 6 30 16:15 NA12878.asm.hic.hap1.p_ctg.noseq.gfa
-rw-rw-r-- 1 524M 6 30 16:17 NA12878.asm.hic.hap2.p_ctg.gfa
-rw-rw-r-- 1 492K 6 30 16:17 NA12878.asm.hic.hap2.p_ctg.lowQ.bed
-rw-rw-r-- 1 22M 6 30 16:17 NA12878.asm.hic.hap2.p_ctg.noseq.gfa
-rw-rw-r-- 1 3.9G 6 30 16:06 NA12878.asm.hic.lk.bin
-rw-rw-r-- 1 1.7G 6 30 16:13 NA12878.asm.hic.r_utg.gfa
-rw-rw-r-- 1 1.8M 6 30 16:15 NA12878.asm.hic.r_utg.lowQ.bed
-rw-rw-r-- 1 63M 6 30 16:13 NA12878.asm.hic.r_utg.noseq.gfa
-rw-rw-r-- 1 28G 6 30 15:31 NA12878.asm.hic.tlb.bin
-rw-rw-r-- 1 1.3G 6 30 15:21 NA12878.asm.ovlp.reverse.bin
-rw-rw-r-- 1 3.6G 6 30 15:21 NA12878.asm.ovlp.source.bin

chhylp123 commented 3 years ago

Is this the human NA12878 sample? On our side, we get two balanced haplotypes for NA12878.

ld9866 commented 3 years ago

It's not. This is publicly available potato sequencing data, and I want to practice with it before using hifiasm for real.

lh3 commented 3 years ago

What is the ploidy?

ld9866 commented 3 years ago

Tetraploid.

chhylp123 commented 3 years ago

For now, the phasing part of hifiasm only supports diploid samples. However, we have checked some tetraploid samples such as potato, and it looks feasible to resolve them.

baozg commented 3 years ago

The combined size of your NA12878.asm.hic.hap1.p_ctg.gfa and NA12878.asm.hic.hap2.p_ctg.gfa is about 1.7G. Are you using the public diploid heterozygous potato from the Nature Genetics paper? @ld9866
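The sizes above are on-disk file sizes, which are only a rough proxy for assembled length. A minimal Python sketch for summing the actual contig lengths in each haplotype GFA might look like the following. It assumes only the standard GFA layout (tab-separated `S` lines with the sequence in the third column, or `*` plus an `LN:i:` tag when the sequence is omitted, as in the `.noseq.gfa` files); the function names `gfa_total_length` and `haplotype_sizes` are hypothetical helpers for illustration, not part of hifiasm.

```python
def gfa_total_length(lines):
    """Sum segment lengths over the S-lines of a GFA file."""
    total = 0
    for line in lines:
        if not line.startswith("S\t"):
            continue  # skip links, read-placement lines, headers, etc.
        fields = line.rstrip("\n").split("\t")
        seq = fields[2] if len(fields) > 2 else "*"
        if seq != "*":
            total += len(seq)
            continue
        # No sequence stored (e.g. a *.noseq.gfa file): fall back to the LN:i: tag.
        for tag in fields[3:]:
            if tag.startswith("LN:i:"):
                total += int(tag[5:])
                break
    return total


def haplotype_sizes(hap1_path, hap2_path):
    """Return the total contig length (in bases) of each haplotype GFA."""
    sizes = []
    for path in (hap1_path, hap2_path):
        with open(path) as fh:
            sizes.append(gfa_total_length(fh))
    return sizes
```

Calling `haplotype_sizes("NA12878.asm.hic.hap1.p_ctg.gfa", "NA12878.asm.hic.hap2.p_ctg.gfa")` would return the two totals in bases, which can then be compared directly or added together.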

ld9866 commented 3 years ago

Thank you very much for your reply. I think the imbalance is probably because the potato data I downloaded is tetraploid, so the phasing result may be wrong. Since I only want to practice at the moment, I will carry on as planned.