Open zhangrengang opened 12 months ago
I have another issue with --n-hap 4. It in fact outputs the equivalent of 8 haplotypes in size (2.8 Gb in total). With the default --n-hap 2, it outputs 1.5 Gb, which is the expected size for our autotetraploid genome. However, the 1.5 Gb assembly is missing some large regions (homoeologous collapse), as confirmed by aligning against the reference and analyzing the coverage depth.
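For anyone who wants to compare output sizes the same way, here is a minimal Python sketch that totals the segment lengths in each GFA file. It assumes the usual GFA layout (S records with the sequence in the third column and an optional LN:i: tag); the function name gfa_total_length is made up for this example and is not part of hifiasm.

```python
import sys


def gfa_total_length(lines):
    """Sum segment lengths over the 'S' records of a GFA file."""
    total = 0
    for line in lines:
        if not line.startswith("S\t"):
            continue  # skip A, L and any other record types
        fields = line.rstrip("\n").split("\t")
        # Prefer the LN:i: tag when present (assumed optional tag position: column 4+).
        length = next(
            (int(f[5:]) for f in fields[3:] if f.startswith("LN:i:")), None
        )
        if length is None:
            # Fall back to the sequence itself; '*' means the sequence is not stored.
            seq = fields[2] if len(fields) > 2 else "*"
            length = 0 if seq == "*" else len(seq)
        total += length
    return total


if __name__ == "__main__":
    # Usage: python gfa_size.py hifiasm.hic.hap*.p_ctg.gfa
    for path in sys.argv[1:]:
        with open(path) as handle:
            print(path, gfa_total_length(handle), sep="\t")
```

Comparing the per-file totals against the expected haploid size gives a quick hint of whether a file holds one haplotype or several.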
The same h1tg prefix for all haplotypes is a known bug when we use hifiasm for tetraploid potato. But I have never seen hifiasm output 8 haplotypes with --n-hap 4. Do you have all the logs for this run? As far as I know, Hi-C-based phasing for polyploids is still very unstable; it depends on the distribution of heterozygous variants in the autotetraploid.
Yes, I agree with @baozg. Do you have the log file for hifiasm?
Hi, I ran into the same problem. My genome is a triploid; k-mer analysis predicts a genome size of around 700M for a single haplotype, so the whole genome should be 2~2.1G. When I use version 0.19.5-r587 with the parameters "--n-hap 3 --h1 hic_R1.fastq --h2 hic_R2.fastq", the result is hifi.hic.hap1.p_ctg.gfa.fa, 1.5G; hifi.hic.hap2.p_ctg.gfa.fa, 1008M; hifi.hic.hap3.p_ctg.gfa.fa, 825M; hifi.hic.p_ctg.gfa.fa, 1.5G; hifi.hic.p_utg.gfa.fa, 2.3G; homozygous read coverage threshold: 33. Then when I add "--hom-cov 17", the result is hifi.hic.hap1.p_ctg.gfa.fa, 2.0G; hifi.hic.hap2.p_ctg.gfa.fa, 2.0G; hifi.hic.hap3.p_ctg.gfa.fa, 2.0G; hifi.hic.p_ctg.gfa.fa, 2.1G; hifi.hic.p_utg.gfa.fa, 2.3G. Judging by the size of each hap, it looks like each hap contains all 3 sets of sequences. Is it possible that I am using the parameters incorrectly?
Also, when I use version 0.16.1-r375 with the parameters "--n-hap 3 --h1 hic_R1.fastq --h2 hic_R2.fastq", the result is hifi_hic.hic.hap1.p_ctg.fa, 657M; hifi_hic.hic.hap2.p_ctg.fa, 1.5G; hifi_hic.hic.p_ctg.gfa.fa, 1.5G; hifi_hic.hic.p_utg.fa, 2.2G; hifi_hic.hic.r_utg.gfa.fa, 2.2G. The hap1 and hap2 sizes are consistent with my AAB triploid genome. When I use p_utg for 3D-DNA, the sequence is too fragmented and there are collapsed regions. So I combined hap1 and hap2 and then ran 3D-DNA. Judging from the results, it seems to work well. Is combining hap1 and hap2 for scaffolding an appropriate approach?
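The size reasoning above can be made explicit with a little arithmetic. Below is a rough Python sketch that divides each reported hap size by the estimated ~700M haploid size. It assumes the reported sizes can be treated as approximate base counts in Mb, and haplotype_copies is just a name made up for this example.

```python
def haplotype_copies(assembly_size_mb, haploid_size_mb):
    """Return how many haploid genome equivalents an assembly file holds."""
    return assembly_size_mb / haploid_size_mb


# Estimated size of a single haplotype, taken from the k-mer prediction above (in Mb).
EXPECTED_HAPLOID_MB = 700

# Sizes as reported in the post, converted to Mb (assumption: 1G = 1000M).
runs = {
    "0.19.5-r587, --n-hap 3": {"hap1": 1500, "hap2": 1008, "hap3": 825},
    "0.19.5-r587, --n-hap 3 --hom-cov 17": {"hap1": 2000, "hap2": 2000, "hap3": 2000},
}

for run, sizes in runs.items():
    print(run)
    for hap, size_mb in sizes.items():
        copies = haplotype_copies(size_mb, EXPECTED_HAPLOID_MB)
        print(f"  {hap}: {copies:.2f} haploid equivalents")
```

A ratio close to 1 would suggest a single haplotype per file, while a ratio close to 3 would fit the suspicion that each file contains all three sets.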
Hi-C phased triploid assembly is still tricky. If --n-hap 3 doesn't work well, could you please try the normal diploid assembly, and then use 3d-dna to manually fix the duplications?
Many thanks. I think there may also be a problem with my understanding of "hom-cov". When I change the parameters to "--n-hap 3 --hom-cov 51", the total size is as expected, but there are indeed duplications. This also occasionally occurs when I use the diploid mode of 0.16.1-r375 and scaffold the combined "hap1+hap2". What are the possible reasons for this?
Overall, I think there are four options now. Which one do you recommend?
When using --n-hap 4, all four hifiasm.hic.hap*.p_ctg.gfa files have the same sequence IDs, like "h1tg000001l".
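Until the ID issue is fixed, one possible workaround is to prefix the IDs in each file before merging them. The Python sketch below is only an illustration, not a hifiasm feature: the function name prefix_gfa_ids and the "hap2_" style prefix are made up, and it assumes that the contig name sits in column 2 of S and A records and in columns 2 and 4 of L records.

```python
import sys


def prefix_gfa_ids(lines, prefix):
    """Yield GFA lines with every contig ID prefixed so IDs are unique per file."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        record_type = fields[0]
        if record_type in ("S", "A") and len(fields) > 1:
            # Assumed layout: S <contig> <seq> ... and A <contig> <pos> ...
            fields[1] = prefix + fields[1]
        elif record_type == "L" and len(fields) > 3:
            # Assumed layout: L <from> <orient> <to> <orient> <overlap>
            fields[1] = prefix + fields[1]
            fields[3] = prefix + fields[3]
        yield "\t".join(fields) + "\n"


if __name__ == "__main__":
    # Usage: python prefix_gfa.py hifiasm.hic.hap2.p_ctg.gfa hap2_ > hap2.renamed.gfa
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as handle:
            sys.stdout.writelines(prefix_gfa_ids(handle, sys.argv[2]))
```

Running it once per haplotype file with a different prefix keeps "h1tg000001l" from colliding when the files are combined.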