Open B10inform opened 3 years ago
Does your sample have sex chromosomes? What are the expected sizes of the two haplotypes?
It is a diploid genome; the expected size is around 360 Mb. Hifiasm version 0.15.4-r347 seems to give good output, but the newer version does not.
So there are no sex chromosomes that could lead to different sizes for the two haplotypes? Unbalanced haplotypes are always caused by mispositioned centromeres, which is very tricky. The changes are as follows:
(1) version 0.16.0 introduces a new error correction method so that the contigs tend to be longer and resolve more repeats. (2) version 0.16.1 determines homologous pairs by both all-vs-all contig alignment and Hi-C weight, while previous versions only consider contig alignments.
Could you please check whether v0.16.0 gives two balanced haplotypes? I'd like to figure out which change leads to this issue. Of course, if you can share the bin files with us, that would be even more helpful. Thank you in advance.
Hi chhylp123,
Both v0.16.0 and v0.16.1 give me similar results. The .bin files are too big to send here. Can you give me your email? Best
It's hcheng@jimmy.harvard.edu. Thank you so much.
Hi,
Hope you got the bin files.
Hi chhylp123,
Were you able to look into the bin files? I sent them through WeTransfer.
Best
Sorry, I missed it. Could you please send me *.ec.bin, *.ovlp.reverse.bin, *.ovlp.source.bin, *.hic.lk.bin and *.hic.tlb.bin? I only got one bin file, so I cannot reproduce the results.
I have sent the .bin files. Hope you got them.
Thanks a lot. I will reach out to you soon.
Hi chhylp123, any updates?
Sorry for the late reply. I have checked the results, but it is tricky to say which one is right. I'm thinking of debugging in two ways: 1) Could you please check the Hi-C heatmap as shown here: https://github.com/baozg/phased-assembly-check? As your genome is not too large, it probably won't take too much time. 2) Another way is to look at the k-mer plot using KAT or Merqury. These tools can tell you whether there are 2-copy regions and where they are. 2-copy regions are the redundancies that should be fixed.
Regarding "1) could you please check the Hi-C heatmap": I need assembled contigs and Hi-C reads for the Hi-C map.
Hap2 vs raw fastq
Merqury plot
@B10inform Sorry for the late reply. The k-mer plot does not look too bad. Could you share the bin files with me again? I could probably run purge_dups on top of each haplotype to find potential duplicated regions.
Hi chhylp123,
Which bin files do you want me to send? There are .lk.bin, .tlb.bin, reverse.bin, source.bin and ec.bin. Or should I send all of them?
Could you please share all bin files? Sorry I just deleted them on my side.
Hi chhylp123, I have sent them through WeTransfer.
What do you think about building the Merqury hapmer dbs (as for trios) from read sequences extracted from the raw HiFi data (original .fastq files), using the Hap1 (HG:A:p) and Hap2 (HG:A:m) information from the GFA files?
Thanks
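For readers following along, here is a minimal sketch of how read names could be grouped by the HG:A tag mentioned above. It assumes the hifiasm GFA "A" lines are tab-separated with the read name in the fifth column and the haplotype tag (for example HG:A:p or HG:A:m) among the trailing optional fields. The function name reads_by_haplotype is hypothetical, not part of hifiasm or Merqury.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def reads_by_haplotype(gfa_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Group read names by the value of the HG:A tag on GFA 'A' lines.

    Assumed layout (tab-separated):
    A  contig  contig_start  strand  read_name  read_start  read_end  [tags...]
    """
    groups: Dict[str, Set[str]] = defaultdict(set)
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        # Only 'A' lines describe which reads make up a contig.
        if fields[0] != "A" or len(fields) < 5:
            continue
        read_name = fields[4]
        # Look for the haplotype tag among the remaining fields.
        for tag in fields[5:]:
            if tag.startswith("HG:A:"):
                groups[tag[len("HG:A:"):]].add(read_name)
                break
    return dict(groups)
```

The resulting name sets could then be used to pull the matching records out of the original .fastq files with any FASTQ filtering tool.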
Thanks a lot. For the Merqury plot, it seems you are using the phasing results of hifiasm to evaluate the phased assemblies of hifiasm, so it probably makes little sense.
Since I don't have the parental reads, what would be the best reads to use?
Thanks
I just have no idea if it makes sense in practice...
Hi @B10inform, I was wondering if you could also share the bin files of v0.16.1 with me? It seems the weird assemblies were generated by v0.16.1. Thank you in advance.
Hi chhylp123,
Were you able to run purge_dups on top of each haplotype and find potential duplicated regions?
I have sent the v0.16.1 bin files.
@B10inform, may I ask how to decompress V0.16.1.asm.ec.bin*? I merged the parts together with cat and then decompressed the merged file. However, I got a warning: extra bytes at beginning or within zipfile.
Did you try zcat?
Hi chhylp123,
Were you able to run purge_dups on the haplotypes (hifiasm version 0.15.4-r347) and look at potential duplicated regions?
Thanks
Sure. I will try it this weekend.
Could you share the software, protocol etc. to look at potential duplicated regions, if it is ok?
Thanks
Hi chhylp123, these are the plots I get with purge_dups. They look weird to me. What do you think about these plots?
It looks ok. What I will do is to find all potential overlaps between contigs, and then check these overlaps one-by-one to see if some of them are false duplications.
So it is pretty tricky...
Hi chhylp123,
Any updates? Thank you.
Sorry there are too many things... I will reach out to you Thursday.
@B10inform Sorry for the late reply... I think I found something. Let me put the results together.
Hi chhylp123, I was wondering if you were able to find what it was.
Thanks
Sorry for the late reply. For 0.15.4, the result is OK. For 0.16.1, hifiasm mispositioned two homologous contigs into hap2, so hap2 is larger. What I'm doing for debugging is finding all-vs-all overlaps in the hap2 assembly with minimap2. In this case, you can find a long overlap between two contigs (one of these two contigs should be reassigned to the hap1 assembly). It is not easy to fix directly on my side since I don't have the Hi-C data. If you could follow the poster here (https://github.com/baozg/phased-assembly-check), I do think it would be easy to fix.
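The overlap check described above can be sketched roughly as follows. This assumes the hap2 contigs were aligned against themselves with something like minimap2 -x asm5 -X hap2.fa hap2.fa > hap2.self.paf (the preset, the file names and the 100 kb threshold are assumptions, not the settings actually used in this thread). The helper long_overlaps is a hypothetical name; it only reads the standard PAF columns.

```python
from typing import Iterable, List, NamedTuple


class Overlap(NamedTuple):
    query: str
    target: str
    query_span: int
    target_span: int
    matches: int


def long_overlaps(paf_lines: Iterable[str], min_span: int = 100_000) -> List[Overlap]:
    """Return alignments between two different contigs covering >= min_span bases.

    PAF columns used: 1 query name, 3-4 query start/end, 6 target name,
    8-9 target start/end, 10 number of matching bases.
    """
    hits: List[Overlap] = []
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record
        qname, qstart, qend = fields[0], int(fields[2]), int(fields[3])
        tname, tstart, tend = fields[5], int(fields[7]), int(fields[8])
        if qname == tname:
            continue  # self-hits say nothing about homologous contig pairs
        qspan, tspan = qend - qstart, tend - tstart
        if min(qspan, tspan) >= min_span:
            hits.append(Overlap(qname, tname, qspan, tspan, int(fields[9])))
    # Longest overlaps first: these are the candidates to inspect one by one.
    hits.sort(key=lambda o: min(o.query_span, o.target_span), reverse=True)
    return hits
```

Passing an open file handle for the PAF file to long_overlaps and printing the first few hits gives a short list of contig pairs to check by hand. A long overlap within a single haplotype suggests that one contig of the pair may belong in the other haplotype.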
Hi, there is a huge difference between the haplotype 1 and haplotype 2 outputs depending on the version.
Hifiasm version 0.15.4-r347: hifiasm -o xx.asm --primary --n-perturb 20000 --f-perturb 0.15 --seed 11 -l3 --n-weight 6 -s 0.55 -k 60 --h1 .fastq.gz --h2 .fastq.gz .fastq
HAP1: 362911 HAP2: 365427
Hifiasm version 0.16.1-r375: hifiasm -o xx.asm --primary --n-perturb 20000 --f-perturb 0.15 --seed 11 -l3 --n-weight 6 -s 0.55 -k 60 --h1 .fastq.gz --h2 .fastq.gz .fastq
HAP1: 344083 HAP2: 382213
What could be the reason? Could this be looked into.
Thanks
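To put numbers on the difference reported above, here is a small sketch that compares the two haplotype sizes for each version. The figures are copied from this post as given (no unit is stated there). The helper haplotype_imbalance and the choice to normalize by the mean haplotype size are assumptions made for illustration.

```python
def haplotype_imbalance(hap1: int, hap2: int) -> float:
    """Absolute size difference divided by the mean of the two haplotype sizes."""
    mean_size = (hap1 + hap2) / 2
    return abs(hap1 - hap2) / mean_size


# Sizes as reported in the post, keyed by hifiasm version.
reported = {
    "0.15.4-r347": (362911, 365427),
    "0.16.1-r375": (344083, 382213),
}

for version, (hap1, hap2) in reported.items():
    print(
        f"{version}: total={hap1 + hap2} "
        f"diff={abs(hap1 - hap2)} "
        f"imbalance={haplotype_imbalance(hap1, hap2):.2%}"
    )
```

The combined size is almost the same for both versions (728338 and 726296), while the difference between haplotypes grows from 2516 to 38130. That pattern looks more like sequence being assigned to the other haplotype than sequence being lost or duplicated.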