Open liu-xingliang opened 1 year ago
Yes, it is too high and shouldn't happen. When you ran hifiasm with trio-binning, were you using the following command lines?
hifiasm -o HG002.30X.asm/HG002.30X.asm -t 32 -1 pat.yak -2 mat.yak
It looks like no HiFi reads are specified for hifiasm?
Thank you @chhylp123
It's a two-command combination, following the instructions here, to get "both primary/alternate assemblies and trio binning assemblies":
My scripts with comments:
hifiasm -o HG002.30X.asm/HG002.30X.asm -t 32 HG002-rep1.30X.fastq # only use HiFi reads to get primary/alternative assemblies
hifiasm -o HG002.30X.asm/HG002.30X.asm -t 32 -1 pat.yak -2 mat.yak # trio binning assemblies
Could you please try:
hifiasm -o HG002.30X.asm/HG002.30X.asm -t 32 -1 pat.yak -2 mat.yak HG002-rep1.30X.fastq
We always need to give a file to hifiasm to make it think there is a HiFi read file.
@chhylp123, thank you for the comments. I tried the one-line hifiasm trio assembly below, but the switch error rate is the same, and there is no *.bp.*.gfa output compared to the previous run:
coverage="full" # 30X
hifiasm=~/softwares/hifiasm/hifiasm
mkdir HG002.$coverage.asm_hifi
$hifiasm -o HG002.$coverage.asm_hifi/HG002.$coverage.asm -t 32 -1 pat.yak -2 mat.yak HG002-rep1.$coverage.fastq /dev/null 2> HG002.$coverage.asm_hifi.trio.log
awk '/^S/{print ">"$2;print $3}' HG002.full.asm_hifi/HG002.full.asm.dip.hap1.p_ctg.gfa > HG002.full.asm_hifi/HG002.full.asm.dip.hap1.p_ctg.gfa.fa
~/softwares/yak/yak trioeval pat.yak mat.yak HG002.full.asm_hifi/HG002.full.asm.dip.hap1.p_ctg.gfa.fa -t 12 > HG002.full.asm_hifi/HG002.full.asm.dip.hap1.p_ctg.trioeval.out
awk '/^S/{print ">"$2;print $3}' HG002.full.asm_hifi/HG002.full.asm.dip.hap2.p_ctg.gfa > HG002.full.asm_hifi/HG002.full.asm.dip.hap2.p_ctg.gfa.fa
~/softwares/yak/yak trioeval pat.yak mat.yak HG002.full.asm_hifi/HG002.full.asm.dip.hap2.p_ctg.gfa.fa -t 12 > HG002.full.asm_hifi/HG002.full.asm.dip.hap2.p_ctg.trioeval.out
tail -n 3 HG002.full.asm_hifi/HG002.full.asm.dip.hap1.p_ctg.trioeval.out
# W 86232 3612758 0.023869
# H 81960 3613135 0.022684
# N 3530950 82206 0.022752
tail -n 3 HG002.full.asm_hifi/HG002.full.asm.dip.hap2.p_ctg.trioeval.out
# W 61826 1904199 0.032468
# H 48191 1904521 0.025303
# N 48593 1855931 0.025515
# previous two-commands run
# hifiasm -o HG002.30X.asm/HG002.30X.asm -t 32 HG002-rep1.30X.fastq
# hifiasm -o HG002.30X.asm/HG002.30X.asm -t 32 -1 pat.yak -2 mat.yak
ls ~/hifiasm_trio/HG002.30X.asm
# 24G Jul 22 08:19 HG002.full.asm.ec.bin
# 12G Jul 22 08:31 HG002.full.asm.ovlp.source.bin
# 9.2G Jul 22 08:37 HG002.full.asm.ovlp.reverse.bin
# 6.5G Jul 22 08:47 HG002.full.asm.bp.r_utg.gfa
# 129M Jul 22 08:47 HG002.full.asm.bp.r_utg.noseq.gfa
# 11M Jul 22 08:49 HG002.full.asm.bp.r_utg.lowQ.bed
# 6.4G Jul 22 08:50 HG002.full.asm.bp.p_utg.gfa
# 127M Jul 22 08:50 HG002.full.asm.bp.p_utg.noseq.gfa
# 9.9M Jul 22 08:52 HG002.full.asm.bp.p_utg.lowQ.bed
# 3.0G Jul 22 08:55 HG002.full.asm.bp.p_ctg.gfa
# 67M Jul 22 08:55 HG002.full.asm.bp.p_ctg.noseq.gfa
# 2.2M Jul 22 08:56 HG002.full.asm.bp.p_ctg.lowQ.bed
# 3.0G Jul 22 10:25 HG002.full.asm.bp.hap1.p_ctg.gfa
# 65M Jul 22 10:25 HG002.full.asm.bp.hap1.p_ctg.noseq.gfa
# 2.7M Jul 22 10:25 HG002.full.asm.bp.hap1.p_ctg.lowQ.bed
# 2.8G Jul 22 10:26 HG002.full.asm.bp.hap2.p_ctg.gfa
# 61M Jul 22 10:26 HG002.full.asm.bp.hap2.p_ctg.noseq.gfa
# 2.2M Jul 22 10:27 HG002.full.asm.bp.hap2.p_ctg.lowQ.bed
# 74M Jul 28 13:00 HG002.full.asm.hap1.phase.bin
# 80M Jul 28 13:00 HG002.full.asm.hap2.phase.bin
# 6.5G Jul 28 13:11 HG002.full.asm.dip.r_utg.gfa
# 128M Jul 28 13:11 HG002.full.asm.dip.r_utg.noseq.gfa
# 11M Jul 28 13:13 HG002.full.asm.dip.r_utg.lowQ.bed
# 6.4G Jul 28 13:14 HG002.full.asm.dip.p_utg.gfa
# 127M Jul 28 13:14 HG002.full.asm.dip.p_utg.noseq.gfa
# 9.9M Jul 28 13:16 HG002.full.asm.dip.p_utg.lowQ.bed
# 2.8G Jul 28 19:00 HG002.full.asm.dip.hap1.p_ctg.gfa
# 62M Jul 28 19:00 HG002.full.asm.dip.hap1.p_ctg.noseq.gfa
# 2.6M Jul 28 19:01 HG002.full.asm.dip.hap1.p_ctg.lowQ.bed
# 2.9G Jul 28 19:02 HG002.full.asm.dip.hap2.p_ctg.gfa
# 64M Jul 28 19:02 HG002.full.asm.dip.hap2.p_ctg.noseq.gfa
# 2.4M Jul 28 19:02 HG002.full.asm.dip.hap2.p_ctg.lowQ.bed
# one-command run:
# hifiasm -o HG002.30X.asm_hifi/HG002.30X.asm -t 32 -1 pat.yak -2 mat.yak HG002-rep1.30X.fastq
ls ~/hifiasm_trio/HG002.30X.asm_hifi
# 56K Oct 26 06:24 ..
# 24G Oct 26 23:25 HG002.full.asm.ec.bin
# 74M Oct 26 23:25 HG002.full.asm.hap1.phase.bin
# 80M Oct 26 23:25 HG002.full.asm.hap2.phase.bin
# 12G Oct 26 23:27 HG002.full.asm.ovlp.source.bin
# 9.2G Oct 26 23:28 HG002.full.asm.ovlp.reverse.bin
# 6.5G Oct 27 13:37 HG002.full.asm.dip.r_utg.gfa
# 128M Oct 27 13:37 HG002.full.asm.dip.r_utg.noseq.gfa
# 11M Oct 27 13:38 HG002.full.asm.dip.r_utg.lowQ.bed
# 6.4G Oct 27 13:39 HG002.full.asm.dip.p_utg.gfa
# 127M Oct 27 13:39 HG002.full.asm.dip.p_utg.noseq.gfa
# 9.9M Oct 27 13:40 HG002.full.asm.dip.p_utg.lowQ.bed
# 2.8G Oct 27 16:35 HG002.full.asm.dip.hap1.p_ctg.gfa
# 62M Oct 27 16:35 HG002.full.asm.dip.hap1.p_ctg.noseq.gfa
# 2.6M Oct 27 16:35 HG002.full.asm.dip.hap1.p_ctg.lowQ.bed
# 2.9G Oct 27 16:37 HG002.full.asm.dip.hap2.p_ctg.gfa
# 64M Oct 27 16:37 HG002.full.asm.dip.hap2.p_ctg.noseq.gfa
# 2.4M Oct 27 16:37 HG002.full.asm.dip.hap2.p_ctg.lowQ.bed
# 2.8G Oct 27 18:04 HG002.full.asm.dip.hap1.p_ctg.gfa.fa
# 3.0M Oct 27 18:10 HG002.full.asm.dip.hap1.p_ctg.trioeval.out
# 2.9G Oct 27 18:11 HG002.full.asm.dip.hap2.p_ctg.gfa.fa
Thank you.
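The rates in the `tail -n 3` output earlier in this comment can be re-derived from the counts on each line. The sketch below assumes, consistent with the numbers pasted above, that on the W (switch) and H (Hamming) lines the rate is column 2 divided by column 3. The function name `check_trioeval` and the default 0.01 threshold are illustrative only; the threshold is taken from the 1% figure asked about in this thread and is not an official cutoff.

```shell
# Read yak trioeval output on stdin, recompute the W and H rates as
# column 2 / column 3, and flag any rate above a threshold (default 0.01).
# check_trioeval is a hypothetical helper, not part of yak or hifiasm.
check_trioeval() {
    awk -v max="${1:-0.01}" '
        $1 == "W" || $1 == "H" {
            rate = $2 / $3
            printf "%s %.6f %s\n", $1, rate, (rate > max ? "HIGH" : "ok")
        }'
}
```

For example, `check_trioeval < HG002.full.asm_hifi/HG002.full.asm.dip.hap1.p_ctg.trioeval.out` would print one line per metric, and `check_trioeval 0.05 < file` would use a looser threshold.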
@liu-xingliang Could you please show me the Bandage screenshot for the assembly graph? Could you please evaluate the hamming error rate of p_utg.gfa? If everything is right, the hamming/switch error rate of p_utg.gfa should be very low. In theory, HG002 is a standard sample which shouldn’t have issues. If possible, could you please share the bin files and the yak indexes with us? I will check the details on my side. Thanks so much!
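The p_utg.gfa evaluation requested above could be done with the same two steps already used in this thread for the hap1/hap2 contigs: convert the GFA to FASTA with awk, then run `yak trioeval`. This is only a sketch; the helper names `gfa_to_fasta` and `trioeval_gfa` are hypothetical, and the yak path and `-t 12` are copied from the commands posted earlier.

```shell
# Convert GFA segment (S) lines to FASTA:
# column 2 is the segment name, column 3 is the sequence.
gfa_to_fasta() {
    awk '/^S/{print ">"$2; print $3}' "$1"
}

# Hypothetical wrapper: evaluate one GFA against the parental yak indexes.
# Usage: trioeval_gfa pat.yak mat.yak assembly.gfa
trioeval_gfa() {
    local pat=$1 mat=$2 gfa=$3
    gfa_to_fasta "$gfa" > "$gfa.fa"
    # same yak invocation as used for the hap1/hap2 contigs in this thread
    ~/softwares/yak/yak trioeval "$pat" "$mat" "$gfa.fa" -t 12 > "${gfa%.gfa}.trioeval.out"
    # the last three lines hold the W, H and N summary rows
    tail -n 3 "${gfa%.gfa}.trioeval.out"
}
```

With the file names from the listing above, the call would look like `trioeval_gfa pat.yak mat.yak HG002.full.asm_hifi/HG002.full.asm.dip.p_utg.gfa`.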
I applied hifiasm (v0.19.5-r592) to an HG002 30X dataset sequenced on a PacBio Revio and got a 2-3% switch error rate for the primary contigs and for the two phased haplotype contigs:
My question is: are the above switch error rates considered too high? Should they all be below 1% to be safe?
Thank you very much for your help!
Details of the trio-assembly: