jinxin112233 opened 3 years ago
What's the size of p_utg.gfa? And what are the following two numbers?
[M::purge_dups] purge duplication coverage threshold:
[M::stat] # heterozygous bases: 6179497799; # homozygous bases: 480824972
Hi
Sorry for our slow response. We re-ran hifiasm to get the running log.
Here are the two numbers:
[M::stat] # heterozygous bases: 277085802; # homozygous bases: 1528254506
The size of NA12878.asm.hic.p_ctg.gfa is also 1.6G.
Here are all the file sizes:
17G NA12878.asm.ec.bin
61M NA12878.asm.hic.clean_d_utg.noseq.gfa
1.6G NA12878.asm.hic.hap1.p_ctg.gfa
1.3M NA12878.asm.hic.hap1.p_ctg.lowQ.bed
59M NA12878.asm.hic.hap1.p_ctg.noseq.gfa
1.6G NA12878.asm.hic.hap2.p_ctg.gfa
1.3M NA12878.asm.hic.hap2.p_ctg.lowQ.bed
58M NA12878.asm.hic.hap2.p_ctg.noseq.gfa
4.9G NA12878.asm.hic.lk.bin
1.6G NA12878.asm.hic.p_ctg.gfa
1.4M NA12878.asm.hic.p_ctg.lowQ.bed
59M NA12878.asm.hic.p_ctg.noseq.gfa
1.8G NA12878.asm.hic.p_utg.gfa
2.8M NA12878.asm.hic.p_utg.lowQ.bed
61M NA12878.asm.hic.p_utg.noseq.gfa
1.8G NA12878.asm.hic.r_utg.gfa
2.9M NA12878.asm.hic.r_utg.lowQ.bed
61M NA12878.asm.hic.r_utg.noseq.gfa
22G NA12878.asm.hic.tlb.bin
4.3G NA12878.asm.ovlp.reverse.bin
16G NA12878.asm.ovlp.source.bin
Thank you for your help,
JX
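To put the two [M::stat] counts in proportion, here is a quick awk sketch. The helper name het_fraction is hypothetical and not part of hifiasm; it does nothing more than plain arithmetic on the two numbers.

```shell
# het_fraction HET HOM: share of bases that hifiasm's [M::stat] line counted
# as heterozygous (hypothetical helper, plain arithmetic on the two counts).
het_fraction() {
  awk -v het="$1" -v hom="$2" 'BEGIN { printf "%.3f\n", het / (het + hom) }'
}
```

With the numbers from the re-run, het_fraction 277085802 1528254506 prints 0.153, i.e. only about 15% of the counted bases were labelled heterozygous, which fits the maintainer's reply that hifiasm treated most regions as homozygous.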
Hifiasm misidentified the homozygous coverage peak, so it treated most regions as homozygous. Could you please reset the hom peak with "--purge-cov"? The value should be a little larger than the hom peak.
Please note that you also need to regenerate the Hi-C bin files with the new hom peak.
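Since the value should sit a little above the homozygous peak of the k-mer histogram, a minimal sketch for locating that peak may help. It assumes a two-column "depth count" histogram such as the one jellyfish histo writes for GenomeScope; hom_peak and the search window are hypothetical choices, not hifiasm options. It reports the tallest local maximum whose depth falls inside the window you give it.

```shell
# hom_peak HIST LO HI: depth of the tallest local maximum with LO <= depth <= HI
# in a two-column "depth count" k-mer histogram (hypothetical helper).
# The window is chosen by hand around the expected homozygous coverage.
hom_peak() {
  awk -v lo="$2" -v hi="$3" '
    BEGIN { lo += 0; hi += 0 }
    { d[NR] = $1 + 0; c[NR] = $2 + 0 }      # force numeric depth and count
    END {
      best = 0; bestc = -1
      for (i = 2; i < NR; i++)                # skip first/last row (no neighbour)
        if (d[i] >= lo && d[i] <= hi && c[i] > c[i-1] && c[i] >= c[i+1] && c[i] > bestc) {
          best = d[i]; bestc = c[i]
        }
      print best                              # 0 means no peak in the window
    }' "$1"
}
```

For ~40X data one might call hom_peak reads.histo 30 60 (the file name is only an example) and then pick a --purge-cov value a bit above the reported depth; how far above is still a judgment call.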
Hi,
Should the command look like this?
hifiasm -o NA12878.asm -t40 --purge-cov 1628254506 --h1 read1.fq.gz --h2 read2.fq.gz HiFi-reads.fq.gz
Also, I'm not very familiar with hifiasm. I don't quite understand which Hi-C bin files need to be updated. Where are they?
Thank you for your help,
JX
What's the coverage of the dataset? Could you please show the k-mer histogram?
Hi,
The coverage of the dataset is about 40X. Here is the k-mer histogram generated with GenomeScope.
Best,
JX
Perhaps you could try:
hifiasm -o NA12878.asm -t40 --purge-cov 50 --h1 read1.fq.gz --h2 read2.fq.gz HiFi-reads.fq.gz
And please delete *hic*bin before rerunning hifiasm.
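A cautious way to do that deletion is sketched below as a small shell function (clean_hic_bins is a hypothetical name). It removes only files matching PREFIX*hic*bin, which in the listing above are NA12878.asm.hic.lk.bin and NA12878.asm.hic.tlb.bin. It leaves NA12878.asm.ec.bin and the ovlp bin files in place so that, as I understand hifiasm's behaviour, the error-corrected reads and overlaps are reused rather than recomputed.

```shell
# clean_hic_bins PREFIX: delete PREFIX*hic*bin (the Hi-C bin files) while
# leaving PREFIX.ec.bin and PREFIX.ovlp.*.bin untouched (hypothetical helper).
clean_hic_bins() {
  for f in "$1"*hic*bin; do
    [ -e "$f" ] || continue      # glob matched nothing; skip the literal pattern
    echo "removing $f"
    rm -f -- "$f"
  done
}
```

Run it from the output directory as clean_hic_bins NA12878.asm, then rerun the hifiasm command above.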
Great! Let me try it.
Best wishes,
JX
Hi
We assembled the genome in Hi-C mode with the following command:
hifiasm -o NA12878.asm -t 40 --h1 read1.fq.gz --h2 read2.fq.gz HiFi-reads.fq.gz
Unfortunately, NA12878.asm.hic.hap1.p_ctg.gfa and NA12878.asm.hic.hap2.p_ctg.gfa are both 1.6G, while the genome size estimated by flow cytometry is about 800M. Do we need to purge, or do you have another suggestion?
Thanks,
JX
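The 1.6G reported for each file is a file size, not an assembly length: ls -h uses binary units, and the GFA also stores non-sequence lines and tags. To compare each haplotype with the ~800M flow cytometry estimate, one option is to sum the sequence lengths of the S (segment) lines. gfa_len is a hypothetical helper, and it assumes the sequence sits in the third tab-separated column of each S line, as in hifiasm's *.p_ctg.gfa output.

```shell
# gfa_len GFA: total sequence length over the S (segment) lines of a GFA file
# (hypothetical helper; assumes column 3 of each S line holds the sequence).
gfa_len() {
  awk -F'\t' '$1 == "S" { total += length($3) } END { print total + 0 }' "$1"
}
```

For example, gfa_len NA12878.asm.hic.hap1.p_ctg.gfa prints the haplotype 1 contig length in bases.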