chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

Tetraploid genome assembly #431

Open wyl1219 opened 1 year ago

wyl1219 commented 1 year ago

Hello, I used hifiasm -o rz.asm -t 30 --n-hap 4 --ul ONT.fastq.gz hifi.fastq.gz to assemble a tetraploid genome, and the haplotypes were not distinguished. Could you tell me whether there will be 2 or 4 p_ctg.gfa files in the results, and give me some suggestions? Thanks very much.

chhylp123 commented 1 year ago

If you have Hi-C reads, the latest release, Hifiasm-0.19.3-r572, will give you 4 haplotypes. But the results might not be perfect right now.

wjq1981 commented 1 year ago

Hi, I have a question from testing 0.19.3, and I am not sure whether my understanding is right. When I assemble a tetraploid with the '-l 0' parameter, the resulting p_ctg.gfa should be of diploid size, and hap1+hap2 produced after adding the Hi-C data should together be of tetraploid size. But when I set '--n-hap 4', each of the 4 resulting hap files is almost the same size as the p_ctg.gfa file. Shouldn't each one be half the size of the p_ctg.gfa file?
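Size comparisons like the one above are easier to discuss with exact numbers. The following is a minimal sketch of how the total sequence length of a GFA file could be computed by summing its segment (S) lines. The function name and the tiny demo input are made up for illustration, and it assumes each S line either carries an LN:i: tag or stores the sequence itself in the third column.

```python
import io

def gfa_total_length(handle):
    """Sum the lengths of all segment (S) lines in a GFA stream."""
    total = 0
    for line in handle:
        if not line.startswith("S\t"):
            continue  # skip links, header lines, and anything else
        fields = line.rstrip("\n").split("\t")
        length = None
        # Prefer an explicit LN:i: tag when one is present.
        for tag in fields[3:]:
            if tag.startswith("LN:i:"):
                length = int(tag[5:])
                break
        # Otherwise fall back to the length of the stored sequence.
        if length is None:
            length = 0 if fields[2] == "*" else len(fields[2])
        total += length
    return total

# Tiny made-up GFA, for illustration only.
demo = io.StringIO(
    "S\tctg1\tACGTACGT\tLN:i:8\n"
    "S\tctg2\t*\tLN:i:12\n"
    "S\tctg3\tACG\n"
    "L\tctg1\t+\tctg2\t+\t0M\n"
)
print(gfa_total_length(demo))
```

For real data, pass an open file handle for each p_ctg.gfa and hap GFA produced by your run and compare the totals.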

chhylp123 commented 1 year ago

It is hard to say. Do you mean the p_ctg with -l0 has a similar size to hap*p_ctg?

wjq1981 commented 1 year ago

Yes.

chhylp123 commented 1 year ago

Could you show the command lines?

wjq1981 commented 1 year ago

/hdd/hifiasm-0.19.3/hifiasm -o ttt -t 128 -f 38 --n-hap 4 -l 0 --h1 hic_R1.fastq.gz --h2 hic_R2.fastq.gz hifi.organelle.fasta.gz 2> ttt.log

chhylp123 commented 1 year ago

Please do not add -l0.

wjq1981 commented 1 year ago

Judging from the current results, -l0 cannot be used with this version, but the old version (0.16.1) needed -l0.

chhylp123 commented 1 year ago

You should not use -l0; instead, manually set --hom-cov to the homozygous coverage. See: https://hifiasm.readthedocs.io/en/latest/faq.html#for-hi-c-integrated-assembly-why-the-assembly-size-of-both-haplotypes-are-much-larger-than-the-estimated-genome-size
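Before setting --hom-cov by hand, it can help to see which value hifiasm inferred on its own. Below is a minimal sketch that pulls that number out of a saved log. The exact log wording matched here ("homozygous read coverage threshold: N") is an assumption based on how the threshold is referred to in this thread, so check it against your own log; the function name and the demo line are hypothetical.

```python
import re

# Assumed log wording; verify it against your own hifiasm log.
HOM_COV_RE = re.compile(r"homozygous read coverage threshold:\s*(\d+)")

def find_hom_cov(log_text):
    """Return every homozygous coverage threshold reported in the log text."""
    return [int(value) for value in HOM_COV_RE.findall(log_text)]

# Made-up log line, for illustration only.
demo_log = "homozygous read coverage threshold: 40\n"
print(find_hom_cov(demo_log))
```

If the reported value does not match the homozygous peak of your k-mer plot, that is the case where setting --hom-cov manually is suggested.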

wjq1981 commented 1 year ago

Ok, thank you. Wish you well!

XuDong919 commented 1 year ago

Hi Haoyu, I have run into the same problem. The haploid genome size is 1 Gb, but the hifiasm results look strange:

- p_ctg: 2.2 Gb, BUSCO C:99.5%[S:15.2%,D:84.3%],F:0.1%,M:0.4%,n:1614
- hap1: 2.0 Gb, BUSCO C:99.2%[S:16.7%,D:82.5%],F:0.1%,M:0.7%,n:1614
- hap2: 1.7 Gb, BUSCO C:99.5%[S:18.3%,D:81.2%],F:0.1%,M:0.4%,n:1614
- hap3: 1.9 Gb, BUSCO C:99.0%[S:22.1%,D:76.9%],F:0.3%,M:0.7%,n:1614
- hap4: 1.7 Gb, BUSCO C:99.3%[S:21.9%,D:77.4%],F:0.2%,M:0.5%,n:1614

This is my command:

hifiasm.193 -o 21ct -t 60 --n-hap 4 --h1 HIC876_raw_1.fq.gz --h2 HIC876_raw_2.fq.gz 01m64144_220505_071151.reads.fq.gz 02m64173_220505_081857.reads.fq.gz 03m64180_220512_043802.reads.fq.gz 04m64180_220513_153507.reads.fq.gz

I have checked the hifiasm log, and the homozygous read coverage threshold is right. What should I do next? Can I use a tool like purge_dups?
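To make the pattern in these numbers easier to see, here is a minimal sketch that parses the BUSCO summary strings quoted above and compares each assembly size with the stated 1 Gb haploid size. The function and variable names are made up for illustration; the sizes and BUSCO strings are copied from the comment.

```python
import re

# Matches a BUSCO one-line summary such as
# C:99.5%[S:15.2%,D:84.3%],F:0.1%,M:0.4%,n:1614
BUSCO_RE = re.compile(
    r"C:\s*([\d.]+)%\s*\[S:\s*([\d.]+)%,\s*D:\s*([\d.]+)%\],\s*"
    r"F:\s*([\d.]+)%,\s*M:\s*([\d.]+)%,\s*n:\s*(\d+)"
)

def parse_busco(summary):
    """Turn a BUSCO one-line summary into a dict of percentages plus n."""
    match = BUSCO_RE.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO summary: {summary!r}")
    c, s, d, f, m = (float(x) for x in match.groups()[:5])
    return {"C": c, "S": s, "D": d, "F": f, "M": m, "n": int(match.group(6))}

# Sizes (Gb) and BUSCO strings as reported in the comment above.
reported = {
    "p_ctg": (2.2, "C:99.5%[S:15.2%,D:84.3%],F:0.1%,M:0.4%,n:1614"),
    "hap1": (2.0, "C:99.2%[S:16.7%,D:82.5%],F:0.1%,M:0.7%,n:1614"),
    "hap2": (1.7, "C:99.5%[S:18.3%,D:81.2%],F:0.1%,M:0.4%,n:1614"),
    "hap3": (1.9, "C:99.0% [S:22.1%,D:76.9%],F:0.3%,M:0.7%,n:1614"),
    "hap4": (1.7, "C:99.3%[S:21.9%,D:77.4%],F:0.2%,M:0.5%,n:1614"),
}
EXPECTED_HAPLOID_GB = 1.0  # haploid genome size stated in the comment

for name, (size_gb, summary) in reported.items():
    busco = parse_busco(summary)
    ratio = size_gb / EXPECTED_HAPLOID_GB
    print(f"{name}: {size_gb} Gb ({ratio:.1f}x expected), "
          f"duplicated BUSCOs {busco['D']}%")
```

Every file is well above the expected haploid size and has a high share of duplicated BUSCOs, which is the symptom being asked about.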

wjq1981 commented 1 year ago

From my personal experience: if all your data are HiFi data, you can use the default parameters of version 0.16.1 to get hap1+hap2. If the size of each hap is double what you expect, add the parameter '-l 0'. Finally, scaffold hap1+hap2 with the Hi-C data and then split out the 4 sets of haplotype genomes.

XuDong919 commented 1 year ago

Thank you very much for your reply. I have HiFi and Hi-C data, and I will try what you suggested. But I have a question: do you split out the 4 sets of haplotype genomes with a tool like 3D-DNA or ALLHiC?

wjq1981 commented 1 year ago

3ddna.

XuDong919 commented 1 year ago

ok,thanks!!

wyl1219 commented 1 year ago

Hello, I used Hifiasm-0.19.3-r572 with this command line: hifiasm -o kl.asm -t 30 --n-hap 4 --hom-cov 88 --ul ONT.fastq.gz --h1 R1.clean.fastq.gz --h2 R2.clean.fastq.gz kl-hifi.fastq.gz. I got a 1.31G hap1_ctg.fa, a 1.28G hap2_ctg.fa, a 0.92G hap3_ctg.fa, and a 1.06G hap4_ctg.fa. The BUSCO results are shown here: [image]

I then ran cat hap1 hap2 hap3 hap4 > hap1234 and used Juicer and 3D-DNA with the same command lines as before, but I only got a 55.73M final.hic. Juicebox shows this: [image] I don't know what went wrong. Should I use only the p_utg.fa files? Thanks, wish you well.

chhylp123 commented 1 year ago

You should probably open an issue in the 3D-DNA repo. From my point of view, the result is not too bad.

wyl1219 commented 1 year ago

ok, thanks.

XuDong919 commented 1 year ago

Hello, wyl1219, I have a question. You suggested adding the parameter '-l 0'. When -l 0 is used, hifiasm produces p_ctg and a_ctg. Do you combine p_ctg and a_ctg and then run 3D-DNA?