chhylp123 / hifiasm

Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
MIT License

How to make a correct genome size estimate for an allotetraploid species? #634

Open · RezwanCAAS opened this issue 3 months ago

RezwanCAAS commented 3 months ago

Hi, I assembled the genome of an allotetraploid species with hifiasm and obtained an assembly of ~3.7 Gb. I then used the PacBio HiFi reads with Merqury for a k-mer analysis to estimate the genome size of this species. The GenomeScope plot I have shared reports a genome size of only 1.5 Gb, and I am puzzled about what went wrong. Can someone guide me on this?

My second question: why do the observed peaks fall outside the model peaks? I would be grateful for any help.

Regards,
Rezwan

[Figure: GenomeScope linear_plot]
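
For context on the 1.5 Gb figure: a common back-of-the-envelope k-mer estimate divides the total number of k-mers (after discarding low-depth error k-mers) by the depth of the peak that represents one copy of the genome. GenomeScope fits a more elaborate model, so this is not its method, but the arithmetic shows why the choice of peak matters: if the peak treated as "one copy" actually reflects two copies of the sequence, the estimate is halved. This is a minimal sketch assuming a two-column histogram (depth, count) such as the output of `meryl histogram` or `jellyfish histo`; the function name `kmer_genome_size` and the default error cutoff of 5 are illustrative choices.

```shell
# Rough genome size estimate from a k-mer histogram with two columns:
# depth, and the number of distinct k-mers seen at that depth (assumed format).
#   size ~= sum(depth * count, for depth >= min_depth) / peak_depth
# peak_depth must be the depth you believe corresponds to ONE copy of the
# sequence; picking a two-copy peak instead halves the result.
kmer_genome_size() {
  local hist=$1 peak=$2 min_depth=${3:-5}
  awk -v peak="$peak" -v lo="$min_depth" '
    $1 + 0 >= lo + 0 { total += $1 * $2 }   # skip low-depth (error) k-mers
    END { printf "%.0f\n", total / peak }
  ' "$hist"
}
```

For example, `kmer_genome_size reads.hist 30` (hypothetical file name and peak depth) prints the estimate in bp; rerunning with the depth of a different peak shows how sensitive the result is to that choice.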

chhylp123 commented 3 months ago

How did you run hifiasm? A good k-mer plot should look like the example here: https://hifiasm.readthedocs.io/en/latest/faq.html#why-does-hifiasm-stuck-or-crash. If you only need a primary assembly, you could try running purge_dups after the hifiasm assembly.
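
The purge_dups step suggested above follows the workflow in the purge_dups README: map reads to the primary contigs, compute coverage cutoffs, self-align the split assembly, flag duplicated regions, and extract the purged sequences. This sketch only prints that command sequence so it can be reviewed before running. Assumptions: the `map-hifi` minimap2 preset for HiFi reads (the README example uses `map-pb` for CLR reads), the output file names, the `THREADS` variable, and the hypothetical function name `purge_dups_plan`. For a polyploid, the automatic cutoffs from `calcuts` may need to be checked against the coverage histogram.

```shell
# Print (without running) the purge_dups steps, adapted from the purge_dups
# README, for a primary assembly and one or more HiFi read files.
# pbcstat writes PB.stat and PB.base.cov; get_seqs writes purged.fa and hap.fa.
# THREADS (assumed variable, default 32) sets the minimap2 thread count.
purge_dups_plan() {
  local asm=$1; shift
  local reads="$*"
  local threads=${THREADS:-32}
  cat <<EOF
minimap2 -x map-hifi -t $threads $asm $reads | gzip -c - > reads.paf.gz
pbcstat reads.paf.gz
calcuts PB.stat > cutoffs 2> calcuts.log
split_fa $asm > $asm.split
minimap2 -x asm5 -DP -t $threads $asm.split $asm.split | gzip -c - > $asm.split.self.paf.gz
purge_dups -2 -T cutoffs -c PB.base.cov $asm.split.self.paf.gz > dups.bed 2> purge_dups.log
get_seqs -e dups.bed $asm
EOF
}
```

Usage: `purge_dups_plan yellow_assembly.hic.p_ctg.fasta reads_cell_* > purge_plan.sh`, inspect the file, then run it with `bash purge_plan.sh`. Printing first makes it easy to adjust the cutoffs step before anything is executed.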

RezwanCAAS commented 3 months ago

Sorry for the late reply; I was traveling and busy with a field experiment. I used the following commands for the assembly:

1st command

module load hifiasm/0.19.8
hifiasm -o yellow_assembly -t 32 --hom-cov 63 \
 --h1 yellow_1.fastq.gz \
 --h2 yellow_2.fastq.gz \
 reads_cell_*

output

-rw-r--r-- 1 tariqr ibex-c2141 44943554304 Mar  2 02:38 yellow_assembly.ec.bin
-rw-r--r-- 1 tariqr ibex-c2141  3020953966 Mar 25 17:29 yellow_assembly.hic.hap1.p_ctg.fasta
-rw-r--r-- 1 tariqr ibex-c2141  3083618349 Mar  2 10:52 yellow_assembly.hic.hap1.p_ctg.gfa
-rw-r--r-- 1 tariqr ibex-c2141    16185036 Mar  2 10:52 yellow_assembly.hic.hap1.p_ctg.lowQ.bed
-rw-r--r-- 1 tariqr ibex-c2141    62763143 Mar  2 10:52 yellow_assembly.hic.hap1.p_ctg.noseq.gfa
-rw-r--r-- 1 tariqr ibex-c2141  3603444541 Mar 25 17:30 yellow_assembly.hic.hap2.p_ctg.fasta
-rw-r--r-- 1 tariqr ibex-c2141  3680868301 Mar  2 10:53 yellow_assembly.hic.hap2.p_ctg.gfa
-rw-r--r-- 1 tariqr ibex-c2141    16712429 Mar  2 10:54 yellow_assembly.hic.hap2.p_ctg.lowQ.bed
-rw-r--r-- 1 tariqr ibex-c2141    77494725 Mar  2 10:53 yellow_assembly.hic.hap2.p_ctg.noseq.gfa
-rw-r--r-- 1 tariqr ibex-c2141  3358681400 Mar  2 10:04 yellow_assembly.hic.lk.bin
-rw-r--r-- 1 tariqr ibex-c2141  3728413366 Mar 25 17:31 yellow_assembly.hic.p_ctg.fasta
-rw-r--r-- 1 tariqr ibex-c2141  3807131425 Mar  2 04:21 yellow_assembly.hic.p_ctg.gfa
-rw-r--r-- 1 tariqr ibex-c2141    16786239 Mar  2 04:22 yellow_assembly.hic.p_ctg.lowQ.bed
-rw-r--r-- 1 tariqr ibex-c2141    78785721 Mar  2 04:21 yellow_assembly.hic.p_ctg.noseq.gfa
-rw-r--r-- 1 tariqr ibex-c2141  7065869776 Mar  2 04:16 yellow_assembly.hic.p_utg.gfa
-rw-r--r-- 1 tariqr ibex-c2141    36327989 Mar  2 04:18 yellow_assembly.hic.p_utg.lowQ.bed
-rw-r--r-- 1 tariqr ibex-c2141   141553288 Mar  2 04:17 yellow_assembly.hic.p_utg.noseq.gfa
-rw-r--r-- 1 tariqr ibex-c2141  8681089843 Mar  2 04:12 yellow_assembly.hic.r_utg.gfa
-rw-r--r-- 1 tariqr ibex-c2141    47969038 Mar  2 04:14 yellow_assembly.hic.r_utg.lowQ.bed
-rw-r--r-- 1 tariqr ibex-c2141   156694833 Mar  2 04:13 yellow_assembly.hic.r_utg.noseq.gfa
-rw-r--r-- 1 tariqr ibex-c2141 50678500976 Mar  2 06:18 yellow_assembly.hic.tlb.bin
-rw-r--r-- 1 tariqr ibex-c2141 29932238864 Mar  2 03:49 yellow_assembly.ovlp.reverse.bin
-rw-r--r-- 1 tariqr ibex-c2141 20184090104 Mar  2 03:02 yellow_assembly.ovlp.source.bin
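
The sizes in the listing above are byte counts from `ls -l`, and the ~3.7 Gb figure appears to match the byte size of `yellow_assembly.hic.p_ctg.fasta`. Byte counts include FASTA headers and line breaks, so they only approximate the assembly length. A minimal sketch for counting contigs and base pairs directly, assuming an uncompressed FASTA file (the function name `fasta_stats` is hypothetical):

```shell
# Count contigs and total base pairs in an uncompressed FASTA file.
# Header lines (">...") and line breaks are excluded, unlike the byte
# size reported by `ls -l`.
fasta_stats() {
  awk '
    /^>/ { n++; next }                          # header line: one more contig
    { gsub(/[ \t\r]/, ""); bp += length($0) }   # sequence line: add its bases
    END { printf "contigs=%d bp=%.0f\n", n, bp }
  ' "$1"
}
```

Usage: `for f in yellow_assembly.hic*.p_ctg.fasta; do printf '%s\t%s\n' "$f" "$(fasta_stats "$f")"; done` reports the primary and both haplotype assemblies side by side.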

2nd command

module load hifiasm/0.19.8

hifiasm -o yellow_assembly -t 32 -s 0.30 -D 10 \
 --h1 yellow_1.fastq.gz \
 --h2 yellow_2.fastq.gz \
 reads_cell_*

output

-rw-r--r-- 1 tariqr ibex-c2141  4039031425 Feb 29 12:02 yellow_assembly.hic.hap1.p_ctg.fasta
-rw-r--r-- 1 tariqr ibex-c2141  4124692575 Feb 28 07:17 yellow_assembly.hic.hap1.p_ctg.gfa
-rw-r--r-- 1 tariqr ibex-c2141    19638440 Feb 28 07:18 yellow_assembly.hic.hap1.p_ctg.lowQ.bed
-rw-r--r-- 1 tariqr ibex-c2141    85772645 Feb 28 07:18 yellow_assembly.hic.hap1.p_ctg.noseq.gfa
-rw-r--r-- 1 tariqr ibex-c2141  2589525069 Feb 29 12:03 yellow_assembly.hic.hap2.p_ctg.fasta
-rw-r--r-- 1 tariqr ibex-c2141  2643948816 Feb 28 07:18 yellow_assembly.hic.hap2.p_ctg.gfa
-rw-r--r-- 1 tariqr ibex-c2141    13358259 Feb 28 07:19 yellow_assembly.hic.hap2.p_ctg.lowQ.bed
-rw-r--r-- 1 tariqr ibex-c2141    54482262 Feb 28 07:18 yellow_assembly.hic.hap2.p_ctg.noseq.gfa
-rw-r--r-- 1 tariqr ibex-c2141  3348883208 Feb 28 06:28 yellow_assembly.hic.lk.bin
-rw-r--r-- 1 tariqr ibex-c2141  3619840841 Feb 29 12:01 yellow_assembly.hic.p_ctg.fasta
-rw-r--r-- 1 tariqr ibex-c2141  3696066233 Feb 28 01:27 yellow_assembly.hic.p_ctg.gfa
-rw-r--r-- 1 tariqr ibex-c2141    16596961 Feb 28 01:27 yellow_assembly.hic.p_ctg.lowQ.bed
-rw-r--r-- 1 tariqr ibex-c2141    76289876 Feb 28 01:27 yellow_assembly.hic.p_ctg.noseq.gfa
-rw-r--r-- 1 tariqr ibex-c2141  7100699338 Feb 28 01:23 yellow_assembly.hic.p_utg.gfa
-rw-r--r-- 1 tariqr ibex-c2141    36971704 Feb 28 01:24 yellow_assembly.hic.p_utg.lowQ.bed
-rw-r--r-- 1 tariqr ibex-c2141   141880425 Feb 28 01:23 yellow_assembly.hic.p_utg.noseq.gfa
-rw-r--r-- 1 tariqr ibex-c2141  8644503429 Feb 28 01:20 yellow_assembly.hic.r_utg.gfa
-rw-r--r-- 1 tariqr ibex-c2141    48118375 Feb 28 01:22 yellow_assembly.hic.r_utg.lowQ.bed
-rw-r--r-- 1 tariqr ibex-c2141   156440319 Feb 28 01:21 yellow_assembly.hic.r_utg.noseq.gfa
-rw-r--r-- 1 tariqr ibex-c2141 50669605160 Feb 28 03:46 yellow_assembly.hic.tlb.bin
-rw-r--r-- 1 tariqr ibex-c2141 35754767954 Feb 28 00:57 yellow_assembly.ovlp.reverse.bin
-rw-r--r-- 1 tariqr ibex-c2141 20290235360 Feb 28 00:40 yellow_assembly.ovlp.source.bin

I used the output files of the 1st command for the k-mer analysis with Merqury. Please check the result and let me know if you have any suggestions. I should also add that the parental genomes of this polyploid species are highly homologous.
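
On the second question (observed peaks outside the model peaks): one way to inspect the raw histogram, independently of the GenomeScope fit, is to list its local maxima and compare their depths. In a tetraploid one might expect peaks near multiples of a single-copy depth (1x, 2x, 3x, 4x), and highly homologous subgenomes could shift k-mers into higher-copy peaks; that is an expectation to check, not a diagnosis. This is a minimal sketch assuming a two-column (depth, count) histogram; `kmer_peaks`, the window size and the error cutoff are illustrative choices, and a noisy histogram may need a larger window.

```shell
# List local maxima ("peaks") in a k-mer histogram with columns: depth count.
# A depth d is reported when its count is strictly larger than every other
# count within +/- win depths. Depths below min_depth (error k-mers) are skipped.
kmer_peaks() {
  local hist=$1 min_depth=${2:-5} win=${3:-5}
  awk -v lo="$min_depth" -v w="$win" '
    BEGIN { max = 0 }
    { c[$1 + 0] = $2 + 0; if ($1 + 0 > max) max = $1 + 0 }   # store numerically
    END {
      for (d = lo + 0; d <= max; d++) {
        if (!(d in c)) continue
        is_peak = 1
        for (j = d - w; j <= d + w; j++)          # compare with neighbours
          if (j != d && (j in c) && c[j] >= c[d]) { is_peak = 0; break }
        if (is_peak) printf "%d\t%d\n", d, c[d]
      }
    }
  ' "$hist"
}
```

If the reported peak depths sit at roughly 1:2:3:4 ratios, the lowest one is a candidate for the single-copy depth; if the model's peaks do not line up with these, that mismatch is worth raising when comparing against the hifiasm FAQ plot.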