Nextomics / NextPolish2

Repeat-aware polishing genomes assembled using HiFi long reads

Why is no QV improvement achieved using NextPolish2? #3

Closed bioinformaticspcj closed 1 year ago

bioinformaticspcj commented 1 year ago

Dear authors,

Thank you for developing such a useful tool for polishing Hifiasm assemblies. I tried to polish my Hifiasm assembly (assembled from 40× HiFi reads), but the QV of the polished assembly is essentially the same as that of the unpolished one. The genome of this species is highly heterozygous (rate: 0.77%). I do not know why this happens. Could you kindly help me?

The commands used are as follows:

hifiasm -t 60 -o Pvat -l 2 -s 0.75 --h1 R1.fq.gz --h2 R2.fq.gz hifi.fastq.gz
winnowmap -k 21 -t 100 -W repetitive_k21.txt -ax map-pb Pvat.hic.pctg.fa hifi.fastq.gz | samtools sort -@ 100 -o hifi.map.sort.bam -
yak count -t 100 -o k21.yak -k 21 -b 37 <(zcat illumina.fq.gz)
yak count -t 100 -o k31.yak -k 31 -b 37 <(zcat illumina_.fq.gz)
nextPolish2 -r hifi.map.sort.bam Pvit.hic.p_ctg.fa k21.yak k31.yak -t 100 -o Pvat.hic.p_ctg.polished.v1.fa

The QV before polishing: 45.1437

The QV after polishing: 45.1451

Thanks again! Bob
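
For context on the two QV values reported above: assuming they are Phred-scaled (QV = -10 * log10 of the per-base error probability, as is usual for k-mer-based estimates), the following minimal shell sketch converts a QV into an expected number of errors per megabase. The function names are illustrative and are not part of NextPolish2, yak, or Merqury.

```shell
#!/usr/bin/env bash
# Convert a Phred-scaled QV to a per-base error probability: E = 10^(-QV/10).
# awk has no pow10 builtin, so 10^x is computed as exp(x * ln 10).
qv_to_error() {
    awk -v qv="$1" 'BEGIN { printf "%.3e\n", exp(-qv / 10 * log(10)) }'
}

# Expected number of erroneous bases per megabase at a given QV.
qv_to_errors_per_mb() {
    awk -v qv="$1" 'BEGIN { printf "%.2f\n", 1e6 * exp(-qv / 10 * log(10)) }'
}

qv_to_errors_per_mb 45.1437   # QV before polishing
qv_to_errors_per_mb 45.1451   # QV after polishing
```

Both values come out at roughly 30.6 errors per Mb, so a QV difference of 0.0014 means the assembly was left practically unchanged.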

moold commented 1 year ago

Could you share the raw data (assembly.fa illumina_.fq.gz hifi.fastq.gz) with me so that I can figure out why? In addition, you can map the Illumina reads to the assembly and then check the mapping coverage/quality of the regions in the assembly that contain error k-mers. If these regions have no short reads mapped, NextPolish2 cannot polish them, because NextPolish2 requires short reads to validate k-mers.
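
The coverage check suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the Illumina reads are assumed to be already mapped and sorted into a BAM file, and the positions of the error k-mers are assumed to be available as a BED file. The names `illumina.sort.bam`, `error_kmers.bed`, and `summarize_depth` are hypothetical. The helper summarizes per-base depth in the format printed by `samtools depth` (contig, position, depth), which is a per-base simplification of a per-k-mer check.

```shell
#!/usr/bin/env bash
# Summarize per-base depth lines (columns: contig, position, depth).
# Intended input, assuming hypothetical file names:
#   samtools depth -a -Q 10 -b error_kmers.bed illumina.sort.bam | summarize_depth 10
# -a reports all positions (including zero depth), -Q 10 ignores reads with
# mapping quality below 10, and -b restricts the output to the BED regions.
summarize_depth() {
    awk -v min="${1:-10}" '
        { n++; if ($3 == 0) zero++; if ($3 > min) deep++ }
        END {
            if (n == 0) { print "no positions"; exit 1 }
            printf "positions=%d uncovered=%.2f%% above_%dx=%.2f%%\n",
                   n, 100 * zero / n, min, 100 * deep / n
        }'
}

# Small synthetic demo: four positions, two uncovered, one above 10x.
printf 'ctg1\t1\t0\nctg1\t2\t0\nctg1\t3\t12\nctg1\t4\t5\n' | summarize_depth 10
# prints: positions=4 uncovered=50.00% above_10x=25.00%
```

A large uncovered fraction would be consistent with the explanation above: without short reads over those regions, there are no k-mers to validate a correction against.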

bioinformaticspcj commented 1 year ago

Thanks for your timely reply. Because the illumina_R*.fq.gz files and the hifi.fastq.gz file are too big (more than 100 Gb in total), I am afraid I cannot share them with you. I will check how the short reads map to the assembly and send you the results soon.

moold commented 1 year ago

OK

bioinformaticspcj commented 1 year ago

Dear author,

I have checked the Illumina reads and the error k-mers by mapping the reads to the assembly. The results show that about 49% of the error k-mers are not covered by any reads, and only about 0.05% of the error k-mers are covered by more than 10× reads with mapping quality > 10. Do these results indicate that the reads are of low quality and cannot be used to polish the assembly?

Thanks, Bob

moold commented 1 year ago

No, but the improvement will not be great, because NextPolish2 keeps the reference unchanged whenever it cannot confirm that a candidate k-mer is correct, in order to avoid potential overcorrection. BTW, you can use a PCR-free library.

bioinformaticspcj commented 1 year ago

Thanks for your valuable advice. Could I use MGI reads to correct the k-mers? The MGI reads were all sequenced from PCR-free libraries.

Best, Bob

moold commented 1 year ago

yes

wangsb111 commented 1 year ago

@bioinformaticspcj Hello, have you resolved this problem? I have run into the same problem.

bioinformaticspcj commented 1 year ago

Dear Dr. Hu:

I have tried PCR-free reads generated on the T7 platform. However, the QV still did not improve much (only from 45.1644 to 45.3739). Could you give me some other suggestions?

Best, Bob

moold commented 1 year ago

You first need to check the mapping coverage/quality in the regions of the assembly that contain error k-mers. If these regions have no short reads mapped, NextPolish2 cannot improve the QV.

wangsb111 commented 1 year ago

@bioinformaticspcj Did you compute the QV with NextPolish2 or with Merqury?