BGI-Qingdao / TGS-GapCloser

A gap-closing software tool that uses long reads to enhance genome assembly.
GNU General Public License v3.0

minimap2 ava-ont option #8

Closed by dcopetti 4 years ago

dcopetti commented 4 years ago

Hello,

I wonder why you are using the ava-ont option of minimap2 (MINIMAP2_PARAM=" -x ava-ont) when aligning raw long reads to a very accurate sequence like a genome assembly.

Why not use the map-ont preset instead? I am just curious whether we could get more specificity by choosing the second option. Thanks,

Dario

adonis316 commented 4 years ago

Hi Dario,

Yes, the ava-ont preset (minimap2's all-vs-all overlap preset for ONT reads) is tuned for noisy long-read mapping and reports more low-quality alignments, while map-ont is meant for mapping long reads against a reference-quality sequence. We tested both presets, and ava-ont did give us more alignments, with a lower identity ratio.

In the pipeline, contigs are aligned to ONT reads to get the candidate long-read fragments for each gap region. Due to the LOW coverage of long reads, we want to include more candidates to improve the EFFICIENCY. The number of candidates is the first priority in this step, which is why we chose ava-ont. We then polish the candidates and choose the best one, and at that stage we focus on accuracy instead.

If you have sufficient coverage of long reads, then map-ont might give you a better gap-closing result in terms of accuracy. The preset can easily be replaced in TGS-GapCloser.sh, but we have not had a chance to try that.
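As a rough illustration of swapping the preset, here is a minimal Python sketch that rewrites the `-x` value on lines assigning MINIMAP2_PARAM. The function name `swap_minimap2_preset` and the sample line are hypothetical; the actual line in TGS-GapCloser.sh may look different, so check the script before relying on this.

```python
import re


def swap_minimap2_preset(script_text: str, old: str = "ava-ont", new: str = "map-ont") -> str:
    """Replace '-x <old>' with '-x <new>' on lines that assign MINIMAP2_PARAM.

    Lines that do not assign MINIMAP2_PARAM are returned unchanged.
    """
    pattern = re.compile(r"(-x\s+)" + re.escape(old) + r"\b")
    out_lines = []
    for line in script_text.splitlines(keepends=True):
        if "MINIMAP2_PARAM=" in line:
            # Keep the '-x ' prefix exactly as written and swap only the preset name.
            line = pattern.sub(lambda m: m.group(1) + new, line)
        out_lines.append(line)
    return "".join(out_lines)


# Hypothetical stand-in for the relevant line; the real script may differ.
sample = 'MINIMAP2_PARAM=" -x ava-ont "\nTHREAD=16\n'
print(swap_minimap2_preset(sample), end="")
```

Working on a copy of the script keeps the original available, so both presets can be run side by side.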

Thanks, Mengyang Xu

dcopetti commented 4 years ago

Hi Mengyang,

Thank you for the detailed answer. Approximately, what do you mean by low and sufficient coverage of long reads? E.g. I have about 16x of reads longer than 20 kb (N50 45 kb) and QV>8: do you think this is enough to try the map-ont option?

Thanks, Dario

adonis316 commented 4 years ago

Hi Dario,

We have tried 10x coverage of ONT/PacBio raw reads, and the number of alignments using ava-ont/ava-pb is about three times larger than that using map-ont/map-pb. We have also tried different coverages of long reads and found that the number of finally closed gaps keeps increasing as the coverage goes from 1x to 20x. But it is still hard to tell whether map-ont is better.

If your genome is small, then you are free to try both presets. If not, you can still compare the two presets quickly: (1) check the number of alignments reported by minimap2; (2) check the contig N50 after gap closure without error correction (this takes a few hours for a 3 Gb genome with 10x coverage of long reads).

It would be great if you could share your findings.
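The two checks could be scripted roughly as follows. This is only a sketch, assuming minimap2 was run with its default PAF output and that contigs are obtained by splitting the gap-closed scaffolds at runs of N. The helper names (`paf_summary`, `contig_lengths`, `n50`) and the sample records are made up for illustration and are not part of TGS-GapCloser.

```python
import re
from typing import Iterable, List, Tuple


def paf_summary(lines: Iterable[str]) -> Tuple[int, float]:
    """Return (number of alignments, mean identity) for PAF records.

    Identity is approximated as column 10 (residue matches) divided by
    column 11 (alignment block length).
    """
    count = 0
    identity_sum = 0.0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:  # skip blank or malformed lines
            continue
        matches, block_len = int(fields[9]), int(fields[10])
        if block_len == 0:
            continue
        count += 1
        identity_sum += matches / block_len
    return count, (identity_sum / count if count else 0.0)


def contig_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Split every FASTA record at runs of N/n and return the contig lengths.

    Each record is held in memory as one string, which is fine for a sketch
    but may need streaming for very large scaffolds.
    """
    lengths: List[int] = []
    chunks: List[str] = []

    def flush() -> None:
        seq = "".join(chunks)
        lengths.extend(len(piece) for piece in re.split(r"[Nn]+", seq) if piece)
        chunks.clear()

    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            flush()
        elif line:
            chunks.append(line)
    flush()
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Tiny in-memory examples; in practice, pass open file handles instead.
paf_example = [
    "c1\t1000\t0\t500\t+\tr1\t20000\t100\t600\t450\t500\t60\n",
    "c1\t1000\t500\t1000\t-\tr2\t30000\t0\t500\t400\t500\t0\n",
]
fasta_example = [">s1\n", "ACGTNNNNACGTAC\n", "GT\n", ">s2\n", "AAAA\n"]

n_alignments, mean_identity = paf_summary(paf_example)
print(n_alignments, round(mean_identity, 3))
print(n50(contig_lengths(fasta_example)))
```

Running `paf_summary` on the PAF output of each preset and `n50` on each gap-closed assembly gives the two numbers to compare.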

Thanks, Mengyang