dfguan / purge_dups

haplotypic duplication identification tool
MIT License

Need to update or clarify minimap preset for hifi #113

Open Astahlke opened 2 years ago

Astahlke commented 2 years ago

Hi, I cannot see which parameter you are using for alignment, but it should be -xasm20, as suggested by Heng Li in lh3/minimap2#325. The high cutoff (90) is a little bit small; you may try 160 (2.5 * 64). May I ask why so many bases have very low coverage? Is it due to the alignment? Best, Dengfeng.

Just to check after some more updates in minimap2: if I have HiFi reads, for purge_dups step 1 PAF generation, it would be best to use -x map-hifi, not -x map-pb as in the README or -xasm20 as above, correct?

_Originally posted by @bishopia in https://github.com/dfguan/purge_dups/issues/30#issuecomment-966261921_

Hi Dengfeng,

Since minimap2 v2.19 (https://github.com/lh3/minimap2/releases/tag/v2.19), the recommended preset for mapping HiFi reads to a reference is -x map-hifi, not -xasm20. Reviewing the documentation, it's clear that the presets differ, but it's not yet clear to me exactly how these differences would affect purge_dups behavior. Do you have any insight?

map-hifi Align PacBio high-fidelity (HiFi) reads to a reference genome (-k19 -w19 -U50,500 -g10k -A1 -B4 -O6,26 -E2,1 -s200).
asm20 Long assembly to reference mapping (-k19 -w10 -U50,500 --rmq -r100k -g10k -A1 -B4 -O6,26 -E2,1 -s200 -z200 -N50). Up to 20% sequence divergence.
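
To make the difference between the two preset descriptions above easier to see, here is a minimal Python sketch that parses the two option strings and lists only the options that differ. The helper names (parse_preset, diff_presets) are hypothetical, and the asm20 string assumes the trailing options read "-z200 -N50" as two separate flags. It only compares the text of the presets; it says nothing about how purge_dups reacts to either one.

```python
# Option strings copied from the minimap2 preset descriptions quoted above.
# Assumption: the asm20 tail is two flags, "-z200" and "-N50".
MAP_HIFI = "-k19 -w19 -U50,500 -g10k -A1 -B4 -O6,26 -E2,1 -s200"
ASM20 = ("-k19 -w10 -U50,500 --rmq -r100k -g10k -A1 -B4 -O6,26 -E2,1 "
         "-s200 -z200 -N50")


def parse_preset(flags):
    """Split a preset string into {option: value}; long flags get an empty value."""
    opts = {}
    for tok in flags.split():
        if tok.startswith("--"):
            opts[tok] = ""            # boolean long option, e.g. --rmq
        else:
            opts[tok[:2]] = tok[2:]   # short option with attached value, e.g. -k19
    return opts


def diff_presets(a, b):
    """Return {option: (value_in_a, value_in_b)} for options that differ.

    None means the option is absent from that preset string.
    """
    pa, pb = parse_preset(a), parse_preset(b)
    return {
        key: (pa.get(key), pb.get(key))
        for key in sorted(set(pa) | set(pb))
        if pa.get(key) != pb.get(key)
    }


differences = diff_presets(MAP_HIFI, ASM20)
for flag, (hifi_value, asm20_value) in differences.items():
    print(f"{flag}: map-hifi={hifi_value!r} asm20={asm20_value!r}")
```

Going by these strings alone, the presets share the k-mer size and scoring options and differ in the minimizer window (-w) and in the extra --rmq, -r, -z and -N options that asm20 sets.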

On the other hand, maybe purge_dups could be updated to properly process the .paf.gz produced with -x map-hifi? The purge_dups pbcstat program does not correctly process the output of those alignments.

Maybe it makes little difference, but it's a bit confusing when the minimap2 documentation conflicts with that of purge_dups.

Very interested in your thoughts and whether/how purge_dups could be updated. Thank you!

mason-linscott commented 1 year ago

Hi all,

I would like to second the above posters' recommendations on clarifying the minimap2 presets in the tutorial.

I have tested both settings on my HiFi dataset and get nearly identical results (at least for coverage cutoffs). The only thing to note is that the minimap2 main page specifies the '-ax map-hifi' flag for HiFi data, which creates a SAM file, as opposed to the PAF file produced with '-x map-hifi'.
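
As a rough illustration of the SAM-versus-PAF point, the Python sketch below builds the two minimap2 command lines without running them. The function name minimap2_argv and the file names asm.fa and reads.fq.gz are hypothetical placeholders; the only minimap2 options used are -x (preset) and -a (SAM output), with '-ax map-hifi' written out as '-a -x map-hifi'.

```python
import shlex


def minimap2_argv(assembly, reads, preset="map-hifi", sam=False):
    """Build a minimap2 argument list (hypothetical helper; runs nothing).

    Without -a, minimap2 writes PAF, the format used for the .paf.gz input
    discussed in this thread. With -a, it writes SAM instead.
    """
    argv = ["minimap2", "-x", preset]
    if sam:
        argv.insert(1, "-a")  # same effect as the combined form "-ax <preset>"
    return argv + [assembly, reads]


# Placeholder file names, for illustration only.
paf_argv = minimap2_argv("asm.fa", "reads.fq.gz")
sam_argv = minimap2_argv("asm.fa", "reads.fq.gz", sam=True)

# PAF output piped through gzip to produce a .paf.gz file.
paf_pipeline = shlex.join(paf_argv) + " | gzip -c - > reads.paf.gz"

print(paf_pipeline)
print(shlex.join(sam_argv))
```

The only difference between the two argument lists is the -a flag, so copying the '-ax' form from the minimap2 main page yields SAM rather than PAF.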

Thanks for making such a great tool! Mason