ChristopherWilks / megadepth

BigWig and BAM utilities
Other
91 stars 9 forks source link

Filter for minimum MAPQ when processing BAM #11

Closed willhooper closed 2 years ago

willhooper commented 2 years ago

Hi,

I'm trying to count reads in 1kb bins across a BAM. Is it possible to filter reads for minimum MAPQ? I see the parameter min-unique-qual but it doesn't seem to affect the results.

Thanks, Will

ChristopherWilks commented 2 years ago

Hi @willhooper,

Could you post the exact megadepth command you're running? That'll help me debug this.

Thanks, Chris

ChristopherWilks commented 2 years ago

If you're using --annotation <window_size> setting, the minimum MAPQ won't have any effect as you've found, as that feature is not currently intended to be a general filter but is only for specific situations when the normal coverage/annotated regions are also being calculated with the idea of providing a way to get counts of uniquely mapping reads only.

I may eventually change that to be a more general filter parameter like you'd want, but it requires some non-trivial refactoring so it won't happen in the immediate future.

willhooper commented 2 years ago

Ah okay -- yes, I was using the --annotation <window_size> parameter. Would the behavior change if I passed a BED file instead of a window size?

ChristopherWilks commented 2 years ago

@willhooper sorry for the late response, my life got very crazy in the last few weeks.

The behavior does change if you pass a BED file to --annotation and have --min-unique-qual set, however to get reasonable output (and admittedly this is lame and something I should change), you need to ensure that the output from the annotation parameter doesn't get written to stdout otherwise you'll get double the lines you expect, so here's an example of the right way to do it currently (forcing --annotation to write to a file with --no-annotation-stdout):

megadepth file.bam --threads 4 --min-unique-qual 10 --annotation annotation.bed --no-annotation-stdout

that will write 2 files: file.bam.annotation.tsv file.bam.unique.tsv (this is the filtered by qual value that you want)

This is odd I know, but it's a hold over for what the --min-unique-qual flag was originally meant for (not general purpose).

You should see a difference in values for the same intervals in those 2 files based on your --min-unique-qual setting. Let me know if you see unexpected results from that.

willhooper commented 2 years ago

Great, thanks for your help! I'll give this a shot.