Open edg1983 opened 2 years ago
Hi @edg1983,
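Yes, it is correct that there are no values between 0.5 and 1.0. The mappability value is the multiplicative inverse of the number of occurrences of a k-mer: a value of 1.0 means the k-mer is unique in the genome, 0.5 means it occurs twice, and 0.33 means it occurs three times. Because the number of occurrences is always a whole number, the only possible values are 1, 1/2, 1/3, 1/4 and so on, so nothing can fall strictly between 0.5 and 1.0.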
So your assumption is correct: lower values represent regions that are more repetitive, hence more difficult to map.
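To make the discrete values concrete, here is a minimal sketch of the "one over the number of occurrences" definition. It is a toy illustration, not the tool's actual algorithm: it assumes exact matches on the forward strand only, and the function name `kmer_mappability` is made up for this example.

```python
from collections import Counter
from typing import List


def kmer_mappability(seq: str, k: int) -> List[float]:
    """Return 1 / (number of occurrences) for the k-mer starting at each position.

    Toy assumption: exact matches on the forward strand only.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)  # how often each distinct k-mer occurs in seq
    return [1.0 / counts[kmer] for kmer in kmers]


# "ACG" occurs twice in this toy sequence, every other 3-mer occurs once.
values = kmer_mappability("ACGTACGA", 3)
print(values)
```

Every value is 1/n for some whole number n, which is why the output jumps straight from 1.0 to 0.5.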
I don't have a magic threshold number, but the section on Mappability and SNP calling might be of interest to you.
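Whatever cutoff ends up being chosen, applying it to the BED output is mechanical. The sketch below assumes that each data line holds chromosome, start and end in the first three columns and the mappability value in the last column; check that against the actual file before relying on it. The function name `low_mappability_intervals` and the cutoff of 0.5 are hypothetical and used only for illustration, not as a recommendation.

```python
import io
from typing import Iterator, TextIO, Tuple


def low_mappability_intervals(
    bed: TextIO, threshold: float
) -> Iterator[Tuple[str, int, int, float]]:
    """Yield (chrom, start, end, value) for intervals with value below threshold.

    Assumption: the mappability value is the last whitespace-separated column.
    """
    for line in bed:
        line = line.strip()
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        value = float(fields[-1])
        if value < threshold:
            yield fields[0], int(fields[1]), int(fields[2]), value


# Made-up example data; 0.5 is an illustrative cutoff only.
example = "chr1\t0\t100\t1\nchr1\t100\t150\t0.5\nchr1\t150\t200\t0.25\n"
flagged = list(low_mappability_intervals(io.StringIO(example), threshold=0.5))
print(flagged)
```

With a real file, pass an open file handle instead of the `io.StringIO` object.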
Christopher
Hi,
I've used your pre-compiled index files to compute mappability with `-K 150`, assuming this is a good approach to compute the expected mappability for 150 bp sequencing reads (I've also tried `-K 100` and `-K 75`, and the considerations below still hold).
In the resulting BED file, the computed values are either 1 or in the range 0-0.5, with no values between 0.5 and 1. Is this expected? Are the output values actual mappability values, so that lower values correspond to regions that are difficult to map? If so, why are there no values between 0.5 and 1?
If low values are associated with mapping problems and the computed values are correct (so most values are < 0.5), do you have any suggestion for a threshold to define difficult-to-map regions for variant filtering?
Thanks!
Edoardo