Closed KickiLagerstedt closed 5 years ago
One solution would be to view a mappability score, e.g. one of http://rohsdb.cmb.usc.edu/GBshape/cgi-bin/hgFileUi?db=hg19&g=wgEncodeMapability
Another would perhaps be just to instruct chanjo / sambamba not to count reads with MAPQ == 0 towards coverage, i.e. to value non-uniquely mapping reads as little as non-existent ones.
Or we could provide a link to the good old seg_dup track from the Eichler lab, which appears to be exactly what BluePrint Genetics has intersected with.
@KickiLagerstedt @dnil Could you elaborate on this? Where would this information be shown to help the analysis? If it belongs in chanjo, we should open the issue there instead.
MIP is using this filter with sambamba:
--filter 'mapping_quality >= 10 and not duplicate and not failed_quality_control'
So we are already not counting poorly mapped reads.
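For reference, a minimal sketch of what that filter expression checks per read, written in plain Python. It assumes the standard SAM flag bits (0x200 for failed quality control, 0x400 for duplicate); the function name `counts_towards_coverage` is hypothetical and not part of MIP, chanjo or sambamba.

```python
# SAM flag bits as defined in the SAM specification.
FLAG_FAILED_QC = 0x200
FLAG_DUPLICATE = 0x400


def counts_towards_coverage(flag: int, mapq: int, min_mapq: int = 10) -> bool:
    """Return True if a read passes the equivalent of
    'mapping_quality >= 10 and not duplicate and not failed_quality_control'.

    Hypothetical helper for illustration only.
    """
    is_duplicate = bool(flag & FLAG_DUPLICATE)
    failed_qc = bool(flag & FLAG_FAILED_QC)
    return mapq >= min_mapq and not is_duplicate and not failed_qc
```

With this threshold a MAPQ 0 read is excluded whatever its flags are, which is the behaviour asked for earlier in the thread.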
OK, so please close this or make it clearer what should be done.
How about we settle for flagging the Eichler segmental duplication regions for now? That, together with the Chanjo coverage, should give a decent picture. The UCSC genomicSuperDups, that is. @henrikstranneheim is that annotated somewhere already, or should we try to query it automatically? BluePrint settled on a static page with a list of regions, but personally I would prefer a flag on the variant page.
Available in MIP 7.0 as a VCF key in the CSQ field: "genomic_superdups_frac_match". For a variant overlapping multiple segdups, the annotation uses "&" to separate the annotation records within the field.
Should be displayed at the variant level even though the information is present in each transcript in the CSQ field.
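A rough sketch of how the value could be pulled up to variant level, assuming the usual VEP CSQ layout (transcripts separated by ",", fields by "|", field order taken from the CSQ header format string) and assuming each record parses as a float. The function name and the example field list are hypothetical; only the key name and the "&" separator come from the comment above.

```python
from typing import List


def superdups_frac_matches(
    csq_value: str,
    csq_format: str,
    key: str = "genomic_superdups_frac_match",
) -> List[float]:
    """Return the segdup fraction-match records for one variant.

    The annotation is repeated on every transcript, so the records from the
    first transcript that carries a value are used. Hypothetical helper.
    """
    index = csq_format.split("|").index(key)
    for transcript in csq_value.split(","):
        parts = transcript.split("|")
        if index < len(parts) and parts[index]:
            # "&" separates one record per overlapping segdup.
            return [float(record) for record in parts[index].split("&")]
    return []


# Example with a made-up, shortened CSQ format string.
csq_format = "Allele|Consequence|SYMBOL|genomic_superdups_frac_match"
csq_value = "A|missense_variant|GENE1|0.98&0.95,A|intron_variant|GENE1|0.98&0.95"
matches = superdups_frac_matches(csq_value, csq_format)
```

An empty list would then mean no segdup overlap was annotated, and a non-empty one could drive a flag on the variant page.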
We will consider this solved with #1253 for now. It is not unlikely that someone will wish for aggregate lists from this feature, perhaps in combination with average (expected) mapped coverage for genes on a panel, but that may be better solved by e.g. Chanjo development.
For some genes, or some regions of genes, annotations could be false positives or difficult to interpret due to the presence of duplicated regions. Would it be possible to annotate this on the gene page or something like that?
For inspiration see list here https://blueprintgenetics.com/pseudogene/
// Kicki