Notation on gene level regarding existence of pseudogenes?

Clinical-Genomics / scout

VCF visualization interface

https://clinical-genomics.github.io/scout

BSD 3-Clause "New" or "Revised" License

Notation on gene level regarding existence of pseudogenes? #868

KickiLagerstedt closed this issue 5 years ago

KickiLagerstedt commented 6 years ago

For some genes, or some regions of genes, annotation could be false positive or difficult to interpret because of the presence of duplicated regions. Would it be possible to annotate this on the gene page, or something like that?

For inspiration, see the list here: https://blueprintgenetics.com/pseudogene/

// Kicki

dnil commented 6 years ago

One solution would be to display a mappability score, e.g. one of the ENCODE mappability tracks at http://rohsdb.cmb.usc.edu/GBshape/cgi-bin/hgFileUi?db=hg19&g=wgEncodeMapability
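To make the mappability idea concrete, here is a minimal sketch that summarises a track as one score per gene region. It assumes the track has already been extracted into bedGraph-style records (chrom, start, end, score), 0-based half-open, non-overlapping, with scores between 0 and 1; the function name and the choice to score uncovered bases as 0 are illustrative assumptions, not part of Scout or chanjo.

```python
from typing import List, Tuple

# Hypothetical bedGraph-style record: (chrom, start, end, score),
# 0-based half-open, non-overlapping, score in [0, 1] (1 = uniquely mappable).
Interval = Tuple[str, int, int, float]


def mean_mappability(tracks: List[Interval], chrom: str, start: int, end: int) -> float:
    """Base-weighted mean mappability over [start, end) on chrom.

    Bases not covered by any record count as 0 (assumed unmappable),
    which is a deliberately conservative choice.
    """
    if end <= start:
        raise ValueError("empty region")
    total = 0.0
    for t_chrom, t_start, t_end, score in tracks:
        if t_chrom != chrom:
            continue
        # Length of the overlap between the record and the query region.
        overlap = min(end, t_end) - max(start, t_start)
        if overlap > 0:
            total += overlap * score
    return total / (end - start)


tracks = [("chr1", 0, 100, 1.0), ("chr1", 100, 200, 0.5)]
print(mean_mappability(tracks, "chr1", 50, 150))  # half unique, half 0.5 -> 0.75
```

Weighting by overlap length keeps a short, poorly mappable stretch from being hidden by averaging record scores naively.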

Another option would perhaps be simply to instruct chanjo / sambamba not to count reads with MAPQ == 0 towards coverage, i.e. to value non-uniquely mapping reads as little as nonexistent ones.
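A minimal sketch of what "not counting MAPQ 0 reads" means for per-base depth. It uses a simplified, hypothetical read model (start, end, mapq) rather than a real pysam or sambamba object; with the default threshold, a MAPQ 0 read contributes exactly as much as a read that is not there at all.

```python
from typing import List, NamedTuple


class Read(NamedTuple):
    """Simplified aligned read (hypothetical model, not a pysam/sambamba type)."""
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    mapq: int


def depth_per_base(reads: List[Read], region_start: int, region_end: int,
                   min_mapq: int = 1) -> List[int]:
    """Per-base depth over [region_start, region_end).

    Reads with mapq < min_mapq are skipped, so with the default of 1
    non-uniquely mapped (MAPQ 0) reads are treated like absent reads.
    """
    depth = [0] * (region_end - region_start)
    for read in reads:
        if read.mapq < min_mapq:
            continue
        # Clip the read to the region before counting.
        lo = max(read.start, region_start)
        hi = min(read.end, region_end)
        for pos in range(lo, hi):
            depth[pos - region_start] += 1
    return depth


reads = [Read(0, 5, 60), Read(2, 8, 0), Read(3, 6, 30)]
print(depth_per_base(reads, 0, 8))              # [1, 1, 1, 2, 2, 1, 0, 0]
print(depth_per_base(reads, 0, 8, min_mapq=0))  # [1, 1, 2, 3, 3, 2, 1, 1]
```

In a duplicated region, the first output would show low coverage where the second, counting every read, would look fine.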

dnil commented 6 years ago

...or we could provide a link to the good old segmental duplication (seg_dup) track from the Eichler lab, which appears to be exactly what Blueprint Genetics has intersected with.

moonso commented 6 years ago

@KickiLagerstedt @dnil Could you elaborate on this? Where would the information need to be shown to help the analysis? If this concerns chanjo, we should open the issue there instead...

henrikstranneheim commented 6 years ago

MIP uses this filter with sambamba:

--filter 'mapping_quality >= 10 and not duplicate and not failed_quality_control'

So we already exclude poorly mapped reads (MAPQ below 10) from coverage.
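For readers less familiar with sambamba's filter syntax, here is the same predicate restated in Python. It is a sketch that assumes sambamba's `duplicate` and `failed_quality_control` correspond to the standard SAM FLAG bits 0x400 and 0x200; the function name is hypothetical and this is not MIP code.

```python
# SAM FLAG bits as defined in the SAM specification.
FLAG_QCFAIL = 0x200     # read fails platform/vendor quality checks
FLAG_DUPLICATE = 0x400  # PCR or optical duplicate


def passes_mip_filter(mapq: int, flag: int, min_mapq: int = 10) -> bool:
    """Mirror of 'mapping_quality >= 10 and not duplicate and not
    failed_quality_control' (assumed flag mapping, see above)."""
    return (
        mapq >= min_mapq
        and not (flag & FLAG_DUPLICATE)
        and not (flag & FLAG_QCFAIL)
    )


print(passes_mip_filter(60, 0))      # True
print(passes_mip_filter(0, 0))       # False: MAPQ 0 is below the threshold
print(passes_mip_filter(60, 0x400))  # False: duplicate
```

Because the threshold is 10 rather than 1, this already drops MAPQ 0 reads along with other ambiguously placed ones.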

moonso commented 6 years ago

OK, so please close this or make it clearer what should be done.

dnil commented 6 years ago

How about we settle for flagging the Eichler segmental duplication regions (that is, the UCSC genomicSuperDups track) for now? Together with the Chanjo coverage, that should give a decent picture. @henrikstranneheim, is that annotated somewhere already, or should we try to query it automatically? Blueprint settled on a static page with a list of regions, but personally I would prefer a flag on the variant page.
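A minimal sketch of the "flag on the variant page" idea: for each variant, collect the segdups it falls in. It assumes segdup rows have already been pulled from genomicSuperDups as (chrom, start, end, fracMatch) with UCSC's 0-based half-open coordinates; the function name and data layout are hypothetical. A linear scan per chromosome keeps it simple and handles segdups that overlap each other.

```python
from typing import Dict, List, Tuple

# Hypothetical segdup row: (chrom, start, end, frac_match),
# 0-based half-open like UCSC tables.
SegDup = Tuple[str, int, int, float]


def segdup_flags(variants: List[Tuple[str, int]],
                 segdups: List[SegDup]) -> Dict[Tuple[str, int], List[float]]:
    """Map each variant (chrom, 1-based VCF POS) to the frac_match values of
    every segdup containing it. An empty list means no segdup overlap."""
    by_chrom: Dict[str, List[SegDup]] = {}
    for sd in segdups:
        by_chrom.setdefault(sd[0], []).append(sd)
    flags: Dict[Tuple[str, int], List[float]] = {}
    for chrom, pos in variants:
        zero_based = pos - 1  # convert VCF POS to the UCSC coordinate system
        flags[(chrom, pos)] = [
            frac for _, start, end, frac in by_chrom.get(chrom, [])
            if start <= zero_based < end
        ]
    return flags


segdups = [("chr1", 100, 200, 0.98), ("chr1", 150, 300, 0.95)]
variants = [("chr1", 101), ("chr1", 160), ("chr1", 100), ("chr2", 160)]
print(segdup_flags(variants, segdups))
```

Keeping every matching fracMatch, rather than a yes/no flag, lets the interface show how similar the duplicated copy is.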

henrikstranneheim commented 5 years ago

Available in MIP 7.0 as the VCF key "genomic_superdups_frac_match" in the CSQ field. For a variant that overlaps multiple segdups, the annotation uses "&" to separate the individual annotation records within the field.

This should be displayed at the variant level, even though the information is present for each transcript in the CSQ field.
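One way to go from the per-transcript CSQ entries to a single variant-level value is sketched below. It assumes the usual VEP layout (field names listed in the CSQ header's "Format:" description, transcripts separated by ",", fields by "|") plus the "&" separator described above; the helper name is hypothetical and not Scout's parser.

```python
from typing import List


def superdups_for_variant(csq_format: str, csq_value: str,
                          key: str = "genomic_superdups_frac_match") -> List[str]:
    """Collapse a per-transcript CSQ annotation into one variant-level list.

    csq_format: '|'-separated field names from the CSQ header "Format: ...".
    csq_value: the INFO CSQ value, one ','-separated entry per transcript.
    Records within an entry are split on '&'; duplicates across transcripts
    are dropped while keeping first-seen order.
    """
    idx = csq_format.split("|").index(key)  # ValueError if the key is missing
    seen: List[str] = []
    for transcript_entry in csq_value.split(","):
        values = transcript_entry.split("|")
        if idx >= len(values) or not values[idx]:
            continue  # this transcript has no segdup annotation
        for record in values[idx].split("&"):
            if record and record not in seen:
                seen.append(record)
    return seen


fmt = "Allele|Consequence|Feature|genomic_superdups_frac_match"
csq = "A|missense_variant|ENST1|0.98&0.95,A|intron_variant|ENST2|0.98&0.95"
print(superdups_for_variant(fmt, csq))  # ['0.98', '0.95']
```

Deduplicating matters because every transcript repeats the same region-level value, so a naive join would list each segdup once per transcript.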

dnil commented 5 years ago

We will consider this solved with #1253 for now. It is not unlikely that someone will wish for aggregate lists from this feature, perhaps in combination with average (expected) mapped coverage for genes on a panel, but that may be better solved by e.g. Chanjo development.