exomiser / Exomiser

A Tool to Annotate and Prioritize Exome Variants
https://exomiser.readthedocs.io
GNU Affero General Public License v3.0

Refine TAD non-coding assignment model #132

Closed visze closed 7 years ago

visze commented 8 years ago

GeneReassigner works only on variants in "regulatory regions". It does not reassign other non-coding variants in a TAD, especially intronic variants. These variants should be reassigned:

julesjacobsen commented 8 years ago

This is true, but before variants are run through the GeneReassigner they are run through the setRegulatoryRegionVariantEffect function in AbstractAnalysisRunner (line 233). This reassigns the variant effect from INTERGENIC_VARIANT or UPSTREAM_GENE_VARIANT to REGULATORY_REGION_VARIANT. The reassignment is based on whether or not the variant falls inside a known FANTOM enhancer.

visze commented 8 years ago

Yes, I know. But the FANTOM thing is crap. Why should we create a score (ReMM) over the complete genome but only allow variants at obvious sites? There is no need for the FANTOM thing.

E.g. the overlap of FANTOM with a specific limb enhancer track is <1%. This shows that limiting the "regulatory" variants to FANTOM is useless. It is better to limit the variants to ReMM > 0.5.

I implemented a draft version where all variants that fall into the non-coding variant effects are reassigned. It looks much better on our genomes now, because candidates found by other approaches are now present.
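A minimal sketch of that idea, in Python rather than Exomiser's Java and not the actual draft code. It assumes the two points above are combined: a variant is reassigned when its effect is non-coding and its ReMM score is above 0.5. The effect list, the function name `reassign_gene` and the per-TAD score mapping are hypothetical and only for illustration.

```python
# Hypothetical set of non-coding effects; the exact list used in the draft is not given here.
NON_CODING_EFFECTS = {
    "INTERGENIC_VARIANT",
    "UPSTREAM_GENE_VARIANT",
    "DOWNSTREAM_GENE_VARIANT",
    "INTRON_VARIANT",
    "REGULATORY_REGION_VARIANT",
}
REMM_CUTOFF = 0.5  # threshold suggested above


def reassign_gene(effect, remm_score, current_gene, tad_gene_scores):
    """Return the gene a variant should be assigned to.

    tad_gene_scores maps every gene in the variant's TAD to its phenotype score.
    Non-coding variants with a ReMM score above the cut-off go to the
    best-scoring gene in the TAD; everything else keeps its current gene.
    """
    if effect in NON_CODING_EFFECTS and remm_score > REMM_CUTOFF and tad_gene_scores:
        return max(tad_gene_scores, key=tad_gene_scores.get)
    return current_gene
```

With this, an intronic variant with a high ReMM score moves to the best phenotype match in its TAD, while a coding variant or a low-scoring non-coding variant stays where it is.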

julesjacobsen commented 8 years ago

So yeah, this is somewhat coupled and not obvious. We're really trying to reassign variants in known enhancer regions to the best phenotypic match gene for all genes in that TAD. However, I don't think we want to be over-zealous with this, as we will end up over-assigning variants to genes they have nothing to do with. That said, we could probably widen the reassignment to variants in the DOWNSTREAM_GENE_VARIANT and CONSERVED_INTERGENIC_VARIANT too. @damiansm what were your criteria for only using the INTERGENIC_VARIANT and UPSTREAM_GENE_VARIANT effects?

visze commented 8 years ago

Well, the thing is that I am trying to convince our physicians to use Genomiser on our genomes. But they say, and I agree with them, that any variant that is somehow regulatory (conserved, acetylated, ... => large ReMM score) in a TAD with a gene matching the phenotype is of interest.

So right now they do not want to use Genomiser because of that issue. Right now they are looking at de novos only and searching for a good gene in the same TAD.

The "overassignment" works very well on my test. So why should we not do it?

DOWNSTREAM_GENE_VARIANT (or upstream) is maybe too close to the gene. And why not intronic? This can also be an enhancer.

damiansm commented 8 years ago

Think we should have a major discussion before we start messing with this filter. It is not just FANTOM, it is the Ensembl regulatory build as well that we use. The GeneReassigner is doing a different task, as Jules described.

Max - sounds like you are suggesting filtering variants by ReMM score rather than regulatory features. Before we change things I would like to see some hard evidence, e.g. benchmarking, rather than just opinions. Suspect keeping all variants with ReMM > 0.5 is not going to filter that much and will leave too many candidates. FANTOM+Ensembl regulatory features did a pretty good job of filtering and prioritising the curated variants.


visze commented 8 years ago

Damian, you are right. It is not only FANTOM. Maybe we can have two different options: a "stringent" one (as it is right now, maybe adding intronic and downstream variants to check if they are in a regulatory feature; I do not understand why you excluded them) and a "loose" one (do not use FANTOM+Ensembl).

Right now we did not solve any of our genomes with the Genomiser (ok, the GLI2 case, but because of a missense variant). In the rest of the cases the resulting genes/variants do not match the phenotype at all (Exomiser score only around 0.5-0.6). So it would be nice to make a wider search.

damiansm commented 8 years ago

My brain is a bit fuzzy on all this and the code is a bit convoluted on how all this works but you could just remove the regulatoryFeatureFilter in the yml config?

Note the regulatoryFeatureFilter is not quite as simple as remove any non-coding variant if it is not in fantom/ensembl. It also keeps ALL variants within 25kb of the gene. Or was it 50kb? This feature may not be obvious in the actual method that is doing the filtering!


damiansm commented 7 years ago

Current nc variant behaviour is:

(1) Any variants found in the regulatory_regions db table of known regulatory regions from FANTOM and Ensembl regulatory build AND an effect of INTERGENIC_VARIANT or UPSTREAM_GENE_VARIANT get the effect changed to REGULATORY_REGION_VARIANT. This is to stop them getting removed in step 3 (? if Jannovar ever assigns this effect - I don't think so)

(2) Any variants with an effect of REGULATORY_REGION_VARIANT get reassigned to the gene in the TAD with the best pheno score

(3) The regulatoryFeature filter removes any variant with an effect of INTERGENIC_VARIANT or UPSTREAM_GENE_VARIANT AND >= 20kb away from gene
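To make the order of steps (1)-(3) above easier to follow, here is a rough Python sketch. It is not the Exomiser implementation (which is Java); the `Variant` fields, the `process` function and the per-TAD score mapping are hypothetical names used only for illustration, and the regulatory_regions lookup is assumed to be precomputed as a boolean.

```python
from dataclasses import dataclass

# Effects that steps 1 and 3 act on.
DISTAL_EFFECTS = {"INTERGENIC_VARIANT", "UPSTREAM_GENE_VARIANT"}
MAX_GENE_DISTANCE = 20_000  # bp, the 20kb cut-off from step 3


@dataclass
class Variant:
    effect: str
    gene: str
    distance_to_gene: int       # bp to the gene (hypothetical field)
    in_regulatory_region: bool  # hit in the regulatory_regions table (assumed precomputed)


def process(variant, tad_gene_scores):
    """Apply steps 1-3 to one variant; return it, or None if it is filtered out.

    tad_gene_scores maps each gene in the variant's TAD to its phenotype score.
    """
    # Step 1: relabel known regulatory hits so step 3 cannot remove them.
    if variant.in_regulatory_region and variant.effect in DISTAL_EFFECTS:
        variant.effect = "REGULATORY_REGION_VARIANT"

    # Step 2: reassign regulatory variants to the best-scoring gene in the TAD.
    if variant.effect == "REGULATORY_REGION_VARIANT" and tad_gene_scores:
        variant.gene = max(tad_gene_scores, key=tad_gene_scores.get)

    # Step 3: drop remaining distal variants that are 20kb or more from the gene.
    if variant.effect in DISTAL_EFFECTS and variant.distance_to_gene >= MAX_GENE_DISTANCE:
        return None
    return variant
```

Note that in this sketch an intronic variant inside a regulatory region passes through untouched: it is neither relabelled nor reassigned, which is the gap this issue is about.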

Max's preferred behaviour:

(1) Reassign variants to best gene in TAD for most nc variants

Peter's preferred display behaviour:

(1) More detail on what REGULATORY_REGION_VARIANT means and/or a better name as the other types are regulatory variants as well. Could we link to a suitable external resource such as Ensembl regulatory build?

(2) Provide a more detailed breakdown and viz of the various UTR effects such as upstream ORFs, KOZAK etc

julesjacobsen commented 7 years ago

Hang on, what's happening here? Are #142, #143 and this one all no longer issues, or is there a new issue relating to this?

damiansm commented 7 years ago

I created one new issue merging all the info/discussions in these. It was getting too messy.


damiansm commented 7 years ago

see #219 for this now

julesjacobsen commented 7 years ago

Ah, good. Even the simplification was confusing!