Open wm75 opened 4 years ago
In addition, I think the consequence
property of OldSnpEff may be wrong, too.
The old EFF annotations are using +
, not &
, right?
Ah, OldSnpEff does not even have a self.effects['Transcript_BioType']
(hence the now failing test).
Also, SnpEff.coding seems to suffer from the same consequence
vs consequences
mix-up.
I could fix those, but maybe tell me if I'm completely wrong with my assumptions first.
yes. looks like you're right. not sure how this has escaped noticed for so long. that'd be great if you updated to include OldSnpEff.
Ok, so this was a bit more work than I intended originally, but I cleaned up the class hierarchy of Effect and its subclasses quite a bit and fixed a number of bugs, mostly around the detection of exonic and coding effects of variants:
The thing is though that after all this cleanup work it becomes apparent that most of it won't help Gemini because from https://github.com/arq5x/gemini/blob/5db2e52ae4aee413a8780df538565c90a38e4b11/gemini/gemini_load_chunk.py#L498 onwards Gemini only uses the top impact effect and its top consequence to populate is_exonic
, is_coding
, is_lof
and is_splicing
.
At least for is_exonic
and is_coding
this is clearly against what the documentation of the Gemini database scheme says. I'm just not sure which behavior, the documented or the implemented one, is the desired one.
If you want I can raise this as a separate issue in the Gemini repo.
hi, thanks for looking into this.
re the gemini note. I think it's doing the right thing. the top_impact
is stored in the variants table. and it will have the correct is_exonic
, etc for that chosen impact. all other impacts are stored in the variant_impacts table with is_exonic
et al set according to each impact.
yeah, I guess you can argue like that, which would mean that the docs need an update.
The counter argument would be one of practicality: what's documented - is_exonic = 1
if any effect on any transcript is an exonic one - is just very hard to derive from the variant_impacts table. Haven't actually verified that it's possible at all.
A Gemini user on usegalaxy.eu reported yesterday that gemini's is_exonic column is wrongly populated for SnpEff-annotated composite effects. I think this simple typo here is the reason.