Update csq impact lists with v105 splice terms

mike-w-wilson commented 5 months ago

Adds: splice_donor_5th_base_variant, splice_donor_region_variant, and splice_polypyrimidine_tract_variant

mike-w-wilson commented 5 months ago

@ch-kr, thank you! I've moved transcript_amplification to high. Based on your comment, I did not move the other two into high. For the terms that VEP now rates as modifier, our code base uses CSQ_NON_CODING to describe that section, but these new members seem to contradict that name. Thoughts on keeping these out for now? It seems we need to decide whether to imitate the VEP rankings completely or give ourselves some room for adjustment.

ch-kr commented 5 months ago

yeah, I agree we need to decide whether to imitate VEP or just loosely follow their mapping with some adjustments. I was initially thinking we'd stay consistent with VEP, but I don't think that actually serves us in our potential downstream applications using these groups, so I vote we do the latter (use all of VEP's listed consequence terms and adjust the associated impacts where needed).

maybe we should move start_lost and transcript_amplification back to medium, keep feature_elongation/feature_truncation/coding_sequence_variant where they are currently (non-coding for the first two and low for the last one), add coding_transcript_variant to CSQ_CODING_LOW_IMPACT, and add sequence_variant to CSQ_NON_CODING?

^I know this is more complex than the initial PR, so I'd be happy to merge this PR and start the discussion via Slack to finalize which terms should go where
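
A rough sketch of the placement proposed above, in case it helps the discussion. It lists only the terms named in this comment, not the full lists in the repository. CSQ_CODING_LOW_IMPACT and CSQ_NON_CODING are the names used in this thread; CSQ_CODING_MEDIUM_IMPACT is assumed by analogy, and the helper functions are hypothetical.

```python
from collections import Counter

# Partial, illustrative grouping: only the terms mentioned in this comment.
# These are NOT the complete lists from the code base.
PROPOSED_GROUPS = {
    # Group name assumed by analogy with CSQ_CODING_LOW_IMPACT.
    "CSQ_CODING_MEDIUM_IMPACT": [
        "start_lost",
        "transcript_amplification",
    ],
    "CSQ_CODING_LOW_IMPACT": [
        "coding_sequence_variant",
        "coding_transcript_variant",
    ],
    "CSQ_NON_CODING": [
        "feature_elongation",
        "feature_truncation",
        "sequence_variant",
    ],
}


def find_duplicates(groups):
    """Return the sorted terms that appear in more than one group (hypothetical helper)."""
    counts = Counter(term for terms in groups.values() for term in terms)
    return sorted(term for term, n in counts.items() if n > 1)


def group_of(term, groups=PROPOSED_GROUPS):
    """Return the name of the group containing `term`, or None if it is unassigned."""
    for name, terms in groups.items():
        if term in terms:
            return name
    return None
```

A check like `find_duplicates` would flag a term that ends up in two groups after reshuffling, and `group_of` returns None for any term whose placement is still undecided.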

mike-w-wilson commented 5 months ago

@ch-kr That sounds good to me. I've made updates to the PR. Would you still like to start the Slack discussion?

ch-kr commented 5 months ago

let's merge -- I have a meeting with Kaitlin next week and will plan to ask her about these then (and move into a larger public channel if needed)

broadinstitute / gnomad_methods

Update csq impact lists with v105 splice terms #711