BRCAChallenge / brca-exchange

Overall management and deployment of the BRCA Exchange web portal and pipeline scripts
http://brcaexchange.org
28 stars 32 forks source link

Solve ambiguity for deNovoPrior = 0.02 in ISPP Sandbox #818

Open amycoffin opened 6 years ago

amycoffin commented 6 years ago

splicePrior can equal 0.02 for more than one reason:

This ambiguity affects the Splice-level estimation nested tile, the de novo donor tab.

amycoffin commented 6 years ago

according to calcVarPriors.py, "frameshiftFlag: equals 1 if variant causes a frameshift, 0 if not". However, there are also "-" and "N/A" values assigned for that flag. It appears that all insertions, deletions, and indels are currently "-", which means that frameshiftFlag might be more useful once Zack implements the indel side of the code.

Intuitively, I believe that the part of the code that assigns frameshiftFlag should be this simple:

if varType == substitution, then frameShiftFlag = 0 if varType == insertion, deletion, or indel and getDeNovoSpliceFrameshiftStatus returns True, then frameshiftFlag = 1

(currently, I believe that N/A is assigned for substitution variants, per line 1954; however, per output there are multiple substitution variants that have frameshiftFlag values of N/A, "-", 0, and 1. It is impossible for a substitution variant to be a frameshift, so I am not sure whether that flag is reliable. Maybe I am misunderstanding it.)

calcVarPriors.py appears to currently handle the frameshiftFlag on a case-by-case basis, while the function "getDeNovoSpliceFrameshiftStatus" provides a T/F on frameshift. Can we simply put the frameshiftFlag at the end fo that function? The only problem is that the frameshift flag is embedded throughout the code in conditionals, changing between 0,1, and N/A often. I assume it plays some important role if it shows up 86 times and is used in the flow control of the code. IMHO, frameshiftFlag ought to always be applicable, because a variant always is or is not a frameshift. If we were to do this, the goal for that flag would be for it to read 0 for all the currently implemented substitution variants, and "-" until the indels are implemented. Once indels are implemented, they will mostly be frameshiftFlag = 1, and more rarely =0.

amycoffin commented 6 years ago

Once issues #831 and #832 are addressed, we can implement a frameshift flag either using getVarType() or getVarConsequnces(), or perhaps a combination of the two.

amycoffin commented 6 years ago

@falquaddoomi has this been resolved, without updates to the frameshift flag?

amycoffin commented 6 years ago

@melissacline is it true that for all substitution variants, an IFD is not possible? So, given the current data release, can we default to Weak/Null because we have not implemented indels yet?

melissacline commented 6 years ago

@amycoffin I think we're safe in saying that if a variant is a substitution variant, then it's not an in-frame deletion. What is it again that you'd like to default to Weak/Null?

amycoffin commented 6 years ago

@melissacline The ambiguity mentioned in this issue is shown below, which is either weak/null or IFD, and if IFDs are not present, then weak/null can theoretically be defaulted to:

screen shot 2018-08-28 at 7 21 28 pm

Thus if the de novo donor prior = 0.02, we can default to weak/null for this first data set. We will still have to resolve this eventually.

melissacline commented 6 years ago

@falquaddoomi and @zfisch, when I've looked at this issue with fresh eyes, I've decided that if hte splicing prior is really, really low, then most users are not going to care which situation gave rise to the low prior. I doubt this will be an issue for this release, since the de novo splice site with the innocuous IFD is a rare situation. For subsequent releases, it's okay with me if we merge the two boxes, e.g. "Weak/Null & Low OR Innocuous IFD". That's one option, if it simplifies life.

zfisch commented 6 years ago

@melissacline @falquaddoomi @amycoffin what's our status on this issue?

amycoffin commented 5 years ago

@falquaddoomi @zfisch last we discussed, this was an okay bug for now, but are we working on getting these ambiguities finally resolved?

zfisch commented 5 years ago

Looking at @melissacline's previous comment --

For subsequent releases, it's okay with me if we merge the two boxes, e.g. "Weak/Null & Low OR Innocuous IFD". That's one option, if it simplifies life.

@falquaddoomi Do you need more information to implement?

falquaddoomi commented 5 years ago

@zfisch No, that should do it. I'll merge the two categories ASAP. I presume the same applies to #816.