Open xyw1 opened 3 years ago
Hello, Thanks for your interest and for closely looking at ipdSummary.py. I would like to try to help answer your question.
The objective function in the optimization is mixModelFn, defined on line 81 of MixtureEstimationMethods.py:
def mixModelFn(self, p, a0, a1): tmp = (1 - p) a0 + p a1 return -np.log(tmp[np.nonzero(tmp)]).sum()
The input arguments are: p = methylated fraction a0 = value of Gaussian pdf for un-methylated IPDs: N( mu0, sd0 ) a1 = value of Gaussian pdf for methylated IPDs: N( mu1, sd1 )
When fminbound is used to do single-variable optimization for a function like mixModelFn, the optimization is done only over the first input argument – in this case, p. The remaining arguments (a0 and a1) are passed to fminbound as fixed values.
Now, here is estimateSingleFraction, defined in line 96 of MixtureEstimationMethods.py:
The input arguments are: mu1 = mean of log IPDs for methylated case data = observed log IPDs mu0 = mean of log IPDs for un-methylated case L = number of observations in the input data optProp = if set to True, returns optimal mixing proportion.
def estimateSingleFraction(self, mu1, data, mu0, L, optProp=True):
# NOTE: ignoring the warnings here is sloppy, should be looked at later.
with np.errstate(all="ignore"):
a0 = self.replaceScipyNormPdf(data, mu0)
a1 = self.replaceScipyNormPdf(data, mu1)
# if f'(0) < 0 (equ. a1/a0 < L), then f'(1) < 0 as well and
# solution p-hat <= 0
if np.divide(a1, a0).sum() <= L:
res = 0.0
# if f'(1) > 0 (equ. a0/a1 < L), then f'(0) > 0 as well and
# solution p-hat >= 1
elif np.divide(a0, a1).sum() <= L:
res = 1.0
else:
# unconstrained minimization of convex, single-variable function
res = fminbound(self.mixModelFn, 0.01, 0.99,
args=(a0, a1), xtol=1e-02)
if optProp:
# return the optimal proportion
return res
else:
# return the corresponding log likelihood function value
return self.mixModelFn(res, a0, a1)
Does this help at all? Please feel free to write back if not.
Thanks again for your interest in this,
From: xuyw notifications@github.com Sent: Monday, November 30, 2020 2:49 AM To: PacificBiosciences/kineticsTools kineticsTools@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: [PacificBiosciences/kineticsTools] Is there a mistake in script MixtureEstimationMethods.py? (#81)
[EXTERNAL MESSAGE] Be mindful when clicking links or attachments
Hi authors, I was interested in how ipdSummary gives out the frac value, so I read the source code and I was confused about this function:
[Image removed by sender. image]https://user-images.githubusercontent.com/28685819/100600251-02070180-333c-11eb-81d2-43c4f378619b.png
Why estimateSingleFraction is optimized to estimate mu1 ( which I guess is the mean IPD value if methylated bases)?
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/PacificBiosciences/kineticsTools/issues/81, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AANKTE7I4BD62YGDL6J6HN3SSN2DJANCNFSM4UHMSQSA.
Hi authors, I was interested in how ipdSummary gives out the
frac
value, so I read the source code and I was confused about this function:Why estimateSingleFraction is optimized to estimate mu1 ( which I guess is the mean IPD value if methylated bases)?