choderalab / assaytools

Modeling and Bayesian analysis of fluorescence and absorbance assays.
http://assaytools.readthedocs.org
GNU Lesser General Public License v2.1
18 stars 11 forks source link

Bayesian Model for competition assay #127

Closed sonyahanson closed 5 years ago

sonyahanson commented 6 years ago

First added simple competition model to bindingmodels.py

sonyahanson commented 6 years ago

Added example notebook for new competition bayesian model also compares to very slow general binding model.

jchodera commented 6 years ago

I looked through 4-Analytic-v-General-Bayesian-Model.ipynb but I have no idea what it is showing.

sonyahanson commented 6 years ago

It basically shows that the AnalyticalBindingModel (link below) works, and is better and faster than the general binding model. https://github.com/choderalab/assaytools/blob/Competition_Bayes/AssayTools/bindingmodels.py#L123-L208

Let me know if you want to go through it in person.

sonyahanson commented 6 years ago

Going to add a function in our analytical binding model to prevent this error from happening (in ipynb):

/Users/hansons/anaconda2/envs/python3/lib/python3.6/site-packages/assaytools-0.3.1-py3.6.egg/assaytools/bindingmodels.py:195: RuntimeWarning: invalid value encountered in arccos
  theta = np.arccos((-2*(a**3)+9*a*b-27*c)/(2*np.sqrt((a**2-3*b)**3)))
sonyahanson commented 6 years ago

This seems to have worked just fine: no more error and the DeltaG trace is a bit less crazy: before-v-after (Orange is the trace of the DeltaG of our competitor.)

Note that there are still other errors popping up in the notebook:

/Users/hansons/anaconda2/envs/python3/lib/python3.6/site-packages/assaytools-0.3.1-py3.6.egg/assaytools/pymcmodels.py:240: RuntimeWarning: overflow encountered in exp
  scaling[large_indices] = (1 - np.exp(-alpha[large_indices])) / alpha[large_indices]
/Users/hansons/anaconda2/envs/python3/lib/python3.6/site-packages/ipykernel_launcher.py:178: RuntimeWarning: overflow encountered in exp
/Users/hansons/anaconda2/envs/python3/lib/python3.6/site-packages/assaytools-0.3.1-py3.6.egg/assaytools/pymcmodels.py:95: RuntimeWarning: invalid value encountered in true_divide
  mu = np.log(mean**2 / np.sqrt(stddev**2 + mean**2))
/Users/hansons/anaconda2/envs/python3/lib/python3.6/site-packages/assaytools-0.3.1-py3.6.egg/assaytools/pymcmodels.py:72: RuntimeWarning: invalid value encountered in multiply
  tau = 0*mean
/Users/hansons/anaconda2/envs/python3/lib/python3.6/site-packages/assaytools-0.3.1-py3.6.egg/assaytools/pymcmodels.py:86: RuntimeWarning: divide by zero encountered in reciprocal
  tau[small_indices] = (x[small_indices] - x[small_indices]**2/2 + x[small_indices]**3/3 - x[small_indices]**4/4 + x[small_indices]**5/5)**(-1)

does this function need to be vectorised?

I used map to do what I think you are alluding to here, perhaps vectorizing would be better...

sonyahanson commented 5 years ago

So right now this PR includes the addition of the Analytic Competition Binding model to bindingmodels.py, to be complete it should probably also include the appropriate changes to the make_model function in pymcmodels.py.

For now I've just been making my own simplified version of this make_model function in ipynb's for testing.

One reason I have yet to incorporate this into the real pymcmodels.py is that I'm still seeing the errors mentioned in the comment above, and am not sure if they are related to numerical instabilities and can be easily corrected.

Furthermore I have been playing a bit with simulated data, and the fits seem to get stuck in a way that the real experimental data isn't exhibiting (see stuckness below). The latest commit in this PR, is an ipynb example of this attempt at analyzing simulated data. Perhaps this is also related to some numerical stability that can be easily fixed?

deltag_stuck

jchodera commented 5 years ago

Furthermore I have been playing a bit with simulated data, and the fits seem to get stuck in a way that the real experimental data isn't exhibiting (see stuckness below). The latest commit in this PR, is an ipynb example of this attempt at analyzing simulated data. Perhaps this is also related to some numerical stability that can be easily fixed?

I think this is the issue we talked about on Wednesday. Does your simulated data have experimental measurement noise in it? Or is it still getting stuck even with that noise?

sonyahanson commented 5 years ago

It has noise, but it is possible it is not in the right form...

sonyahanson commented 5 years ago

@jchodera did you ever get a chance to look at the numerical instabilities in this model and see if there are any easy fixes?

jchodera commented 5 years ago

Not yet, but I hope to dive in on Tuesday!

sonyahanson commented 5 years ago

Note, setting F_PL does not seem to prevent bistability. This is the delG trace when using a prior on F_PL: ima_delg_trace-fpl

sonyahanson commented 5 years ago

So I chatted with @chaya about some potential numerical stabilities in the analytical binding model, and these were the candidates we came up with:

@jchodera do you have any opinions about either of these?

steven-albanese commented 5 years ago

np.exp and np.sqrt would be unstable for any very big or small numbers, I'm not that suspicious of anything involving np.exp, but to check np.sqrtwe should look at what values we are usually getting for a and b

For mathematical functions in pymc3, they recommend using theano's built in functions. Is there something equivalent for pymc, or is that numpy?

sonyahanson commented 5 years ago

@jchodera did you get a chance to take a look at these numerical instabilities? I'll probably take a stab this week if you have any quick suggestions.

sonyahanson commented 5 years ago

Since it has gotten burried in other comments, here's a rehash of errors that I think are related to the numerical stability of this competition binding model. These are the ones I have come across in a trial run today to try and get some more insight into this, e.g. by looking at other traces besides just delG.

pymcmodels.py:240: RuntimeWarning: overflow encountered in exp
  scaling[large_indices] = (1 - np.exp(-alpha[large_indices])) / alpha[large_indices]

ipykernel_launcher.py:126: RuntimeWarning: overflow encountered in exp

pymcmodels.py:72: RuntimeWarning: invalid value encountered in multiply
  tau = 0*mean

pymcmodels.py:86: RuntimeWarning: divide by zero encountered in reciprocal
  tau[small_indices] = (x[small_indices] - x[small_indices]**2/2 + x[small_indices]**3/3 - x[small_indices]**4/4 + x[small_indices]**5/5)**(-1)

pymcmodels.py:95: RuntimeWarning: invalid value encountered in true_divide
  mu = np.log(mean**2 / np.sqrt(stddev**2 + mean**2))
jchodera commented 5 years ago

It appears that, somehow, these quantities are going negative:

    ELC_ex = epsilon_ex*path_length*concentration
    ELC_em = epsilon_em*path_length*concentration

I'm not sure how this can happen, since all of these quantities must be positive.

jchodera commented 5 years ago

Specifically, it looks like the concentrations are going negative in the notebook!

jchodera commented 5 years ago

They're not just a little negative either:

AssertionError: concentration should not be negative, but are: [ 3.29791690e-13 -1.18127773e+02  1.08564837e-07  7.92493225e-14
  9.20966796e-03 -1.82372674e+01  5.82184201e-10  3.67619973e+01
  1.51951215e-09  2.74082464e-03  2.72987858e-11  0.00000000e+00]

That's -118 Molar!

jchodera commented 5 years ago

@sonyahanson : OK, the problem is your binding model:

    221         # Check all concentrations are nonnegative
--> 222         assert np.all(P >= 0)
    223         assert np.all(L >= 0)
    224         assert np.all(PL >= 0)

I've added some assertions to the model that check to make sure the concentrations are all nonnegative, and this is failing. Since you derived this model, can you fix it?

This is the problematic function.

I'd suggest switching to the GeneralBindingModel I wrote.

jchodera commented 5 years ago

Based on our chat today:

@jchodera will

@sonyahanson will

jchodera commented 5 years ago

@sonyahanson: I think I've sorted out the issues we discussed today!

I've committed some changes that make the standard TwoComponentBindingModel a more robust to negative concentrations.

I've also discovered the issue with the warnings from your use of GeneralBindingModel! The issue was that it only accepts log concentrations, but you were feeding it log(0) since some wells had zero ligand concentration in them. For now, I've set these to be the log of 1/1000 the smallest nonzero concentration, but I think I can modify the API for the general binding model to accept and return concentrations instead of log concentrations if it's inconvenient to be careful to feed it valid logarithms.

I've checked in an updated Jupyter notebook with my changes, but note that I haven't fixed your CompetitiveBindingModel, so that part of the notebook still fails.

I've created an issue for me to implement a simpler API for you: https://github.com/choderalab/assaytools/issues/141

The GeneralBindingModel is still super slow. I'll have to see how we can speed that up.

sonyahanson commented 5 years ago

Hmmm. Travis is still failing, but for a different reason, now...

jchodera commented 5 years ago

Looks like a few things are going on here:

jchodera commented 5 years ago

@sonyahanson : I think I've fixed the bug that prevented travis tests from running through. Now I'm seeing some conda HTTP errors, so I've retriggered the builds that failed because of that---the others pass.

I understand we may be seeing lots of HTTP errors in the future until we migrate to conda-forge, so we may want to think of migrating assaytools to conda-forge and using only conda-forge dependencies as soon as things are a bit more stable.

jchodera commented 5 years ago

Tests pass now @sonyahanson

jchodera commented 5 years ago

@sonyahanson : I think we should be able to merge your PR!

sonyahanson commented 5 years ago

Sure, will do. I just climbed out of a grant resubmission whole so still want to check on if there is an obvious answer for why you are seeing negative concentrations with the analytical binding model, but I can do that in a new PR.