choderalab / bayesian-itc

Python tools for the analysis and modeling of isothermal titration calorimetry (ITC) experiments.
GNU General Public License v3.0
5 stars 10 forks source link

Making ITC code faster #10

Closed pgrinaway closed 9 years ago

pgrinaway commented 9 years ago

Hey,

I was messing around with the code and discovered that stripping the units from the expected_injection_heats improves performance by something like two orders of magnitude. In other words, calling the function below (with units stripped in a wrapper):

    @staticmethod
    def _calculate_expected_heats(P0, V0, Ls, DeltaG, DeltaH, DeltaH_0, DeltaVn, beta, N):

        Kd = exp(beta * DeltaG)

        Pn  = numpy.zeros([N])
        Ln  = numpy.zeros([N])
        PLn = numpy.zeros([N])

        dcum = 1.0
        for n in range(N):
            d = 1.0 - (DeltaVn[n] / V0) # dilution factor for this injection (dimensionless)
            dcum *= d # cumulative dilution factor
            P = V0 * P0 * dcum # total quantity of protein in sample cell after n injections (mol)
            L = V0 * Ls * (1. -dcum)  # total quantity of ligand in sample cell after n injections (mol)
            PLn[n] = (0.5/V0 * ((P + L + Kd*V0) - ((P + L + Kd*V0)**2 - 4*P*L)**0.5)) # complex concentration (M)
            Pn[n] = P/V0 - PLn[n]  # free protein concentration in sample cell after n injections (M)
            Ln[n] = L/V0 - PLn[n]  # free ligand concentration in sample cell after n injections (M)
        q_n = numpy.zeros([N]) # q_n_model[n] is the expected heat from injection n
        q_n[0] = (DeltaH) * V0 * PLn[0] + DeltaH_0  # first injection # review removed a factor 1000 here, presumably leftover from a unit conversion
        for n in range(1,N):
            d = 1.0 - (DeltaVn[n] / V0)  # dilution factor (dimensionless)
            q_n[n] = DeltaH * V0 * (PLn[n] - d*PLn[n-1]) + DeltaH_0 # subsequent injections # review removed a factor 1000 here, presumably leftover from a unit conversion
        return q_n
jchodera commented 9 years ago

Unit handling is slow with simtk.unit as well, but two orders of magnitude is surprising.

This suggests we should perhaps choose a standard consistent set of internal units and construct the pymc model in these units, wrapping the results back in units when done.

bas-rustenburg commented 9 years ago

ureg.wraps decorator can handle that automatically. I was planning to use that, see #8, since there is a chance of strange bugs with dimensionless units. If that gives us the speedup, then I will definitly make this higher priority.

pgrinaway commented 9 years ago

It's surprising indeed! I should note that I did another thing here--I removed all accesses to self (and other non-numpy objects in general) in the hot loops--I don't know if this should really make a difference, but it might.

bas-rustenburg commented 9 years ago

+1 for the suggestions. This would really help, even just debugging.

bas-rustenburg commented 9 years ago

Removing accesses to self did not speed up the code. (It was actually a little slower). Going to wrap the units around the function and see if it speeds things up.

bas-rustenburg commented 9 years ago

Got it. At least one order of magnitude speedup achieved. Here is the wrapper that handles units.

bas-rustenburg commented 9 years ago

From 500 steps in 45 seconds to 22000. Closing.

jchodera commented 9 years ago

Nice!