CovertLab / wcEcoli

Whole Cell Model of E. coli
Other
18 stars 3 forks source link

Investigate Python JIT tools and interpreters #37

Open 1fish2 opened 6 years ago

1fish2 commented 6 years ago

Here are the most interesting of the faster-Python implementations. Caveat emptor. Obviously our requirements include a robust implementation with NumPy and Linux compatibility.

Also see:

1fish2 commented 6 years ago

Another candidate tool:

1fish2 commented 5 years ago

pypy might actually support NumPy now. The NumPy 1.15 release notes say "Improved support for PyPy." Ditto for the 1.14 release notes.

If it's beyond the "work in progress" stage, then it might be worth trying. Also the Intel Distribution for Python, #36

jmason42 commented 5 years ago

Good news: Numba is very effective when used in Gillespie-like stochastic simulations:


from __future__ import absolute_import, division, print_function

import timeit

import numpy as np

from numba import jit

def gillespie():
    t = 0

    x1 = 1000
    x2 = 0

    # 1, 2: equilibrium
    c1 = 10
    c2 = 10

    # 3: loss
    c3 = 0.1

    t_hist = [0]
    x_hist = [(x1, x2)]

    for iteration in xrange(100000): # both range and xrange work
        a1 = c1 * x1
        a2 = c2 * x2
        a3 = c3 * x2

        a = np.array([a1, a2, a3])

        at = a.sum()

        if at == 0:
            break

        dt = np.random.exponential(1/at)

        t += dt

        # choice = np.where(np.random.multinomial(1, a / at))[0][0]
        # choice = np.where(np.cumsum(a) >= at * np.random.uniform(0, 1))[0][0]

        # This gives the best performance - about 50% better than the other
        # two, which is consistent with theoretical expectations
        r = at * np.random.uniform(0, 1)
        for choice, acs in enumerate(np.cumsum(a)):
            if acs >= r:
                break

        else:
            raise Exception("A reaction couldn't be selected - this should not happen.")

        if choice == 0:
            x1 -= 1
            x2 += 1

        elif choice == 1:
            x1 += 1
            x2 -= 1

        elif choice == 2:
            x2 -= 1

        else:
            raise Exception("A nonexistant reaction was selected - this should not happen.")

        t_hist.append(t)
        x_hist.append((x1, x2))

    return np.array(t_hist), np.array(x_hist)

jit_gillespie = jit(
    gillespie,
    nopython = True, # little to no difference, but good for error checking?
    # parallel = True, # no improvement, possibly worse
    # fastmath = True # little to no improvement
    )

NUMBER = 1

print(timeit.repeat(gillespie, number = NUMBER))
print(timeit.repeat(jit_gillespie, number = NUMBER))

'''
Output:
[0.9029200077056885, 0.8931000232696533, 0.8873131275177002]
[0.4484701156616211, 0.02795100212097168, 0.02762603759765625]
'''

That's roughly a 30x speed-up for subsequent calls, and even the initial call is ~2x as fast as the uncompiled call. Unsure how this compares with Cython.

Caveat: I'm told that you can call numpy.random.seed within the JIT'd code to seed the random number generation, but seeding outside doesn't work, and neither does using a numpy.random.RandomState object.

Note: The numba.jit function can be used like a decorator.

jmason42 commented 5 years ago

Little update on seeding:


def gillespie(seed = None):
    if seed is not None:
        np.random.seed(seed)

    else:
        # Without this line, Numba gets angry and fails.  I don't exactly know
        # why, other than that it's having a hard time inferring types, which
        # providing an explicit signature *should* help (but does not, in fact,
        # appear to make a difference).
        seed = 0
    ...

I think this is a Numba bug, or at least, an unhandled use case (although it seems like an obvious use case to me). I verified that you can't seed outside the function, nor can you pass a numpy.random.RandomState instance.

jmason42 commented 5 years ago

Further updates:

The code I posted above isn't necessary if you tell Numba about the types of locals:


def gillespie(seed = None):
    if seed is not None:
        np.random.seed(seed)

    ...

jit_gillespie = nb.jit(
    gillespie,
    locals = dict(
        # This conveys that 'seed' is either None or an
        # integer (specifically, an unsigned 32-bit integer)
        seed = nb.optional(nb.uint32)
        ),
    nopython = True
    )

Also, I tried pre-generating the random numbers outside of the loop (as we now do in the complexation code), and found no perceivable improvement in evaluation time.