ImperialCollegeLondon / pyrealm

Development of the pyrealm package, providing an integrated toolbox for modelling plant productivity, growth and demography using Python.
https://pyrealm.readthedocs.io/
MIT License
19 stars 8 forks source link

Investigate using `np.polynomial.Polynomial` #167

Closed davidorme closed 5 months ago

davidorme commented 7 months ago

Over in #69, @tztsai pointed out np.poly_1d as a way to simplify code - and although np.polynomial.Polynomial is now the preferred interface - that looks like it could be both a speed up and a way to simplify the code for many polynomials in the code.

A simple timeit comparison shows a significant speedup, but worth checking how it scales. Interestingly it looks like the overhead in creating a Polynomial instance isn't negligible, but I don't think it's worth creating Polynomial instances in the underlying constants classes.


In [25]: import timeit
In [26]: m1 = timeit.timeit(
    ...:     stmt="""po = 0.99983952 + (6.788260e-5) * tc
    ...: po += -(9.08659e-6) * tc * tc
    ...: po += (1.022130e-7) * tc * tc * tc
    ...: po += -(1.35439e-9) * tc * tc * tc * tc
    ...: po += (1.471150e-11) * tc * tc * tc * tc * tc
    ...: po += -(1.11663e-13) * tc * tc * tc * tc * tc * tc
    ...: po += (5.044070e-16) * tc * tc * tc * tc * tc * tc * tc
    ...: po += -(1.00659e-18) * tc * tc * tc * tc * tc * tc * tc * tc
    ...: """,
    ...:     setup="""import numpy as np
    ...: tc = np.array([10,20,30])
    ...: """,
    ...: )
In [27]: m2 = timeit.timeit(
    ...:     stmt="""pn = np.polynomial.Polynomial(cf)
    ...: po = pn(tc)
    ...: """,
    ...:     setup="""import numpy as np
    ...: tc = np.array([10,20,30])
    ...: cf = np.array([0.99983952 , 6.788260e-5 , -9.08659e-6 , 1.022130e-7 , -1.35439e-9 ,
    ...:                1.471150e-11, -1.11663e-13, 5.044070e-16, -1.00659e-18])
    ...: """,
    ...: )
In [28]: m3 = timeit.timeit(
    ...:     stmt="po = pn(tc)",
    ...:     setup="""import numpy as np
    ...: tc = np.array([10,20,30])
    ...: cf = np.array([0.99983952 , 6.788260e-5 , -9.08659e-6 , 1.022130e-7 , -1.35439e-9 ,
    ...:                1.471150e-11, -1.11663e-13, 5.044070e-16, -1.00659e-18])
    ...: pn = np.polynomial.Polynomial(cf)
    ...: """,
    ...: )
In [29]: print(m1, m2, m3)
40.16773283301154 29.03079683400574 23.08425579199684

_Originally posted by @davidorme in https://github.com/ImperialCollegeLondon/pyrealm/pull/69#discussion_r1450519580_

davidorme commented 7 months ago

Looking into this, it looks like manually coding polynomials in Horner form may be the fastest, although using numpy.polynomial.polynomial.polyval offers an decent speed boost in a neater wrapper.

import functools
import timeit
from typing import Callable

import numpy as np
from numpy.typing import NDArray

tc = np.arange(0, 100)
cf = np.array(
    [
        0.99983952,
        6.788260e-5,
        -9.08659e-6,
        1.022130e-7,
        -1.35439e-9,
        1.471150e-11,
        -1.11663e-13,
        5.044070e-16,
        -1.00659e-18,
    ]
)

pn = np.polynomial.Polynomial(cf)

def func1(tc: NDArray, cf: NDArray) -> NDArray:
    """Current form."""
    po = cf[0] + cf[1] * tc
    po += cf[2] * tc * tc
    po += cf[3] * tc * tc * tc
    po += cf[4] * tc * tc * tc * tc
    po += cf[5] * tc * tc * tc * tc * tc
    po += cf[6] * tc * tc * tc * tc * tc * tc
    po += cf[7] * tc * tc * tc * tc * tc * tc * tc
    po += cf[8] * tc * tc * tc * tc * tc * tc * tc * tc

    return po

def func2(tc: NDArray, cf: NDArray) -> NDArray:
    """Manual Horner form.

    https://stackoverflow.com/questions/24065904
    """

    # fmt: off
    po = cf[0] + tc * (
        cf[1] + tc * (
            cf[2] + tc * (
                cf[3] + tc * (
                    cf[4] + tc * (
                        cf[5] + tc * (
                            cf[6] + tc * (
                                cf[7] + tc * cf[8])))))))
    # fmt: on

    return po

def func3(tc: NDArray, cf: NDArray) -> NDArray:
    """Numpy Polynomial callable."""
    pn = np.polynomial.Polynomial(cf)
    po = pn(tc)

    return po

def func4(tc: NDArray, pn: Callable) -> NDArray:
    """Using predefined numpy Polynomial."""
    po = pn(tc)

    return po

def func5(tc: NDArray, cf: NDArray) -> NDArray:
    """Using numpy polyval, which uses Horner form."""
    po = np.polynomial.polynomial.polyval(tc, cf)

    return po

assert np.allclose(func1(tc, cf), func2(tc, cf))
assert np.allclose(func1(tc, cf), func3(tc, cf))
assert np.allclose(func1(tc, cf), func4(tc, pn))
assert np.allclose(func1(tc, cf), func5(tc, cf))

n = 100000
timer_1 = timeit.Timer(functools.partial(func1, tc, cf))
t1 = timer_1.timeit(n)
timer_2 = timeit.Timer(functools.partial(func2, tc, cf))
t2 = timer_2.timeit(n)
timer_3 = timeit.Timer(functools.partial(func3, tc, cf))
t3 = timer_3.timeit(n)
timer_4 = timeit.Timer(functools.partial(func4, tc, pn))
t4 = timer_4.timeit(n)
timer_5 = timeit.Timer(functools.partial(func5, tc, cf))
t5 = timer_5.timeit(n)

print(np.array([t1, t2, t3, t4, t5]) / t1)

This prints:

[1.         0.43208084 0.67382435 0.53665005 0.53343075]

I haven't looked in depth at scaling with size of the inputs, but it also looks like polyval copes better with multidimensional inputs. Given tc = np.arange(0, 100, 0.1).reshape((10,10,10)), the relative timings are:

[1.         0.46194848 0.84781528 0.71898237 0.59637739]
tztsai commented 7 months ago

Looking into this, it looks like manually coding polynomials in Horner form may be the fastest, although using numpy.polynomial.polynomial.polyval offers an decent speed boost in a neater wrapper. ...

Maybe we can implement an utility function

def horner_polyval(x, cf):
  y = 0
  for c in reversed(cf):
    y = x * y + c
  return y
davidorme commented 7 months ago

Could do! How does the speed compare to the current and hard-coded versions? I guess it will slow it down, but if that is negligible, a utility function makes life much easier.

tztsai commented 7 months ago

The time cost of the testcase using horner_polyval is appended at the end: [1. 0.48477401 0.77228972 0.61615877 0.60483256 0.4863881 ] So it appears that the utility function adds a negligible overhead.