The same is true for calc_density_h2o, and I need to implement a new method for that for compatibility with the SPLASH benchmarks. So I've just pushed a commit to feature/splash_v1 that adds various alternate approaches to calculating that new method (Chen et al), which has been revealing. With that commit (https://github.com/ImperialCollegeLondon/pyrealm/pull/69/commits/14faef79b1329333e8ef37866e1e9d243b1a98b8), I've timed the various approaches:
import timeit
import numpy as np
import matplotlib.pyplot as plt

func_alts = {"simple", "chen_allinone", "chen_cumulative", "chen_matrix"}
sizes = np.array([1, 5, 10, 25, 50, 100])
run_times = {k: np.full(6, fill_value=np.nan) for k in func_alts}

for fn in func_alts:
    call = f"density_h2o_{fn}(tc, patm)"
    for idx, sz in enumerate(sizes):
        setup = (f"from pyrealm.pmodel.functions import density_h2o_{fn}\n"
                 f"import numpy as np\n"
                 f"tc = np.random.uniform(0, 50, ({sz},{sz}))\n"
                 f"patm = np.random.uniform(90000, 120000, ({sz},{sz}))\n")
        run_times[fn][idx] = timeit.timeit(call, setup=setup, number=200)

for fn in func_alts:
    print(fn)
    plt.plot(sizes, run_times[fn], label=fn)

plt.legend()
plt.show()
The matrix approach using np.pow and np.sum basically sucks - it is by far the slowest and scales the worst with size. Interestingly, the original implementation (density_h2o_cumulative), which accumulates the polynomial terms with +=, seems best, even faster than expressing the whole term as one allinone equation to avoid repeated assignment. Obviously the simple calculation is the fastest, but that may be woefully inaccurate. Annoyingly, the original implementation of calc_viscosity_h2o also used a similar cumulative approach and I swapped it out for the ever so clever matrix calculation. I am both surprised and chastened - anyone know why this is the case?
It looks to me like a lot of memory is being allocated and thrown away and reallocated again in the big np commands. I'll try and have a deeper look tomorrow but I think this also explains the memory allocation issue we're seeing with the big data set in the CI. If I cut the dataset down to a smaller value we can get further but we hit that limit again later on in that method. This is perhaps not unexpected but we should see what we can do in this area.
Apologies, this was a quick note to capture some thoughts before I forget them.
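To make the allocation point above concrete, here is a minimal sketch of the two styles of polynomial evaluation being compared. The coefficients and function names are made up for illustration and are not the pyrealm implementations:

import numpy as np

# Made-up polynomial coefficients, purely for illustration
coefs = np.array([1.0, 2.5e-2, -7.0e-6, 3.0e-9])

def poly_matrix(tc):
    """Matrix-style evaluation: build an array of powers with an extra trailing
    axis, multiply by the coefficients and then sum that axis away. The powers
    array is a large temporary that is allocated, filled and immediately discarded."""
    powers = np.power.outer(tc, np.arange(len(coefs)))  # shape (*tc.shape, 4)
    return np.sum(coefs * powers, axis=-1)

def poly_cumulative(tc):
    """Cumulative evaluation: accumulate term by term with +=. Only arrays of
    the same shape as tc are ever created."""
    res = np.full_like(tc, coefs[0])
    tc_pow = np.ones_like(tc)
    for c in coefs[1:]:
        tc_pow *= tc
        res += c * tc_pow
    return res

tc = np.random.uniform(0, 50, (100, 100))
assert np.allclose(poly_matrix(tc), poly_cumulative(tc))

Both give the same answer, but the matrix route materialises an array one dimension bigger than the inputs on every call, which lines up with the allocate/discard/reallocate pattern described above.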
Yeah - I haven't even looked at the memory allocation here. Is there a simple timeit-style option for doing that? We already have what looks like a compelling argument for moving back to the simple cumulative approach!
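As an aside (not the route taken later in this thread), one near-timeit-style option from the standard library is tracemalloc, which NumPy hooks into for array data allocations, so it can give a rough peak-memory figure around a single call. The import path and call signature below assume the existing calc_density_h2o(tc, patm):

import tracemalloc
import numpy as np
from pyrealm.pmodel.functions import calc_density_h2o  # assumed signature: (tc, patm)

tc = np.random.uniform(0, 50, (1000, 1000))
patm = np.random.uniform(90000, 120000, (1000, 1000))

tracemalloc.start()
_ = calc_density_h2o(tc, patm)
current, peak = tracemalloc.get_traced_memory()  # values in bytes
tracemalloc.stop()
print(f"current: {current / 1e6:.1f} MB, peak: {peak / 1e6:.1f} MB")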
In the short term, @a-smith-github, I'd forgotten that we can already switch to the simple calculation. It's a bit obscure, but the const argument to PModelEnvironment provides a whole bunch of constants/parameters/settings, and we can adjust the test_profiling.py script to:
from pyrealm.constants import PModelConst
const = PModelConst(simple_viscosity=True)
pm_env = PModelEnvironment(tc=tc, patm=patm, vpd=vpd, co2=co2, const=const)
All objects created from that pm_env object should then inherit the setting to use simple viscosity.
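For example (assuming the usual pattern of building a model from the environment), something like:

from pyrealm.pmodel import PModel

# The model built from pm_env picks up the const settings via the environment
model = PModel(pm_env)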
OK - I have started a new branch from this issue to update the functions in develop. With commit a717e14, I have implemented separate functions for two methods (chen and fisher), using both matrix algebra and straight multiplication to get powers of tc. Running the comparisons shows that straight multiplication is miles better in both cases, so I will update develop to add the new chen method and update the calculations.

I can't find an acceptable simple calculation for density at the moment, so let's see how this update affects the profiling.
The comparison from that branch shows:
import timeit
import numpy as np
import matplotlib.pyplot as plt

func_alts = {"chen", "chen_matrix", "fisher", "fisher_matrix"}
sizes = np.array([1, 5, 10, 25, 50, 100])
run_times = {k: np.full(6, fill_value=np.nan) for k in func_alts}

for fn in func_alts:
    call = f"calc_density_h2o_{fn}(tc, patm)"
    for idx, sz in enumerate(sizes):
        setup = (f"from pyrealm.pmodel.functions import calc_density_h2o_{fn}\n"
                 f"import numpy as np\n"
                 f"tc = np.random.uniform(0, 50, ({sz},{sz}))\n"
                 f"patm = np.random.uniform(90000, 120000, ({sz},{sz}))\n")
        run_times[fn][idx] = timeit.timeit(call, setup=setup, number=200)

for fn in func_alts:
    plt.plot(sizes, run_times[fn], label=fn)

plt.tight_layout()
plt.legend()
plt.show()
I have just pushed another commit (0283d75e070fc125c7413a41c988834c5de3e860) to the branch that adds a new version of calc_viscosity_h2o that avoids using matrix multiplication. This also makes a big difference to run time, although not quite as marked as the density example:
import timeit
import numpy as np
import matplotlib.pyplot as plt

func_alts = {"h2o", "h2o_matrix"}
sizes = np.array([1, 5, 10, 25, 50, 100])
run_times = {k: np.full(6, fill_value=np.nan) for k in func_alts}

for fn in func_alts:
    call = f"calc_viscosity_{fn}(tc, patm)"
    for idx, sz in enumerate(sizes):
        setup = (f"from pyrealm.pmodel.functions import calc_viscosity_{fn}\n"
                 f"import numpy as np\n"
                 f"tc = np.random.uniform(0, 50, ({sz},{sz}))\n"
                 f"patm = np.random.uniform(90000, 120000, ({sz},{sz}))\n")
        run_times[fn][idx] = timeit.timeit(call, setup=setup, number=200)

for fn in func_alts:
    plt.plot(sizes, run_times[fn], label=fn)

plt.tight_layout()
plt.legend()
plt.show()
I will again update the branch to remove the slow matrix implementation.
Some memory profiling using the Python memory-profiler package. I also get:

pyrealm/subdaily.py:305: RuntimeWarning: divide by zero encountered in divide
pyrealm/subdaily.py:305: RuntimeWarning: invalid value encountered in divide
Thanks - will have to track down what inputs are causing those warnings. What is the code used to invoke the memory profiling? It would be good to see how the matrix and non-matrix methods compare.
Just import the library and then put the decorator @profile before the function. For the plots, use, for example, mprof run python3 tests/pmodel/test_profiling.py (I just used that and made it a script by adding test_profiling_example() to the bottom), and then mprof plot.

See also https://pypi.org/project/memory-profiler/, where it is all described. I just could not really get any additional info using --include-children so far.
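As a concrete sketch of that workflow (the script and function below are placeholders, not files in the repo):

# profile_density.py - hypothetical stand-alone script for mprof
import numpy as np
from memory_profiler import profile

from pyrealm.pmodel.functions import calc_density_h2o_fisher

@profile
def run_density():
    """Allocate the test inputs and call the function under test."""
    tc = np.random.uniform(0, 50, (2000, 2000))
    patm = np.random.uniform(90000, 120000, (2000, 2000))
    calc_density_h2o_fisher(tc, patm)

if __name__ == "__main__":
    run_density()

Running mprof run python3 profile_density.py and then mprof plot produces the memory-over-time plot for the decorated function.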
I am now running the profiling for the methods calc_viscosity_h2o, calc_viscosity_h2o_matrix, calc_density_h2o_fisher and calc_density_h2o_chen separately.

Note that I needed to increase the size of tc and patm significantly to get meaningful results. I used

dim = 2000
tc = np.random.uniform(0, 50, (dim, dim))
patm = np.random.uniform(90000, 120000, (dim, dim))

(When I ran it with dim=10000, it ran for all methods except calc_viscosity_h2o_matrix, for which it crashed.)
These are the profiles:

[memory profile plots for calc_viscosity_h2o, calc_viscosity_h2o_matrix, calc_density_h2o_fisher and calc_density_h2o_chen]
So the matrix method is both slower and uses considerably more memory?
Significantly more, yes.
Makes sense that it's the np.outer on line 925 that gets flagged in profiling - great that the allocation and deallocation you can see on htop is basically attributable to one line.

Am I right in thinking that rbar is a constant float? I wonder if we could re-formulate this section in some way so we can avoid creating such a large data structure. Basically, can we avoid doing an np.outer and get the result of np.sum(mu1, axis=2) some other way (i.e. can we drop a dimension?) so we can do the division on line 926? I imagine the np.outer on L926 might need to stay but might get some speedup. Been a while since I've done any work with matrices...
The new version replaces the outer calls completely with simple sequential sums. I suspect that there is a more efficient formulation - let's see how this new implementation affects the runtime breakdown.
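For anyone following along, the general shape of that change is something like this (illustrative only - the coefficient matrix and variable names are stand-ins, not the actual viscosity constants or the pyrealm code):

import numpy as np

# Stand-in coefficient matrix; the real values come from the viscosity parameterisation
h_array = np.random.rand(6, 7)

def double_sum_outer(tbar, rho):
    """Outer-product version: builds temporaries with two extra trailing axes."""
    i_pow = np.power.outer(1.0 / tbar, np.arange(h_array.shape[0]))  # (*shape, 6)
    j_pow = np.power.outer(rho - 1.0, np.arange(h_array.shape[1]))   # (*shape, 7)
    terms = i_pow[..., :, None] * h_array * j_pow[..., None, :]      # (*shape, 6, 7) temporary
    return np.sum(terms, axis=(-2, -1))

def double_sum_sequential(tbar, rho):
    """Sequential version: accumulate with += and *=, using only input-shaped arrays."""
    total = np.zeros_like(tbar)
    ctbar_pow = np.ones_like(tbar)          # (1 / tbar) ** i, built up incrementally
    for i in range(h_array.shape[0]):
        row = np.zeros_like(tbar)
        rho_pow = np.ones_like(tbar)        # (rho - 1) ** j, built up incrementally
        for j in range(h_array.shape[1]):
            row += h_array[i, j] * rho_pow
            rho_pow *= rho - 1.0
        total += ctbar_pow * row
        ctbar_pow *= 1.0 / tbar
    return total

tbar = np.random.uniform(0.9, 1.1, (500, 500))
rho = np.random.uniform(0.9, 1.1, (500, 500))
assert np.allclose(double_sum_outer(tbar, rho), double_sum_sequential(tbar, rho))

Dropping the two extra axes means the sequential version never allocates anything larger than the input grid, at the cost of a short Python loop over the small, fixed coefficient dimensions.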
In the profiling (#81) it was found that one bottleneck seems to be the function https://github.com/ImperialCollegeLondon/pyrealm/blob/fcbba0eead6857fd412522ff7299af03c4d31b95/pyrealm/pmodel/functions.py#L671.
@davidorme explains that there are various, sometimes very simple alternatives for this calculation.
It would be useful to implement the option to switch between different variants. These could be profiled separately, and the user can then be given the option to trade speed against precision.
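One possible shape for that switch (names here are illustrative, not a final API decision) is a thin dispatcher over the per-method functions that already exist on the branch, so the choice can be exposed as a single argument:

import numpy as np

from pyrealm.pmodel.functions import calc_density_h2o_chen, calc_density_h2o_fisher

def calc_density_h2o(tc, patm, method="fisher"):
    """Dispatch to one of the water density variants by name (sketch only)."""
    methods = {"fisher": calc_density_h2o_fisher, "chen": calc_density_h2o_chen}
    if method not in methods:
        raise ValueError(f"Unknown density method: {method!r}")
    return methods[method](tc, patm)

That keeps the individual implementations easy to profile separately while letting users trade speed against precision with one setting.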