Closed — corinnebosley closed this issue 4 years ago
The same is true for all of the regridders currently in iris I believe (Linear/Nearest/AreaWeighted). No caching of weights and indices takes place.
I would like to put this together into a use case for some planned enhancement work we have for later this year.
Further comments and thoughts on this, to keep it in my notifications, are much appreciated.
Use case: AreaWeighted is currently the only conservative scheme.
AreaWeighted is ~1000 times slower than Linear or Nearest.
Is it necessary to have two ways of doing regridding?
If the weight creation is the most expensive step, a regridding function could return the weights explicitly, so they can be fed back into the function on the next call. This would add no more complexity for the user than Regridder objects, but would potentially be much simpler.
Testing the current caching method seems difficult. (Is it caching? Is it using the cache? Is there a benefit to using the cache?)
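As a sketch of that idea (toy code only, not the iris API; compute_weights here is a stand-in block-average, not a real area-weight calculation):

```python
import numpy as np

def compute_weights(n_src, n_tgt):
    # Toy stand-in for the expensive step: each target cell is the
    # mean of a contiguous block of source cells.
    w = np.zeros((n_tgt, n_src))
    block = n_src // n_tgt
    for j in range(n_tgt):
        w[j, j * block:(j + 1) * block] = 1.0 / block
    return w

def regrid(src, n_tgt, weights=None):
    # Compute the weights only when the caller has none to reuse,
    # and hand them back so the next call can skip the expensive step.
    if weights is None:
        weights = compute_weights(src.shape[0], n_tgt)
    return weights @ src, weights

src = np.arange(8.0)
out1, w = regrid(src, 2)         # first call: weights are computed
out2, _ = regrid(src * 2, 2, w)  # later calls: the weights are reused
```

The caller owns the weights, so they could equally be saved to disk or shared between processes.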
@tv3141 the current UI allows for a generic interface to regridding/interpolation, and for the implementation details to be hidden from the user. This framework is what allows caching to be developed without changing the UI for users.
I think the issue you raise would be better approached by extending the docstrings, which should highlight current capabilities and limitations. help(iris.analysis.Linear), for example, is rather lacking in necessary details.
The bottleneck is averaging the source grid points using np.ma.average. The weight calculation takes <10% of the time.
The area-weighted regridding is done in a large loop over the target grid points. The weights are calculated inside the loop for each target grid point. The regridding is done in one step for all dimensions (time, levels).
https://github.com/SciTools/iris/blob/v1.12.x/lib/iris/experimental/regrid.py#L503
def _regrid_area_weighted_array(src_data, x_dim, y_dim,
                                src_x_bounds, src_y_bounds,
                                grid_x_bounds, grid_y_bounds,
                                grid_x_decreasing, grid_y_decreasing,
                                area_func, circular=False, mdtol=0):
    [...]
    # Simple for loop approach.
    for j, (y_0, y_1) in enumerate(grid_y_bounds):
        [...]
        for i, (x_0, x_1) in enumerate(grid_x_bounds):
            [...]
            data = src_data[tuple(indices)]
            # Calculate weights based on areas of cropped bounds.
            weights = area_func(y_bounds, x_bounds)
            [...]
            # Calculate weighted mean taking into account missing data.
            new_data_pt = _weighted_mean_with_mdtol(
                data, weights=weights, axis=axis, mdtol=mdtol)
            # Insert data (and mask) values into new array.
            [...]
            new_data[tuple(indices)] = new_data_pt
    [...]
    return new_data
def _weighted_mean_with_mdtol(data, weights, axis=None, mdtol=0):
    """
    ...
    """
    res = ma.average(data, weights=weights, axis=axis)
    [...]
    return res
Profiling confirms the time is spent in np.ma.average:
import cProfile
import pstats
import iris

areaweighted_regridder = iris.analysis.AreaWeighted().regridder(
    create_cube(*resolution['N768']), create_cube(*resolution['N96']))
cube = create_cube(*resolution['N768'], levels=3)

cProfile.run('areaweighted_regridder(cube)', 'aw_regrid.prof')

stats = pstats.Stats('aw_regrid.prof')
stats.strip_dirs()
stats.sort_stats('cumulative')
stats.print_stats()
Tue May 2 17:05:24 2017 aw_regrid.prof
9573273 function calls (9573143 primitive calls) in 12.114 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 12.114 12.114 <string>:1(<module>)
1 0.000 0.000 12.114 12.114 _area_weighted.py:89(__call__)
1 0.000 0.000 12.114 12.114 regrid.py:627(regrid_area_weighted_rectilinear_src_and_grid)
1 0.487 0.487 12.110 12.110 regrid.py:405(_regrid_area_weighted_array)
27648 0.070 0.000 9.490 0.000 regrid.py:353(_weighted_mean_with_mdtol)
--> 27648 0.270 0.000 9.400 0.000 extras.py:464(average)
138241 1.139 0.000 3.060 0.000 core.py:2865(__array_finalize__)
27648 0.066 0.000 2.861 0.000 core.py:3908(__truediv__)
55296 0.230 0.000 2.842 0.000 core.py:1002(reduce)
248833 1.224 0.000 2.785 0.000 core.py:2839(_update_from)
27648 0.774 0.000 2.769 0.000 core.py:1109(__call__)
...
27648 0.041 0.000 0.801 0.000 regrid.py:324(_spherical_area) # weights
np.ma.average is ~10x slower than np.average:
import numpy as np
print np.__version__
arr = np.random.rand(1,8,8,3)
w = np.random.rand(1,8,8,3)
print 'np.ma.average axis'
print np.ma.average(arr, weights=w, axis=(1,2))
%timeit a = np.ma.average(arr, weights=w, axis=(1,2))
print
print 'np.average axis'
print np.average(arr, weights=w, axis=(1,2))
%timeit a = np.average(arr, weights=w, axis=(1,2))
1.12.1
np.ma.average axis
[[0.5558622851924222 0.5015630215812361 0.512327031679091]]
1000 loops, best of 3: 240 µs per loop
np.average axis
[[0.5558622851924222 0.5015630215812361 0.512327031679091]]
10000 loops, best of 3: 27.4 µs per loop
Even without a mask, np.ma.average is ~10x slower than the plain array version np.average. I do not understand why: apparently identical lines of code have large differences in execution time.
import numpy as np
print np.__version__ # 1.12.1
from line_profiler import LineProfiler
arr = np.random.rand(1,8,8,3)
w = np.random.rand(1,8,8,3)
lp = LineProfiler()
lp_wrapper = lp(np.ma.average)
lp_wrapper(arr, weights=w)
lp.print_stats()
lp = LineProfiler()
lp_wrapper = lp(np.average)
lp_wrapper(arr, weights=w)
lp.print_stats()
Timer unit: 1e-06 s
Total time: 0.000392 s
File: ~/localdata/python_test/miniconda/envs/iris_new_np/lib/python2.7/site-packages/numpy/ma/extras.py
Function: average at line 521
Line # Hits Time Per Hit % Time Line Contents
==============================================================
521 def average(a, axis=None, weights=None, returned=False):
572 1 105 105.0 26.8 a = asarray(a)
573 1 4 4.0 1.0 m = getmask(a)
574
575 # inspired by 'average' in numpy/lib/function_base.py
576
577 1 1 1.0 0.3 if weights is None:
578 avg = a.mean(axis)
579 scl = avg.dtype.type(a.count(axis))
580 else:
581 1 6 6.0 1.5 wgt = np.asanyarray(weights)
582
583 1 3 3.0 0.8 if issubclass(a.dtype.type, (np.integer, np.bool_)):
584 result_dtype = np.result_type(a.dtype, wgt.dtype, 'f8')
585 else:
586 1 3 3.0 0.8 result_dtype = np.result_type(a.dtype, wgt.dtype)
587
588 # Sanity checks
589 1 2 2.0 0.5 if a.shape != wgt.shape:
605 1 1 1.0 0.3 if m is not nomask:
606 wgt = wgt*(~a.mask)
607
608 1 35 35.0 8.9 scl = wgt.sum(axis=axis, dtype=result_dtype)
609 1 230 230.0 58.7 avg = np.multiply(a, wgt, dtype=result_dtype).sum(axis)/scl
610
611 1 1 1.0 0.3 if returned:
612 if scl.shape != avg.shape:
613 scl = np.broadcast_to(scl, avg.shape).copy()
614 return avg, scl
615 else:
616 1 1 1.0 0.3 return avg
Timer unit: 1e-06 s
Total time: 9.7e-05 s
File: ~/localdata/python_test/miniconda/envs/iris_new_np/lib/python2.7/site-packages/numpy/lib/function_base.py
Function: average at line 1021
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1021 def average(a, axis=None, weights=None, returned=False):
1095 # 3/19/2016 1.12.0:
1096 # replace the next few lines with "a = np.asanyarray(a)"
1097 1 3 3.0 3.1 if (type(a) not in (np.ndarray, np.matrix) and
1106 1 3 3.0 3.1 if not isinstance(a, np.matrix):
1107 1 7 7.0 7.2 a = np.asarray(a)
1108
1109 1 1 1.0 1.0 if weights is None:
1110 avg = a.mean(axis)
1111 scl = avg.dtype.type(a.size/avg.size)
1112 else:
1113 1 4 4.0 4.1 wgt = np.asanyarray(weights)
1114
1115 1 4 4.0 4.1 if issubclass(a.dtype.type, (np.integer, np.bool_)):
1116 result_dtype = np.result_type(a.dtype, wgt.dtype, 'f8')
1117 else:
1118 1 3 3.0 3.1 result_dtype = np.result_type(a.dtype, wgt.dtype)
1119
1120 # Sanity checks
1121 1 2 2.0 2.1 if a.shape != wgt.shape:
1137 1 32 32.0 33.0 scl = wgt.sum(axis=axis, dtype=result_dtype)
1138 1 16 16.0 16.5 if (scl == 0.0).any():
1139 raise ZeroDivisionError(
1140 "Weights sum to zero, can't be normalized")
1141
1142 1 20 20.0 20.6 avg = np.multiply(a, wgt, dtype=result_dtype).sum(axis)/scl
1143
1144 1 1 1.0 1.0 if returned:
1145 if scl.shape != avg.shape:
1146 scl = np.broadcast_to(scl, avg.shape).copy()
1147 return avg, scl
1148 else:
1149 1 1 1.0 1.0 return avg
To learn more about np.ma.average, I copied its code into a Jupyter notebook and started removing non-essential code. I noticed that running a copy of np.ma.average in the current namespace speeds it up more than 10x. It also produces single precision results instead of double precision.
Any ideas what makes np.ma.average slow, and what causes the speed-up?
What determines the precision of the calculation?
import inspect
import numpy as np
print np.__version__ # 1.12.1
# imports for copied code
from numpy import asarray
from numpy.ma import getmask, nomask
def ma_average(a, axis=None, weights=None, returned=False):
    # copied code from
    # print inspect.getsourcefile(np.ma.average)
    [...]
arr = np.random.rand(1,8,8,3)
w = np.random.rand(1,8,8,3)
print 'np.ma.average'
print np.ma.average(arr, weights=w, axis=(1,2))
%timeit a = np.ma.average(arr, weights=w, axis=(1,2))
print
print 'ma_average'
print ma_average(arr, weights=w, axis=(1,2))
%timeit a = ma_average(arr, weights=w, axis=(1,2))
print
print 'np.average'
print np.average(arr, weights=w, axis=(1,2))
%timeit a = np.average(arr, weights=w, axis=(1,2))
print
np.ma.average
[[0.5798026614723218 0.45410607639507333 0.4378237178069648]]
1000 loops, best of 3: 238 µs per loop
ma_average
[[ 0.57980266 0.45410608 0.43782372]]
10000 loops, best of 3: 21.9 µs per loop
np.average
[[0.5798026614723218 0.45410607639507333 0.4378237178069648]]
10000 loops, best of 3: 27.4 µs per loop
np.ma.average calls a = asarray(a) to convert a into a masked array. When I copied the code, I imported the wrong asarray function: from numpy import asarray, when it should have been from numpy.ma import asarray. Using the correct import also makes the 'local' ma_average slow again, so things make sense again.
np.ma.average is much slower than np.average when applied to small unmasked arrays, because of the expensive conversion to a masked array (np.ma.asarray(a)), and because np.multiply is much slower when called with a masked array.
I realised I perhaps neglected to mention that I would also really like to see caching implemented in iris :) Really interesting observations on masked arrays. It feels like quite a significant refactor to make this happen. To make the weights really useful, they should also be accessible to the end user (for example, users might use them directly as sparse arrays), and perhaps they could be cached to disk. A lot to think about, I think...
This section has been reviewed in the "area_weighted_regridding" feature branch. It is self-contained and gives the same results as current master: See PR #3598
Update from refinement meeting:
The current status of the work was discussed. We think completing it is achievable with approximately:
• ½ day of review work associated with re-jigging the current regrid_area_weighted_rectilinear_src_and_grid function
• 4 days refactoring the tests (1 day for the weights function, 1 day for prepare/perform, 1 day for the regridder, 1 day for review comments)
• 1 day of review work for the above
• Assistance from the sprint team (e.g. setting up an Iris build, updating test data)
• (This is additional to the ~week of scoping/dev work already undertaken by AVD)
This nicely aligns with the 5 days of dev work that Emma/Andy are providing.
Optimisation acceptance criterion: the time to process a single cube is quicker than previously (see #2370).
The feature branch, upstream/area_weighted_regridding, can be merged.
There are four outstanding items:
1) Some extra comments describing the regrid_info weight variables would be useful.
2) The "jit" could be added in the future for further speed-up.
3) The ASV test needs further investigation, as it should have demonstrated a speed-up. (Our other testing shows the optimisation delivers a speed-up.)
4) The dtype of the returned data remains unchanged. However, we think the regridding should return the original data dtype, and it isn't necessarily doing this at present.
Hi @abooton apologies if this isn't useful or I'm too late, but for what it's worth:
...However, we think the current regridding should return the original data dtype...
I'm not 100% sure I have understood the context here, but I would have thought it reasonable/flexible for area-weighted regridding to return data whose dtype relates to the source dtype by the following relation:
tgt_dtype = np.promote_types(src_dtype, 'float16')
Hope this helps in some way.
Merry Christmas!
The following quick check indicates we expect the ASV tests should have seen a noticeable improvement:
# commit 738093f1: time: areaweighted_regridder 13.124 s pre-optimisation
# commit 0d1ddb40: time: areaweighted_regridder 1.575 s post-optimisation
from iris import tests
import iris
from time import time

def regridding_test():
    file_path = tests.get_data_path(
        ["NetCDF", "global", "xyt", "SMALL_hires_wind_u_for_ipcc4.nc"]
    )
    cube = iris.load_cube(file_path)
    scheme_area_w = iris.analysis.AreaWeighted()
    ts = time()
    cube.regrid(cube, scheme_area_w)
    tf = time()
    print('areaweighted_regridder', round(tf - ts, 3), 's')

regridding_test()
Hi @cpelley,
Thanks for your thoughts.
I wonder what the implications of these changes are for memory usage. Do you have any thoughts? My gut feeling is that it should still be reasonable, given the alternative approaches available, which certainly give iris a lot more wiggle room in this regard :)
We'd not specifically looked at this, as the user requirement was for reducing computation time. The regridder object now retains the weights (which may be similar in size to the 2D data array, or bigger, as they are float64). I would think the memory required to average the data would be more, particularly for 2D diagnostics, because 1) the weights are pushed into an additional array and 2) the source data is initially stored in an array larger than that of the target grid prior to computing the mean. (Previously the weights and mean were computed independently by looping through every point on the target grid, so I think the memory would have been released between each calculation.) It should be noted that the total memory usage of the new code could easily be optimised in a future PR, by initially storing the weights in the numpy array structure required by the regridding "perform" code. This further refactoring hasn't been done yet, as we didn't wish to introduce a bug into the weights calculations.
I'm not 100% sure I have understood the context here but I would have thought it reasonable/flexible that area weighted regridding return data which relates to the source dtype by the following relation: tgt_dtype = np.promote_types(src_dtype, 'float16')
To clarify, the changes here retain the current dtype behaviour. However I was concerned that integer source data can be returned as type float.
src_dtype = np.int32
tgt_dtype = np.promote_types(src_dtype, 'float16')
print(tgt_dtype)
>> float64
...could easily be optimised in a future PR, by initally storing the weights in the numpy array structure required by the regridding "perform" code...
Awesome! Yes this is how we made ESMF regridding approachable (the memory overhead there really is too large). Do feel free to give me a shout for a chat if/when you think it might be useful to do so.
To clarify, the changes here retain the current dtype behaviour...
I don't think AreaWeighted/bilinear regridding should ever return integer types. Performing area-conservative regridding implies floats in the return, since area calculations should be float. The result always needs to be both locally and globally conservative.
I wonder what the implications of these changes are for memory usage. Do you have any thoughts? My gut feeling is that it should still be reasonable, given the alternative approaches available, which certainly give iris a lot more wiggle room in this regard :)
@cpelley, using memory_profiler I generated the following plots by loading a cube (where cube.data.shape = (31, 160, 320)), then performing the regridding 10 times:
[plot 1: memory profile using 738093f, profiling regrid_area_weighted_rectilinear_src_and_grid]
[plot 2: memory profile using the feature branch, profiling _regrid_area_weighted_rectilinear_src_and_grid__prepare and _regrid_area_weighted_rectilinear_src_and_grid__perform]
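For reference, a stdlib-only way to get a comparable peak-memory number is tracemalloc (used here instead of memory_profiler; the workload is a toy stand-in shaped like the cube above, not the iris regrid code):

```python
import tracemalloc
import numpy as np

def toy_regrid(n_repeats=10):
    # Stand-in workload: a (31, 160, 320) array, block-averaged
    # along the last axis ten times.
    data = np.random.rand(31, 160, 320)
    for _ in range(n_repeats):
        out = data.reshape(31, 160, 32, 10).mean(axis=-1)
    return out

tracemalloc.start()
toy_regrid()
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print("peak traced memory: %.1f MB" % (peak / 1e6))
```

tracemalloc only tracks Python-level allocations, so its numbers will not match memory_profiler's RSS-based plots exactly, but it is convenient for quick before/after comparisons.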
Edit: There is an ~15% increase in the average memory usage, but an ~90% decrease in the total time taken to perform the regridding.
Generating data using float16 vs float64 reveals huge differences. That is, in hindsight, I don't think data recorded as integers should be treated as anything but the highest precision (float64). Integer data doesn't mean the precision is only to an integer; an example here is the land cover type fraction ancillary generation (int8 -> float64). I think it would make sense to always promote to float64 on this basis, as not doing so assumes what the user's data represents. Sorry for going back and forth on this.
I think this is probably addressed by #3617. It's complicated, so we need to spend some effort to confirm this is the case.
The following quick check indicates we expect the ASV tests should have seen a noticeable improvement: commit 738093f1: time: areaweighted_regridder 13.124 s pre-optimisation; commit 0d1ddb40: time: areaweighted_regridder 1.575 s post-optimisation
This has been addressed by #3660 - asv now demonstrates a change in performance at the appropriate commit.
Closed by #3623.
The outstanding (numba) acceptance criterion is a "nice to have".
This has been a major issue for several people. Here are a few links to provide some context: https://github.com/SciTools/iris/issues/2370 https://exxconfigmgmt:6391/browse/EPM-1542
@abooton - Updating the description 03/12/2019. As a user of the area-weighted regridder I would like to cache the area-weights (as well as snapshot the grid info) so that I can reduce the time taken to regrid multiple cubes.
Description: As described in the iris documentation: https://scitools.org.uk/iris/docs/latest/userguide/interpolation_and_regridding.html?highlight=regridding#caching-a-regridder "If you need to regrid multiple cubes with a common source grid onto a common target grid you can ‘cache’ a regridder to be used for each of these regrids. This can shorten the execution time of your code as the most computationally intensive part of a regrid is setting up the regridder."
Unfortunately the weights are not currently cached, so the benefit described is not realised when carrying out area-weighted regridding. It is noted that although, at present, the majority of the time is spent calculating the weighted mean, computing the weights can still be significant, e.g. ~25% for non-masked arrays (unlike the masked arrays for which stats are reported below).
See #2370 for a good example of setting the regridder up.
Acceptance Criteria:
• The regrid_area_weighted_rectilinear_src_and_grid API should be maintained as it is: iris.analysis.AreaWeighted().regridder(cube1, cube2) (see example in #2370)
• The weights calculation should be separated from the weighted-mean calculation (__prepare and __perform)
• If numba is used for code speed-up, it should be implemented on an "if available - use it" basis
Note: The weights are currently computed alongside the weighted mean calculation, in the loop. If the weights calculation is refactored, the grid points will be looped over twice instead of once. Therefore, it is suggested that the work is developed in a feature branch, and implemented once code refactoring and optimisation are both complete.
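The prepare/perform split described above can be sketched with a toy regridder object that computes its weights once and reuses them on every call (illustrative only; the real iris weights are spherical-area based, not the simple block-average used here):

```python
import numpy as np

class CachedAreaWeightedRegridder:
    """Toy sketch of the prepare/perform split: weights are computed
    once in __init__ ("prepare") and reused in __call__ ("perform")."""

    def __init__(self, n_src, n_tgt):
        # "prepare": the expensive one-off weight calculation
        # (a block-average standing in for real area weights).
        block = n_src // n_tgt
        self.weights = np.zeros((n_tgt, n_src))
        for j in range(n_tgt):
            self.weights[j, j * block:(j + 1) * block] = 1.0 / block

    def __call__(self, data):
        # "perform": a cheap application of the cached weights.
        return self.weights @ data

regridder = CachedAreaWeightedRegridder(n_src=8, n_tgt=2)
for field in (np.arange(8.0), np.arange(8.0) * 2):
    result = regridder(field)  # weights computed only once, up front
```

This mirrors the cached-regridder pattern in the iris user guide: build the regridder once per source/target grid pair, then call it for every cube on that grid.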