ESMValGroup / ESMValCore

ESMValCore: A community tool for pre-processing data from Earth system models in CMIP and running analysis scripts.
https://www.esmvaltool.org
Apache License 2.0

Performance of volume_statistics #1498

Closed sloosvel closed 2 years ago

sloosvel commented 2 years ago

Describe the bug

@ESMValGroup/esmvaltool-coreteam do I have time to quickly fix volume statistics for 2.5? The current implementation performs quite poorly: I have been running the function on one year of data with 75 levels for half an hour and it's still not done. I think that any approach such as da.average(axis=[all axes except time], weights) would be better than the double loop that is implemented right now.

schlunma commented 2 years ago

If this is really just this simple change, I think we can include it.

volume_statistics is currently used by the recipes esmvaltool/recipes/recipe_ocean_example.yml and esmvaltool/recipes/recipe_ocean_bgc.yml, so we would need someone with the scientific expertise to cross-check the output of these two recipes after the change.

sloosvel commented 2 years ago

Not quite, I have one test failing. Anyway, I don't want to stress over this. But it's something that should be looked into for 2.6, as I don't think one year of monthly data should get stuck the way it does right now.

valeriupredoi commented 2 years ago

No tests fail in CI, Saskia; make sure you have everything up to date. Yes, I aim to daskify as much as I have time for in the very near future. Judging by how things stand now in terms of time, it might be just one function :laughing:

valeriupredoi commented 2 years ago

maybe you and me should form the A-Dask-Team :grin:

sloosvel commented 2 years ago

For future reference, this seems to run without breaking the tests:

result = da.ma.average(cube.lazy_data(), axis=(1,2,3), weights=grid_volume)

The indices should be written in a better way and scientific results should be double checked, but at least it's a start.
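
One way to avoid the hard-coded indices is to derive the reduction axes from the position of the time dimension. The following is only a minimal sketch, not the ESMValCore implementation. It assumes that time sits on one known axis and that the volume weights already have the same shape as the data. NumPy's np.ma.average is used as a stand-in so the snippet runs without dask (da.ma.average takes the same axis and weights arguments). The names volume_mean and time_axis are hypothetical.

```python
import numpy as np


def volume_mean(data, grid_volume, time_axis=0):
    """Volume-weighted mean over every axis except the time axis.

    Hypothetical helper for illustration; masked cells are ignored.
    """
    # Build the tuple of axes to collapse instead of hard-coding (1, 2, 3).
    axes = tuple(i for i in range(data.ndim) if i != time_axis)
    return np.ma.average(data, axis=axes, weights=grid_volume)


# Toy data with shape (time=2, depth=2, lat=1, lon=2).
values = np.array([
    [[[1.0, 2.0]], [[3.0, 0.0]]],
    [[[2.0, 4.0]], [[6.0, 0.0]]],
])
# Mask the last cell of the deeper level (e.g. a land or sea-floor point).
mask = np.zeros(values.shape, dtype=bool)
mask[:, 1, 0, 1] = True
data = np.ma.array(values, mask=mask)

# Cell volumes: the deeper level is twice as thick as the upper one.
level_volume = np.array([[[1.0, 1.0]], [[2.0, 2.0]]])
grid_volume = np.broadcast_to(level_volume, data.shape).copy()

result = volume_mean(data, grid_volume)
```

With an iris cube, the time axis could presumably be looked up with cube.coord_dims('time') rather than assumed to be 0.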

ledm commented 2 years ago

Just a quick comment that while it's only in a couple of public recipes in esmvaltool, it is used a lot in private recipes. However, if it can be made lazy and it doesn't break the weighting using fx files (which I don't think is covered in the unit tests), then go for it!

valeriupredoi commented 2 years ago

@ledm cheers! By private recipes you mean AR6 stuff? :beer:

ledm commented 2 years ago

I use this all the time. It's in so many recipes in ESMValTool_private, ESMValTool_AR6, and on some other unmerged branches of ESMValTool as well.

sloosvel commented 2 years ago

I worked a bit on a new implementation, testing it with prep_timeseries_5 for the two uncommented datasets in recipe_ocean_example.yml but expanding the time range to 20 years. The results after the preprocessing (volume average + multimodel) are pretty much the same:

  thetao = 277.1949, 277.1975, 277.1993, 277.1997, 277.1982, 277.1949, 
    277.191, 277.1882, 277.1878, 277.1892, 277.1913, 277.1935, 277.196, 
    277.1985, 277.2008, 277.2013, 277.1995, 277.1959, 277.1918, 277.1888, 
    277.1886, 277.1904, 277.1924, 277.1946, 277.1972, 277.1998, 277.2018, 
    277.2022, 277.2004, 277.1967, 277.1921, 277.1891, 277.1885, 277.1897, 
    277.1917, 277.1938, 277.1963, 277.1986, 277.2003, 277.2006, 277.1987, 
    277.1952, 277.191, 277.1882, 277.1877, 277.1891, 277.1913, 277.1937, 
    277.1963, 277.1989, 277.2004, 277.2009, 277.1991, 277.1956, 277.1915, 
    277.1887, 277.1884, 277.1897, 277.192, 277.1944, 277.1969, 277.1994, 
    277.2015, 277.2022, 277.2005, 277.1969, 277.1929, 277.1901, 277.1898, 
    277.1912, 277.1932, 277.1956, 277.198, 277.2004, 277.2023, 277.2026, 
    277.2011, 277.1975, 277.1935, 277.1907, 277.1903, 277.1916, 277.1938, 
    277.1961, 277.1988, 277.2013, 277.2031, 277.2033, 277.2018, 277.1982, 
    277.194, 277.1911, 277.1908, 277.1922, 277.1944, 277.1967, 277.1994, 
    277.2018, 277.2038, 277.2041, 277.2026, 277.1991, 277.1947, 277.192, 
    277.1915, 277.193, 277.1953, 277.1977, 277.2003, 277.2028, 277.2048, 
    277.2051, 277.2036, 277.2, 277.1957, 277.1928, 277.1926, 277.194, 
    277.1964, 277.1988, 277.2012, 277.2034, 277.2053, 277.2054, 277.204, 
    277.2007, 277.1968, 277.194, 277.1935, 277.195, 277.1973, 277.1997, 
    277.202, 277.2045, 277.2065, 277.2068, 277.2051, 277.2015, 277.1974, 
    277.1946, 277.1941, 277.1953, 277.1971, 277.1989, 277.2011, 277.2033, 
    277.205, 277.2052, 277.2035, 277.1999, 277.1958, 277.1929, 277.1923, 
    277.1933, 277.1953, 277.1974, 277.1996, 277.2018, 277.2036, 277.2036, 
    277.2019, 277.1984, 277.1943, 277.1915, 277.191, 277.1923, 277.1945, 
    277.1967, 277.1991, 277.2015, 277.2032, 277.2036, 277.2021, 277.1988, 
    277.1948, 277.1922, 277.192, 277.1934, 277.1958, 277.1982, 277.2007, 
    277.2031, 277.2048, 277.2053, 277.2037, 277.2002, 277.196, 277.1934, 
    277.1929, 277.1942, 277.1966, 277.199, 277.2013, 277.2039, 277.2057, 
    277.2062, 277.2045, 277.2009, 277.1968, 277.1942, 277.194, 277.1952, 
    277.1973, 277.1998, 277.2022, 277.2048, 277.2068, 277.2073, 277.2059, 
    277.2025, 277.1982, 277.1954, 277.1949, 277.1964, 277.1986, 277.2009, 
    277.2036, 277.2062, 277.2082, 277.2086, 277.207, 277.2034, 277.1992, 
    277.1966, 277.1962, 277.1977, 277.2, 277.2025, 277.2051, 277.2076, 
    277.2095, 277.21, 277.2084, 277.205, 277.201, 277.1983, 277.1981, 
    277.1995, 277.2017, 277.2041, 277.2068, 277.2092, 277.2111, 277.2115, 
    277.2101, 277.207, 277.2031, 277.2004, 277.2, 277.2014, 277.2036, 277.2057 ;

  thetao = 277.1949, 277.1975, 277.1993, 277.1997, 277.1982, 277.1949,
    277.191, 277.1882, 277.1878, 277.1892, 277.1913, 277.1935, 277.196, 
    277.1985, 277.2008, 277.2013, 277.1995, 277.1959, 277.1918, 277.1888, 
    277.1886, 277.1904, 277.1924, 277.1946, 277.1972, 277.1998, 277.2018, 
    277.2022, 277.2004, 277.1967, 277.1921, 277.1891, 277.1885, 277.1897, 
    277.1917, 277.1938, 277.1963, 277.1986, 277.2003, 277.2006, 277.1987, 
    277.1952, 277.191, 277.1882, 277.1877, 277.1891, 277.1913, 277.1937, 
    277.1963, 277.1989, 277.2004, 277.2009, 277.1991, 277.1956, 277.1915, 
    277.1887, 277.1884, 277.1897, 277.192, 277.1944, 277.1969, 277.1994, 
    277.2015, 277.2022, 277.2005, 277.1969, 277.1929, 277.1901, 277.1898, 
    277.1912, 277.1932, 277.1956, 277.198, 277.2004, 277.2023, 277.2026, 
    277.2011, 277.1975, 277.1935, 277.1907, 277.1903, 277.1916, 277.1938, 
    277.1961, 277.1988, 277.2013, 277.2031, 277.2033, 277.2018, 277.1982, 
    277.194, 277.1911, 277.1908, 277.1922, 277.1944, 277.1967, 277.1994, 
    277.2018, 277.2038, 277.2041, 277.2026, 277.1991, 277.1947, 277.192, 
    277.1915, 277.193, 277.1953, 277.1977, 277.2003, 277.2028, 277.2048, 
    277.2051, 277.2036, 277.2, 277.1957, 277.1928, 277.1926, 277.194, 
    277.1964, 277.1988, 277.2012, 277.2034, 277.2053, 277.2054, 277.204, 
    277.2007, 277.1968, 277.194, 277.1935, 277.195, 277.1973, 277.1997, 
    277.202, 277.2045, 277.2065, 277.2068, 277.2051, 277.2015, 277.1974, 
    277.1946, 277.1941, 277.1953, 277.1971, 277.1989, 277.2011, 277.2033, 
    277.205, 277.2052, 277.2035, 277.1999, 277.1958, 277.1929, 277.1923, 
    277.1933, 277.1953, 277.1974, 277.1996, 277.2018, 277.2036, 277.2036, 
    277.2019, 277.1984, 277.1943, 277.1915, 277.191, 277.1923, 277.1945, 
    277.1967, 277.1991, 277.2015, 277.2032, 277.2036, 277.2021, 277.1988, 
    277.1948, 277.1922, 277.192, 277.1934, 277.1958, 277.1982, 277.2007, 
    277.2031, 277.2048, 277.2053, 277.2037, 277.2002, 277.196, 277.1934, 
    277.1929, 277.1942, 277.1966, 277.199, 277.2013, 277.2039, 277.2057, 
    277.2062, 277.2045, 277.2009, 277.1968, 277.1942, 277.194, 277.1952, 
    277.1973, 277.1998, 277.2022, 277.2048, 277.2068, 277.2073, 277.2059, 
    277.2025, 277.1982, 277.1954, 277.1949, 277.1964, 277.1986, 277.2009, 
    277.2036, 277.2062, 277.2082, 277.2086, 277.207, 277.2034, 277.1992, 
    277.1966, 277.1962, 277.1977, 277.2, 277.2025, 277.2051, 277.2076, 
    277.2095, 277.21, 277.2084, 277.205, 277.201, 277.1983, 277.1981, 
    277.1995, 277.2017, 277.2041, 277.2068, 277.2092, 277.2111, 277.2115, 
    277.2101, 277.207, 277.2031, 277.2004, 277.2, 277.2014, 277.2036, 277.2057 ;

In terms of performance, the current implementation takes:

2022-03-17 11:35:39,401 UTC [278285] INFO esmvalcore._main:130 Time for running the recipe was: 0:21:38.031445
2022-03-17 11:35:40,386 UTC [278285] INFO esmvalcore._task:127 Maximum memory used (estimate): 10.9 GB

While the new one takes:

2022-03-17 10:48:01,627 UTC [28700] INFO esmvalcore._main:130 Time for running the recipe was: 0:00:57.292436
2022-03-17 10:48:02,001 UTC [28700] INFO esmvalcore._task:127 Maximum memory used (estimate): 14.4 GB

So even though the memory consumption is about 32% larger (14.4 GB versus 10.9 GB), the execution time drops from over 21 minutes to under one minute, which is more than 22 times faster.

zklaus commented 2 years ago

Impressive speedup, @sloosvel! Is there a PR to have a peek?

sloosvel commented 2 years ago

It's in branch dev_vol_stats but I can open a draft.

ledm commented 2 years ago

If I remember correctly, the reason we didn't do this before was that dask's average didn't accept weights at the time. I'm guessing that it does now, but it may be worth extending the tests to ensure that the weights are treated correctly.
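
A check along those lines could look like the sketch below. It is an illustration only, not an existing ESMValCore test. It uses NumPy's np.ma.average in place of the lazy implementation, and the function names weighted_volume_mean and check_weights_are_respected are hypothetical. The idea is to pick unequal cell volumes so that the weighted and unweighted means differ, then compare against a value worked out by hand.

```python
import numpy as np


def weighted_volume_mean(data, volume):
    """Hypothetical stand-in: weighted mean over all axes except axis 0."""
    axes = tuple(range(1, data.ndim))
    return np.ma.average(data, axis=axes, weights=volume)


def check_weights_are_respected():
    # One time step, two levels, a 1x1 horizontal grid: shape (1, 2, 1, 1).
    data = np.ma.array([[[[10.0]], [[20.0]]]])
    # The lower level has three times the volume of the upper one.
    volume = np.array([[[[1.0]], [[3.0]]]])

    result = weighted_volume_mean(data, volume)

    # Hand-computed value: (10 * 1 + 20 * 3) / (1 + 3) = 17.5
    np.testing.assert_allclose(result, [17.5])
    # The unweighted mean is 15, so ignoring the weights would be detected.
    assert not np.isclose(result[0], data.mean())


check_weights_are_respected()
```

The same pattern could be repeated with weights taken from an fx volume file to cover the case mentioned earlier in the thread.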

sloosvel commented 2 years ago

The new approach uses iris to do the averages, as that will make it easier to add new operators (which seems to be something that is still pending). It just gets rid of the loops, which really slow down the code as the number of levels or the number of timesteps increases. But if you have an example for the test, it can be added just to be sure.