Issue opened by @valeriupredoi (since closed):

Unfortunately I must consider reverting to the older multimodel module asap - the new module is EXTREMELY slow, buggy in different spots, and not that much more efficient for memory usage. Take this recipe for instance:

I propose we retire the current implementation to a development branch and temporarily re-instate the 2.2.0 module.
First, let's try to stay factual and polite. SHOUTING is not generally considered that.
On the points:
I think it is (almost) meaningless to look at runtime and, for that matter, memory before the two approaches are successfully doing the same thing.
For the different heights, I like the new approach more than the old one. Silently ignoring a coordinate that is different doesn't seem correct to me. Perhaps a tas fix for ACCESS with a warning output could be a good solution? The CMIP5 data request unfortunately is a bit soft on this:

> normally, the temperature should be reported at the 2 meter height
The last point is interesting. Why do the cubes end up with different lengths? This smells like a bug and should be treated as such.
@zklaus: The different length of the time axis seems to be connected to leap years. I just wrote in the other issue: the reason for the warning

```
WARNING [32819] /work/bd0854/b380216/anaconda3/envs/esmvaltool202106/lib/python3.9/site-packages/iris/coords.py:1979: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'time'.
  warnings.warn(msg.format(self.name()))
```

and the error

```
IndexError: index 9490 is out of bounds for axis 0 with size 9490
```

is most probably that there is a different number of days in the 26 years the recipe tries to look at: either 9490 or 9497 days (probably depending on whether the calendar contains leap years). I'll try whether `regrid_time` would help.
```
(esmvaltool202006) b380216@mistralpp3% grep -in Running.extract_region.air_temperature.*time: /work/bd1083/b380216/output/recipe_pv_capacity_factor_20210629_095915/run/main_log_debug.txt
6449:2021-06-29 10:00:43,142 UTC [9314] DEBUG esmvalcore.preprocessor:279 Running extract_region(air_temperature / (K) (time: 9490; latitude: 241; longitude: 480)
6731:2021-06-29 10:01:01,890 UTC [9314] DEBUG esmvalcore.preprocessor:279 Running extract_region(air_temperature / (K) (time: 9497; latitude: 241; longitude: 480)
7020:2021-06-29 10:01:06,680 UTC [9314] DEBUG esmvalcore.preprocessor:279 Running extract_region(air_temperature / (K) (time: 9497; latitude: 241; longitude: 480)
7593:2021-06-29 10:01:32,682 UTC [9314] DEBUG esmvalcore.preprocessor:279 Running extract_region(air_temperature / (K) (time: 9497; latitude: 241; longitude: 480)
8138:2021-06-29 10:02:06,893 UTC [9314] DEBUG esmvalcore.preprocessor:279 Running extract_region(air_temperature / (K) (time: 9497; latitude: 241; longitude: 480)
8733:2021-06-29 10:02:30,240 UTC [9314] DEBUG esmvalcore.preprocessor:279 Running extract_region(air_temperature / (K) (time: 9490; latitude: 241; longitude: 480)
9052:2021-06-29 10:02:45,243 UTC [9314] DEBUG esmvalcore.preprocessor:279 Running extract_region(air_temperature / (K) (time: 9497; latitude: 241; longitude: 480)
9736:2021-06-29 10:03:08,329 UTC [9314] DEBUG esmvalcore.preprocessor:279 Running extract_region(air_temperature / (K) (time: 9490; latitude: 241; longitude: 480)
10297:2021-06-29 10:03:46,899 UTC [9314] DEBUG esmvalcore.preprocessor:279 Running extract_region(air_temperature / (K) (time: 9490; latitude: 241; longitude: 480)
11117:2021-06-29 10:04:26,199 UTC [9314] DEBUG esmvalcore.preprocessor:279 Running extract_region(air_temperature / (K) (time: 9497; latitude: 241; longitude: 480)
11743:2021-06-29 10:04:47,619 UTC [9314] DEBUG esmvalcore.preprocessor:279 Running extract_region(air_temperature / (K) (time: 9490; latitude: 241; longitude: 480)
12691:2021-06-29 10:05:41,717 UTC [9314] DEBUG esmvalcore.preprocessor:279 Running extract_region(air_temperature / (K) (time: 9497; latitude: 241; longitude: 480)
13764:2021-06-29 10:06:30,602 UTC [9314] DEBUG esmvalcore.preprocessor:279 Running extract_region(air_temperature / (K) (time: 9490; latitude: 241; longitude: 480)
```
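A quick arithmetic check (an illustrative sketch; it assumes the 26-year daily period mentioned above) matches the two lengths seen in the log:

```python
# Two plausible lengths for 26 years of daily data:
years = 26
print(years * 365)      # 9490 -> a calendar without leap days (e.g. noleap)
print(years * 365 + 7)  # 9497 -> a Gregorian calendar with 7 leap days in the period
```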
yes @katjaweigel that's exactly that, well spotted! About @zklaus's point about run times: I would agree in principle if it were a run time difference of tens of percent, but a factor of 4 or 5 is unacceptable. I am aware of this happening in certain situations; I did quite a few tests prior to this being merged and noticed it consistently, albeit those tests were done with significantly lower amounts of data. This is a solid case with plenty of data, and unfortunately this runtime difference is a showstopper in my books, especially since the gain in required memory is relatively marginal :+1:
> For the different heights, I like the new approach more than the old one. Silently ignoring a coordinate that is different doesn't seem correct to me. Perhaps a tas fix for ACCESS with a warning output could be a good solution?
Agreed! But we need a checker inside multimodel to either ignore or fix that AuxCoord (or others that might differ). Failing on a coordinate that is not used in the statistic itself is not the desired behaviour.
On the runtime: @valeriupredoi, your opinion is noted and will of course be taken into consideration.
On the AuxCoord: This coordinate is used. What is the correct height of the resulting data? 2.0m? 1.5m? The weighted average? I don't see that the multimodel preprocessor is in a good position to make that call. Since the overwhelming majority of models do use 2m, and this is the suggested height of the standards, the best approach to me seems to be adding a fix to ACCESS-1.3 to change the height coordinate in the `tas` variable(s). One might consider a correction to the data itself, but I leave this to the atmospheric experts.
On the leap day issue: How did the old preprocessor handle this? The only clear solution seems to be some form of time regridding, but I think that should be done by the user explicitly; thus raising an error (though possibly with a better message) is the right thing, imho.
Currently, the preprocessor setting

```yaml
regrid_time:
  frequency: day
```

doesn't solve the leap year issue; it fails with

```
IndexError: list index out of range
2021-06-29 12:27:34,816 UTC [8169] INFO esmvalcore._main:444
```

and there are still cubes with 9490 and 9497 days.
> not that much more efficient for memory usage.

Note that work on the performance is still in progress in #968; no performance improvement should be expected from the current implementation. The improvement that is included in the `main` branch at the moment is that it uses iris instead of numpy for computing the statistics. As expected, this will lead to stricter handling of metadata. We will probably need some additional fixes for that, e.g. I proposed #1177 to address some of these issues.
The old preprocessor would find the overlap via `np.intersect1d` and would chop the data around it (if `overlap` was the setting); I think this new one does it in a similar manner but looks only at the start/end points, without checking that the points inside the overlap interval are the same (or that the time point arrays have equal lengths).
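A minimal sketch of that old-style overlap logic (illustrative only, with made-up time arrays; this is not the actual ESMValCore code):

```python
import numpy as np

# Two daily time axes of unequal length, e.g. from different calendars.
time_a = np.arange(9497)  # e.g. Gregorian calendar, 7 leap days
time_b = np.arange(9490)  # e.g. noleap calendar

# Find the time points common to both datasets...
overlap = np.intersect1d(time_a, time_b)

# ...and chop each dataset to the overlap before computing statistics.
data_a = np.random.rand(time_a.size)
data_a_overlap = data_a[np.isin(time_a, overlap)]
```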
If we're talking issues with the new mmm handling that weren't there before the refactoring: I also found one testing recipe_perfmetrics_land_CMIP5.yml, as mentioned in this comment: https://github.com/ESMValGroup/ESMValTool/issues/2198#issuecomment-870450689. After regridding, not all models had bounds for the longitude coordinate, making iris fail the cube merge. If I add a guess_bounds in there, the problem is solved. Seems to me that for now, with all these issues arising, it would be better to revert the change until all mmm recipes are tested and potential issues compiled and looked at. Now, this might highlight some things we need to change in other areas of the core - but considering the release timing, a revert is the most plausible thing.
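For reference, that workaround amounts to something like this sketch (assuming an iris cube whose horizontal coordinates lack bounds after regridding):

```python
def ensure_horizontal_bounds(cube):
    """Add guessed bounds to longitude/latitude if they are missing (sketch)."""
    for name in ("longitude", "latitude"):
        coord = cube.coord(name)
        if not coord.has_bounds():
            coord.guess_bounds()
```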
> Note that work on the performance is still in progress in #968; no performance improvement should be expected from the current implementation. The improvement that is included in the `main` branch at the moment is that it uses iris instead of numpy for computing the statistics.
right on, but how can you explain the slowness?
> Seems to me that for now, with all these issues arising, it would be better to revert the change until all mmm recipes are tested and potential issues compiled and looked at.
good call @bettina-gier :+1:
Note that all the problems we have discussed so far (including the one brought up by @bettina-gier) are pre-existing bugs that have gone unnoticed until now. Reverting would therefore not solve any issue but just cover up wrong results. I am at the moment not convinced that is the right way to go about it.
Let's try to focus our energy now on understanding the problems and move to a solution strategy after that.
Klaus, those are not masked problems but rather things we didn't care enough about to fix in the old module, and that we shouldn't let cause failures in the new module. We have an MM module that is broken in several respects, and the time needed to address those aspects is much longer than the time to revert and have a working module. We care about functionality foremost, and only then about improving performance and fixing knickknacks - and it's not as if the performance of the new MM is better; as Bouwe says, it's still work in progress.
V, I disagree with that assessment and don't think you can speak for all of us here.
As a case in point, trying to advance the issue of the uneven time axis: MOHC's HadGEM2-ES uses a 360-day calendar. Over the 25-year period of this recipe, that's a difference of 25×5 + 25/4 ≈ 131 days, or over 4 months of data that get chopped off the end of all models when you add HadGEM to the list of datasets. Worse, since in all calendars the time is kept as days since a reference point, the same time point (as an integer number of days since the reference) refers to different days of the year - towards the end of the period in completely different seasons - so these all get mixed up here. Whether summer days get mixed into winter or winter days into summer depends on the position of HadGEM in the list of datasets.

I think scientists care about this sort of thing.
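To make the mixing concrete, a minimal sketch with cftime (the offset and dates are illustrative, not taken from the recipe):

```python
import cftime

# The same integer offset lands on different dates in different calendars.
units = "days since 1980-01-01"
for calendar in ("standard", "360_day"):
    print(calendar, cftime.num2date(9000, units, calendar=calendar))
# standard: 2004-08-22; 360_day: 2005-01-01 -- more than four months apart.
```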
I don't speak for the others - I speak for the core functionality of a key module in ESMValTool, which is currently busted and needs serious repairs to get both its functionality and performance up to standard. The interim solution is to replace it with the old module, which is known to work (and has been tested against reference results), while we perform the needed fixes and performance optimizations; it's as simple as that.
I gave a concrete example of where the old module seems to lead to very wrong results. Do you think that example is wrong? Do you think people are aware of this?
"Seems" is the key word - the old module has been tested thoroughly against reference results, unlike the new one. Have a go using the revert branch when you have some time, and if that is indeed a bug, then we list it here and weigh the arguments for or against reverting.
OK @zklaus @katjaweigel @bettina-gier I have got the revert done in https://github.com/ESMValGroup/ESMValCore/pull/1202 (just need to fix the docs build, but that'll be done if we decide to revert) - if you guys want to test or just run your recipes to actually get results, go for it! I think we need to talk about this asap; luckily we have the ESMValTool call on Thu. @bouweandela sign me up for this pls :grin:
Sorry to hear that you're experiencing issues with the updated multi-model code (but good to see some real use case testing coming in!) My intention has always been for the code to be more transparent, so that it will be easier to understand, debug, and optimise. While I understand it's annoying that this recipe doesn't work, let's not rush into reverting the update.
I agree with @zklaus that we should prefer solving these bugs that are surfacing right now properly, rather than hiding them away. In the old branch, there were no tests, notes, docstrings, etc. that would inform fellow developers about these "intentional workarounds". In updating the mmstats code over the past year or so, we stumbled upon many of these implicit assumptions (e.g. also in #685), and these things keep surfacing all the time, which is just as annoying, only not all at once.
With respect to the specific issues mentioned here:
- On the differing coordinates: xarray's `concat` offers a `join` argument that controls the behaviour. See http://xarray.pydata.org/en/stable/generated/xarray.concat.html. E.g. it could fill the cube with NaNs if a specific dataset doesn't have a certain coordinate value.
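A minimal sketch of that xarray behaviour (illustrative values; `height` stands in for the differing scalar coordinate):

```python
import xarray as xr

a = xr.DataArray([285.0], coords={"height": [2.0]}, dims="height")
b = xr.DataArray([284.5], coords={"height": [1.5]}, dims="height")

# join="outer" keeps the union of coordinate values and fills gaps with NaN.
combined = xr.concat([a, b], dim="model", join="outer")
print(combined)  # shape (model: 2, height: 2), NaN where a dataset has no value
```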
@Peter9192 thanks for the summary and the links to the related issues! So far, I did not find any recipe that uses mip = day and multi-model statistics (recipe_pv_capacity_factor.yml is not merged yet and the multi-model mean was removed from the PR), but one never knows what happens in non-merged recipes. I think the leap year issue could be solved through the preprocessor setting `regrid_time`? But currently that doesn't work (at least not with `regrid_time: frequency: day`), see above. About all these issues I just wonder:
- Why was this not marked as a possible breaking change to the development team (as for example for this PR: https://github.com/ESMValGroup/ESMValCore/pull/1099)?
- Why not test all recipes with multi-model mean before the core release, if there is such a large change?
> I think the leap year issue could be solved through the preprocessor setting `regrid_time`?
I think so too. Also relevant: https://github.com/ESMValGroup/ESMValCore/issues/744 (and see #781 for an overview of many related developments)
> Why was this not marked as a possible breaking change to the development team
Not sure what you mean by "marking" here. I see no label or anything similar on the PR you linked. I think we've made a lot of progress in improving the development process, but this could've been smoother, I agree.
> Why not test all recipes with multi-model mean before the core release, if there is such a large change?
I think there are several reasons for that. For one, there's no easy way to do that. That's why we've spent a lot of time improving the tests and the test infrastructure (e.g. the testbot, the regression tests with sample data, and #1023). Bouwe is currently trying to set up infrastructure to run all recipes on Mistral, but this is still challenging. And for previous releases we actually did a lot of manual tests with recipes, so I would have expected issues to surface during this release as well. But of course in this case it depends on the effort everyone puts into testing this with their frequently used recipes.
@Peter9192 Thanks for your answer! With "marking" I meant just writing "@ESMValGroup/esmvaltool-developmentteam" somewhere in the discussion, as for the issue I mentioned above. Then everybody in the development team gets an email and is aware of the discussion. I think the idea communicated during the last ESMValTool workshop was that this should be done for potential "breaking changes", i.e. changes in the core which could lead to recipes that don't run any more.
cheers @Peter9192 :beer: Here's how I see it: if we decide to use the older v2.2.0 MM module, we should have a quick and dirty v2.3.1 out asap, followed by the bugfix release v2.3.2 once the new MM (and the fx stuff) have been merged.
I like the idea of going back to the old version and waiting until the "shakedown" is finished. A bugfix release once we are happy with the new version sounds like the way to go.
Would it be an idea to have both functions side by side? I.e. users can call `multimodel_statistics` for the old implementation (I recommend a very annoying deprecation warning!) and `multimodel_statistics2` for the new implementation. Then you get the benefit of not breaking the old recipes, giving users some time to migrate, while also having the new implementation available in the release for the shakedown. Once the new implementation has matured, the old version can be removed.
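A sketch of what the side-by-side proposal could look like (the names, signatures, and warning text are hypothetical, not the actual ESMValCore API):

```python
import warnings


def multimodel_statistics(products, statistics, span="overlap"):
    """Old implementation, kept temporarily for existing recipes (hypothetical)."""
    warnings.warn(
        "multimodel_statistics is deprecated; please migrate to "
        "multimodel_statistics2",
        DeprecationWarning,
        stacklevel=2,
    )
    ...  # old numpy-based computation


def multimodel_statistics2(products, statistics, span="overlap"):
    """New iris-based implementation (hypothetical)."""
    ...  # new iris-based computation
```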
Here is how I see it:
The recipes should surely be fixed, no matter what we decide about the multimodel preprocessor.
> Would it be an idea to have both functions side by side? I.e. users can call `multimodel_statistics` for the old implementation (I recommend a very annoying deprecation warning!) and `multimodel_statistics2` for the new implementation. Then you get the benefit of not breaking the old recipes, giving users some time to migrate, while also having the new implementation available in the release for the shakedown. Once the new implementation has matured, the old version can be removed.
that's the right way from a software-cycle point of view, absolutely, and in fact it'd be easy since we don't have to change anything in the Tool - it'd also allow us to do a direct apples-to-apples comparison between the two modules. I like the idea :+1:
@zklaus To "fix the recipes" one probably needs to do additional fixes in the core in most cases:

- As far as I understood the bounds issue from @bettina-gier, this should be fixed in the core.
- The only way to "fix" the leap year issue at the moment is not to use the multi-model mean for daily data (which is what was done for the recipe where I encountered it). To really fix it, we would probably need a completely new version of `regrid_time` (a preprocessor, so it belongs to the core as well). If I understood correctly, `regrid_time` currently only resamples the time, keeping the number of points. If we wanted to really regrid daily data onto the same grid, we would need possibilities to interpolate, as for the horizontal regridding.
- For the issue with the 2m or 1.5m height a model fix was suggested (which would be part of the core). But I'm actually not sure that would be a better solution, because:
  - A model fix would change the height to 2m even if no multi-model mean is applied.
  - It is difficult to calculate the right tas for 2m (if you use a climatology/fixed value, the error will probably depend on latitude, land/sea, "weather", .... I'm not sure how well you could approximate it with the help of other variables: I guess the height grid for ta is too coarse to help; rather hfls and/or ts could be used, but I'm not sure how well this would work, or whether it is a good idea to use additional variables for fixes, since it would turn tas (2m) for such models into a kind of derived variable.)
  - The difference between tas (2m) and tas (1.5m) is probably tiny in most cases.
  - Not using tas for a multi-model mean from models where it is not given at 2m is probably not a popular idea.
  - Therefore it would maybe be better to somehow allow averaging 1.5m and 2m tas in case of multi-model means and give out a warning? (On the other hand, nobody reads warnings and there might be other similar but larger differences.)
There certainly may be changes necessary in the core, and a bugfix release is a real possibility. We may even decide to revert the multi-model statistics if weighing the issues and solutions makes that the best choice. But we should be clear about where the problems lie, so as to address them in the right place.
> As far as I understood the bounds issue from @bettina-gier, this should be fixed in the core.
Yes. Longitudes and latitudes must have bounds. I am not sure whether this is a simple dataset issue that needs a fix or a bug in the regridding. That should be discussed in the appropriate issue.
> The only way to "fix" the leap year issue at the moment is not to use the multi-model mean for daily data (which is what was done for the recipe where I encountered it). To really fix it, we would probably need a completely new version of `regrid_time` (a preprocessor, so it belongs to the core as well). If I understood correctly, `regrid_time` currently only resamples the time, keeping the number of points. If we wanted to really regrid daily data onto the same grid, we would need possibilities to interpolate, as for the horizontal regridding.
I think the issue is with different calendars, so at the moment one needs to avoid using multi-model statistics for daily data with mixed calendars. That seems quite sensible to me. A long-term solution probably involves conversions between calendars. This is commonly done by duplicating or removing days, but the details should be discussed elsewhere. I would probably rather place such a thing in a (very useful!) `align_calendars` or similar preprocessor.
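For instance, the remove-days convention could look like this sketch (assuming `dates` is a sequence of datetime-like objects and `values` a matching numpy array):

```python
import numpy as np


def drop_leap_days(dates, values):
    """Convert daily data to a noleap calendar by removing Feb 29 (sketch)."""
    keep = np.array([not (d.month == 2 and d.day == 29) for d in dates])
    return [d for d, k in zip(dates, keep) if k], values[keep]
```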
> For the issue with the 2m or 1.5m height a model fix was suggested (which would be part of the core). But I'm actually not sure that would be a better solution, because:
>
> - A model fix would change the height to 2m even if no multi-model mean is applied.
> - It is difficult to calculate the right tas for 2m (if you use a climatology/fixed value, the error will probably depend on latitude, land/sea, "weather", .... I'm not sure how well you could approximate it with the help of other variables: I guess the height grid for ta is too coarse to help; rather hfls and/or ts could be used, but I'm not sure how well this would work, or whether it is a good idea to use additional variables for fixes, since it would turn tas (2m) for such models into a kind of derived variable.)
> - The difference between tas (2m) and tas (1.5m) is probably tiny in most cases.
You are completely right about these points.
> Not using tas for a multi-model mean from models where it is not given at 2m is probably not a popular idea.
I'm not sure how much backlash there really would be. If it's only about a few CMIP5 datasets (that have a history of issues with their surface temperature), perhaps not many people care. Though this may be more widely spread.
> Therefore it would maybe be better to somehow allow averaging 1.5m and 2m tas in case of multi-model means and give out a warning? (On the other hand, nobody reads warnings and there might be other similar but larger differences.)
Perhaps. I am really not sure either way. But should that also be done for the other offered statistics? What about standard deviation or extreme percentiles?
Either way, again, let's discuss this in the right issue. If we decide that averaging 1.5m and 2m tas is the way to go, let's state it that way and make clear exactly what behaviour we want.
> For the issue with the 2m or 1.5m height a model fix was suggested (which would be part of the core). But I'm actually not sure that would be a better solution, because:
>
> - A model fix would change the height to 2m even if no multi-model mean is applied.
> - It is difficult to calculate the right tas for 2m (if you use a climatology/fixed value, the error will probably depend on latitude, land/sea, "weather", .... I'm not sure how well you could approximate it with the help of other variables: I guess the height grid for ta is too coarse to help; rather hfls and/or ts could be used, but I'm not sure how well this would work, or whether it is a good idea to use additional variables for fixes, since it would turn tas (2m) for such models into a kind of derived variable.)
> - The difference between tas (2m) and tas (1.5m) is probably tiny in most cases.
> - Not using tas for a multi-model mean from models where it is not given at 2m is probably not a popular idea.
> - Therefore it would maybe be better to somehow allow averaging 1.5m and 2m tas in case of multi-model means and give out a warning? (On the other hand, nobody reads warnings and there might be other similar but larger differences.)
I totally agree with Katja on the tas height issue. Actually, tas is defined as "near-surface (usually, 2 meter) air temperature". Some models give it at 1.5m (definitely ACCESS, and I guess anything else using the UM for the atmosphere, so UKESM, HadGEM, etc.), so it's quite a few models from the same family. Ignoring all of them in a multi-model mean because they define tas at a different height is not the way to go! This is not an error in the data. tas is a diagnostic anyway, not a prognostic variable; if you need the prognostic variable and care about the difference between 1.5 and 2 meters, you need to use ta. So I would allow averaging the 1.5m and 2m heights (simply ignoring the height in this case would be my suggestion).
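"Simply ignore the height" could amount to something like the following sketch (assuming `cubes` is the list of iris cubes entering the multi-model step):

```python
from iris.exceptions import CoordinateNotFoundError


def drop_height(cubes):
    """Remove the scalar height coordinate before combining cubes (sketch)."""
    for cube in cubes:
        try:
            cube.remove_coord("height")
        except CoordinateNotFoundError:
            pass
```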
@zklaus I'm not blaming the changes in the core for the issues themselves. I'm just not happy that there were not enough tests before the release to discover them, and that there was no warning about possible "breaking changes". It would be good if everything in the tool were working when it is released, and at the moment that is not the case with the new core. So I wonder how that could best be solved? I don't think it is good if we have to reduce/change the models used in the recipes for the new release (because of bounds, 2m height, or new model fixes which fix the grid for one variable but make a derived variable impossible, as happened for one model in the case of recipe_deangelis15nat.yml), but right now that would be the only way to make the tool work with the released core by changing the tool alone. (And I'm not even sure it would help in all cases.) Some of these issues can always happen (the leap year issue currently probably only affects a non-merged recipe in a PR; one can never avoid something like that completely), but for this release there are rather many of them, and we probably would have seen several of them with more tests before the core release.
@ruthlorenz, @katjaweigel, it's good to know that more models are affected (and I confirmed this for both CMIP5 and CMIP6). I opened a new issue to discuss specifically the question on how to treat this. It would be great if you could help to resolve the questions over there in #1204.
@katjaweigel, on tests vs releases, first, keep in mind that the current released version of the tool, 2.2.0, requires a core version <2.3.0, so there is no released version of the tool that has these problems. Whether a change is breaking or not can be difficult to determine until it breaks something, more so if the actual behavior is correct.
I agree that we should try to keep all models in the recipes, and even though that is sometimes not possible (for example, Manuel Schlund ran into a problem with a dataset that has meanwhile been retracted), just removing models because we can't deal with them anymore is certainly not the way to go.
I think all issues raised here are now being addressed in other issues. An overview can be found in https://github.com/ESMValGroup/ESMValTool/discussions/2218. Closing this. If you disagree, please feel free to reopen with a comment explaining the need.