Closed BenMGeo closed 4 years ago
Transferred to core since it affects the preprocessor.
Transferred to core since it affects the preprocessor.
But do not forget to add any relevant outcome to the documentation (therefore, I raised the issue in the Main Tool repository).
The accompanying issue at iris (https://github.com/SciTools/iris/issues/3363) @bjlittle
nice one @BenMGeo :beer:
also +1 for using decorators :grin: Will you be in touch with the iris guys find out when the patch will go in, I suppose iris 2.3
I don't know if we should/can push (like in "make it urgent") the pull request over at iris. I'm not sure if I have the capacity to test the changes rcomer implemented. I also assume that the changes will be in a soonish release as tests are implemented and they realized that it is a severe issue.
It seems to be included in 2.3 as the pull request was merged for this milestone: https://github.com/SciTools/iris/milestone/40
The backend adds the auxiliary coordinates day_of_year, day_of_month, year, month_number.
Maybe we could move this into a separate preprocessor function, for those users who need it? I don't think this is part of CMOR or cf-conventions, so I'm not sure why we're doing this anyway. @jvegasbsc Can you comment on this?
We decided to add them because they are useful at several preprocessor functions and adding them at this step was thought to be better than dealing with them at each point they are used.
It is not a problem at all to remove that part of the code and change the functions to add them if needed
It is not a problem at all to remove that part of the code and change the functions to add them if needed
I think that might be better, it makes the code easier to understand if all code that does one thing is in one place. Also, we already have a preprocessor function that needs to remove these extra coordinates before it can do it's job: https://github.com/ESMValGroup/ESMValCore/blob/08eb1e4c0b03d811de54fb1f2209349302ebbdbb/esmvalcore/preprocessor/_time.py#L316-L320
We could add a new preprocessor function to add any of the extra coordinates provided by iris.coord_categorisation.add_*
if diagnostics need them, preferably quite late in the preprocessing chain to minimize the risk of accidentally deleting those coordinates again.
Just make sure that the diagnostic editors know about that change. Most of my code currently assumes that these coordinates are there, as I set them as a standard. I will happily change that, but others might run into issues if they are not aware.
@mattiarighi you want me to test if still an issue for iris 2.3?
Yes, that would be great. Thank you!
So... The good news: it is faster! The bad news: it is still a little slower than without the extra coordinates!
small_time = Mean_Cube(air_temp_small, "time")
# Finished 'Mean_Cube' in 0.0035 secs
print(small_time)
# ...
big_time = Mean_Cube(air_temp_big, "time")
# Finished 'Mean_Cube' in 0.0231 secs
print(big_time)
# ...
If the slow down of factor ~10 is acceptable for you, this is ok. I will still delete the coordinates as deleting costs less time.
Would you be willing to implement @jvegasbsc + @bouweandela suggestion?
I need to dive into this code part and make sure to have it backward compatible. If neither of the original editors has the possibility to do it, I'll give it a go.
I can do it. It is not specially complicated and I need something easy to get back on track after holidays
Thank you @jvegasbsc. Tell me if you need something for support.
One question, do we really need a preprocessor function to add those coordinates it? It is fairly easy (and maybe clearer) to add the coordinate in the diagnostic if needed.
The only case in which I think we may need it is if they are used on diagnostics that are not Iris based, but I don't know if that is the case
Quite a few that are not iris based, man, either xarray or cdo (see @earnone 's and a few others)
Quite a few that are not iris based, man, either xarray or cdo (see @earnone 's and a few others)
But are they using them? You don't need those coordinates to do a selmon with CDO...
That I dunno, too complicated a question given I'm still on holidays :grin: :beer:
One question, do we really need a preprocessor function to add those coordinates it? It is fairly easy (and maybe clearer) to add the coordinate in the diagnostic if needed.
Maybe you can grep
for those coordinate names in the existing diagnostics? That will give you a good impression of how commonly they are used.
I don't think any of us is using them. I agree running cdo does not need them.
@mattiarighi Do you know of any NCL diagnostics using this? I quickly checked and couldn't find anything, but for the year
coordinate is difficult to see at a glance.
year
seems to be used quite often in NCL scripts, however it is hard to tell whether it is internally defined or required from the preprocessor. I will test the affected recipes to make sure.
The other coordinates are never used.
In my diagnostics I just delete all aux coordinates if there are any. So I have no compatibility problems.
year seems to be used quite often in NCL scripts, however it is hard to tell whether it is internally defined or required from the preprocessor. I will test the affected recipes to make sure.
See checklist.
I recently realized that we experience quite an efficiency loss due to our added coordinates (by the backend) concerning time.
The backend adds the auxiliary coordinates day_of_year, day_of_month, year, month_number. If you leave them in for further calculations based on collapsing along the time axis in diagnostics, this leads to a significant increase of processing time!
Here is a "simple" example that shows the increasing run time:
The processing time increases by a factor of ~1000!!!
The content itself is the same.
This does not occur for collapsing along dimensions without auxiliary coordinates.
This needs to be told to the diagnostic developers if they use iris in their diagnostics (unknown if this is relevant for other packages besides iris). I'm not sure if this is relevant for the backend, though.
I will address this issue also to the iris community, as I think this is a solvable performance issue. (as discussed with @bjoernbroetz)