Closed valeriupredoi closed 2 months ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 94.77%. Comparing base (
ffa2347
) to head (3b6ce2b
). Report is 1 commits behind head on main.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
ayay - check out the devel test with bleeding edge deps:
Mismatched elements: 1 / 1 (100%)
Max absolute difference: 0:00:00.997120
x: array([cftime.DatetimeProlepticGregorian(2004, 11, 4, 13, 30, 0, 997120, has_year_zero=True)],
dtype=object)
y: array([datetime.datetime(2004, 11, 4, 13, 30)], dtype=object)
cftime plugging in some 0.997s out of thin air, they'd get absolutely destroyed if they worked in motorsports :grin:
Have you tried this suggestion to work around the issue? https://github.com/pandas-dev/pandas/issues/57002#issuecomment-1906866998
Have you tried this suggestion to work around the issue? pandas-dev/pandas#57002 (comment)
Yes, it don't work, no Pd rounding works, including the one in the cmorizer ๐บ
This also doesn't sound like a workaround to me...
However, what DOES work is the hint in the description of that issue:
# Round just date works
dates[0].round("30min")
# >Timestamp('2022-01-11 12:00:00')
See here:
>>> datetimes = year_month_day + day_float
>>> print(datetimes)
0 2004-11-04 13:59:59.997120
dtype: datetime64[ns]
>>> rounded_datetimes = pd.Series(dt.round('s') for dt in datetimes)
>>> print(rounded_datetimes)
0 2004-11-04 14:00:00
dtype: datetime64[ns]
Ahh wait, I just realized that I have an old version of pandas, sorry, will update you again once that is updated...
You need to update ninja bear to 2.2.3 Manu ๐
Yes!! It does work with pandas v2.2.3:
>>> import pandas as pd
>>> pd.__version__
'2.2.3'
>>> time_points = [20041104.5833333]
>>> time_str = pd.Series(time_points, dtype=str)
>>> year_month_day_str = time_str.str.extract(r'(\d*)\.?\d*', expand=False)
>>> year_month_day = pd.to_datetime(year_month_day_str, format='%Y%m%d')
>>> day_float_str = time_str.str.extract(
r'\d*(\.\d*)',... r'\d*(\.\d*)', expand=False
... ).fillna('0.0')
>>> day_float = pd.to_timedelta(day_float_str.astype(float), unit='D')
>>> datetimes = year_month_day + day_float
>>> print(datetimes)
0 2004-11-04 13:59:59.997120
dtype: datetime64[ns]
>>> # Old version:
>>> rounded_datetimes = datetimes.round('s')
>>> print(rounded_datetimes)
0 2004-11-04 13:59:59.997120
dtype: datetime64[ns]
>>> # New version:
>>> rounded_datetimes = pd.Series(dt.round('s') for dt in datetimes)
>>> print(rounded_datetimes)
0 2004-11-04 14:00:00
dtype: datetime64[ns]
You want to plug this in @valeriupredoi? You can simply replace the existing l.496 with
rounded_datetimes = pd.Series(dt.round('s') for dt in datetimes)
and should work ๐
I can also do it if you want, already got a local copy with this change.
So why is the rounding done in the ICON cmorizer don't work then, and all the attempts I tried two days ago? You mind plopping the correct syntax in this PR, and get rid of the peasant rounding then plss, bud ๐บ
I can also do it if you want, already got a local copy with this change.
That'd be fab, cheers, bud! :beer:
In the pandas issue they say that only rounding the pd.Series
objects fail, i.e.,
rounded_datetimes = datetimes.round('s')
This is what we currently have; hence, it fails.
However, rounding individual elements, i.e.,
rounded_datetimes = pd.Series(dt.round('s') for dt in datetimes)
works. Please don't ask me why...
that'd be why then - I completely missed the conversion to pd.Series
and was trying to round each element of the array, doing exactly what was broken in Pandas :man_facepalming:
Did you time some representative test case @schlunma? Looping over every element of an array in Python and then doing something with it can be really slow.
Did you time some representative test case @schlunma? Looping over every element of an array in Python and then doing something with it can be really slow.
No, but I don't think it's critical here. All ICON output I've seen so far is scattered across many files with only very few time points (sometimes only one time step is stored). Even the high-res output (20 mins) we have only has 72 time points per file.
However, I am open to any better solution. All solutions I could think of involved Python loops in one way or another.
we can always revert to the original implementation when Pandas finally fix their silly bug. Thanks a lot, Manu! @bouweandela would you be a trooper and merge this, pls :beer:
However, I am open to any better solution. All solutions I could think of involved Python loops in one way or another.
Well, my peasant approach didn't - it involved a list not an array :rofl:
Well, my peasant approach didn't - it involved a list not an array ๐คฃ
But you still had a for
loop in there ๐
Well, my peasant approach didn't - it involved a list not an array ๐คฃ
But you still had a
for
loop in there ๐
An even worse one too ๐คฃ
I did some timings (with pandas 2.1.4) and for 100 points the time it takes is quite marginal, even if the new approach is more than 10 times slower:
In [0]: import pandas as pd
...:
...: dates = pd.Series([pd.to_datetime("2022-01-11 12:10:01.123")] * 100)
In [1]: %timeit pd.Series(dt.round('s') for dt in dates)
906 ฮผs ยฑ 16 ฮผs per loop (mean ยฑ std. dev. of 7 runs, 1,000 loops each)
In [2]: %timeit dates.round("s")
67.3 ฮผs ยฑ 1.15 ฮผs per loop (mean ยฑ std. dev. of 7 runs, 10,000 loops each)
Many thanks, gents ๐บ
Description
We have been pinning Pandas since at least three minor versions ago, see https://github.com/ESMValGroup/ESMValCore/pull/2394 due to a bug in Pandas that they seem to be taking forever to fix issue https://github.com/pandas-dev/pandas/issues/57002 and now we are yet again faced with the same issue with the new version of Pandas 2.2.3 - see failed nightly GA. The problem is simple: Pandas are simply unable to round to the next minute eg 12H:12M:39S should be 12H:13M:00S (for daily data), so I got fed up and wrote our own rounding function (that took me surprisingly long to get it right - about an hour, rounded to the next minute :rofl: I think this should not affect the result when Pandas finally fix their bug - but I'll let @schlunma decide :beer:
Closes #2530
Before you get started
Checklist
It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the ๐ Technical or ๐งช Scientific review.
To help with the number pull requests: