thwllms opened this issue 5 months ago
Issue with the following file from the Kanawha bucket, reported via @zherbz:
`.../sims/ressim/383/ras/LowerKanawha/LowerKanawha.p01.hdf`
When `rashdf` is installed with `pandas==2.1.4` and `numpy==1.26.0`, there's an error extracting mesh cell points:
```
Python 3.10.14 (main, Apr 18 2024, 16:25:28) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from src.rashdf import RasPlanHdf
>>> plan_hdf = RasPlanHdf("LowerKanawha.p01.hdf")
>>> plan_hdf.mesh_cell_points()
Traceback (most recent call last):
  File "offsets.pyx", line 4548, in pandas._libs.tslibs.offsets.to_offset
ValueError: invalid literal for int() with base 10: '0.1'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/thomaswilliams/dev/rashdf/src/rashdf/plan.py", line 719, in mesh_cell_points
    return self._mesh_summary_outputs_gdf(
  File "/home/thomaswilliams/dev/rashdf/src/rashdf/plan.py", line 686, in _mesh_summary_outputs_gdf
    df = self.mesh_summary_output(var, round_to=round_to)
  File "/home/thomaswilliams/dev/rashdf/src/rashdf/plan.py", line 602, in mesh_summary_output
    df = methods_with_times[var](round_to=round_to)
  File "/home/thomaswilliams/dev/rashdf/src/rashdf/plan.py", line 567, in mesh_max_ws_err
    df = self._mesh_summary_output_min_max(
  File "/home/thomaswilliams/dev/rashdf/src/rashdf/plan.py", line 359, in _mesh_summary_output_min_max
    times = self._mesh_summary_output_min_max_times(
  File "/home/thomaswilliams/dev/rashdf/src/rashdf/plan.py", line 323, in _mesh_summary_output_min_max_times
    max_ws_times = ras_timesteps_to_datetimes(
  File "/home/thomaswilliams/dev/rashdf/src/rashdf/utils.py", line 307, in ras_timesteps_to_datetimes
    return [
  File "/home/thomaswilliams/dev/rashdf/src/rashdf/utils.py", line 308, in <listcomp>
    start_time + pd.Timedelta(timestep, unit=time_unit).round(round_to)
  File "timedeltas.pyx", line 1949, in pandas._libs.tslibs.timedeltas.Timedelta.round
  File "timedeltas.pyx", line 1912, in pandas._libs.tslibs.timedeltas.Timedelta._round
  File "offsets.pyx", line 4460, in pandas._libs.tslibs.offsets.to_offset
  File "offsets.pyx", line 4557, in pandas._libs.tslibs.offsets.to_offset
ValueError: Invalid frequency: 0.1 s
```
Sure enough, pandas 2.1.x doesn't accept frequency strings with a non-integer multiple, not even `"1.0s"`:
```
>>> pd.tseries.frequencies.to_offset("1s")
<Second>
>>> pd.tseries.frequencies.to_offset("10s")
<10 * Seconds>
>>> pd.tseries.frequencies.to_offset("1.0s")
Traceback (most recent call last):
  File "offsets.pyx", line 4548, in pandas._libs.tslibs.offsets.to_offset
ValueError: invalid literal for int() with base 10: '1.0'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "offsets.pyx", line 4460, in pandas._libs.tslibs.offsets.to_offset
  File "offsets.pyx", line 4557, in pandas._libs.tslibs.offsets.to_offset
ValueError: Invalid frequency: 1.0s
>>> pd.tseries.frequencies.to_offset("0.001s")
Traceback (most recent call last):
  File "offsets.pyx", line 4548, in pandas._libs.tslibs.offsets.to_offset
ValueError: invalid literal for int() with base 10: '0.001'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "offsets.pyx", line 4460, in pandas._libs.tslibs.offsets.to_offset
  File "offsets.pyx", line 4557, in pandas._libs.tslibs.offsets.to_offset
ValueError: Invalid frequency: 0.001s
```
But things work fine with `pandas==2.2.2` and `numpy==2.0.0`:

```
>>> pd.tseries.frequencies.to_offset("1.0s")
<Second>
>>> pd.tseries.frequencies.to_offset("0.1s")
<100 * Millis>
```
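Since pandas 2.2.2 reads `"0.1s"` as `<100 * Millis>`, one possible workaround for older pandas is to do that conversion ourselves and hand pandas an integer-multiple string such as `"100ms"`, the same form as the `"10s"` that 2.1.x already accepts. Below is a minimal sketch, assuming `round_to` is always a decimal number followed by one of `s`, `ms`, `us` or `ns` (like the `"0.1 s"` in the traceback). `normalize_round_to` is a hypothetical helper, not part of `rashdf`.

```python
import re
from decimal import Decimal

# Nanoseconds per unit, largest first. Only second-based units are handled;
# that restriction is an assumption of this sketch.
_UNIT_NS = {"s": 1_000_000_000, "ms": 1_000_000, "us": 1_000, "ns": 1}

# A decimal number, optional whitespace, then a unit, e.g. "0.1 s" or "100ms".
_FREQ_RE = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*(s|ms|us|ns)\s*$")


def normalize_round_to(round_to: str) -> str:
    """Hypothetical helper: rewrite e.g. "0.1 s" as "100ms" so that pandas
    versions which only accept integer frequency multiples can parse it."""
    match = _FREQ_RE.match(round_to)
    if match is None:
        raise ValueError(f"Unsupported frequency string: {round_to!r}")
    value, unit = Decimal(match.group(1)), match.group(2)
    # Decimal avoids float error: Decimal("0.1") * 10**9 is exactly 100000000.
    total_ns = value * _UNIT_NS[unit]
    if total_ns <= 0 or total_ns != total_ns.to_integral_value():
        raise ValueError(f"{round_to!r} is not a positive whole number of nanoseconds")
    total_ns = int(total_ns)
    # Use the largest unit that divides the duration evenly.
    for name, ns in _UNIT_NS.items():
        if total_ns % ns == 0:
            return f"{total_ns // ns}{name}"
    raise AssertionError("unreachable: 'ns' always divides evenly")
```

For example, `normalize_round_to("0.1 s")` returns `"100ms"`, and `normalize_round_to("1.0s")` returns `"1s"`. Inside `ras_timesteps_to_datetimes` the call could then become `pd.Timedelta(timestep, unit=time_unit).round(normalize_round_to(round_to))`, though that is untested against the LowerKanawha file.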
As of early June 2024, `rashdf` relies on three major dependencies:

- `h5py`
- `geopandas`
- `pyarrow`

(`pandas`, where this bug comes from, isn't listed separately because it is installed as a dependency of `geopandas`.) We should figure out the minimum version of each dependency and set it in `pyproject.toml`. GeoPandas is probably the most sensitive one. Pinning dependency versions for tests and docs would be good, too.
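Until the floors are settled, a test or CI step could fail fast when the environment is older than whatever minimums end up in `pyproject.toml`. Below is a minimal stdlib-only sketch. `find_outdated` and `meets_minimum` are hypothetical helpers; the comparison only looks at numeric release segments (so a pre-release like `2.2.0rc0` is treated as `2.2.0`), and `packaging.version.Version` would be the more robust choice. The `pandas` floor of `2.2.2` is just the version reported working above, not a verified minimum.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, List, Tuple


def release_tuple(ver: str) -> Tuple[int, ...]:
    """Leading numeric release segments, e.g. "2.2.2" -> (2, 2, 2).

    Pre-release and local suffixes are ignored (a simplification of PEP 440).
    """
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"Unparseable version: {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if installed >= minimum; the shorter version is padded with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def find_outdated(
    minimums: Dict[str, str], lookup: Callable[[str], str] = version
) -> List[str]:
    """Describe every distribution that is missing or older than its floor."""
    problems = []
    for dist, floor in minimums.items():
        try:
            installed = lookup(dist)
        except PackageNotFoundError:
            problems.append(f"{dist}: not installed (need >={floor})")
            continue
        if not meets_minimum(installed, floor):
            problems.append(f"{dist}: {installed} < {floor}")
    return problems


if __name__ == "__main__":
    # Hypothetical floor: 2.2.2 is only the pandas version reported working above.
    for problem in find_outdated({"pandas": "2.2.2"}):
        print(problem)
```

With the environment from the traceback, `find_outdated({"pandas": "2.2.2"})` would report `pandas: 2.1.4 < 2.2.2`. The `lookup` parameter exists so the check can be exercised without installing specific versions.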