Closed mhidas closed 2 years ago
The full stack traces are:
TypeError: Operation sub between float64 and Timedelta is invalid
Traceback (most recent call last):
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/aodncore/pipeline/handlerbase.py", line 1052, in run
    self.trigger(transition['trigger'])
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/transitions/core.py", line 65, in _get_trigger
    return machine.events[trigger_name].trigger(model, *args, **kwargs)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/transitions/core.py", line 405, in trigger
    return self.machine._process(func)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/transitions/core.py", line 1073, in _process
    return trigger()
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/transitions/core.py", line 423, in _trigger
    return self._process(event_data)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/transitions/core.py", line 433, in _process
    if trans.execute(event_data):
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/transitions/core.py", line 279, in execute
    machine.callback(func, event_data)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/transitions/core.py", line 1031, in callback
    func(*event_data.args, **event_data.kwargs)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/aodndata/moorings/products_handler.py", line 390, in preprocess
    self._make_hourly_timeseries()
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/aodndata/moorings/products_handler.py", line 300, in _make_hourly_timeseries
    **self.product_common_kwargs)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/aodntools/timeseries_products/hourly_timeseries.py", line 507, in hourly_aggregator
    df_temp = PDresample_by_hour(df_temp, function_dict, function_stats) # do the magic
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/aodntools/timeseries_products/hourly_timeseries.py", line 399, in PDresample_by_hour
    df.index = df.index - pd.Timedelta(30, units='m')
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 121, in index_arithmetic_method
    return self._evaluate_with_timedelta_like(other, op)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 4980, in _evaluate_with_timedelta_like
    other=type(other).__name__))
and
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'
Traceback (most recent call last):
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/aodncore/pipeline/handlerbase.py", line 1052, in run
    self.trigger(transition['trigger'])
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/transitions/core.py", line 65, in _get_trigger
    return machine.events[trigger_name].trigger(model, *args, **kwargs)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/transitions/core.py", line 405, in trigger
    return self.machine._process(func)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/transitions/core.py", line 1073, in _process
    return trigger()
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/transitions/core.py", line 423, in _trigger
    return self._process(event_data)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/transitions/core.py", line 433, in _process
    if trans.execute(event_data):
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/transitions/core.py", line 279, in execute
    machine.callback(func, event_data)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/transitions/core.py", line 1031, in callback
    func(*event_data.args, **event_data.kwargs)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/aodndata/moorings/products_handler.py", line 390, in preprocess
    self._make_hourly_timeseries()
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/aodndata/moorings/products_handler.py", line 300, in _make_hourly_timeseries
    **self.product_common_kwargs)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/aodntools/timeseries_products/hourly_timeseries.py", line 507, in hourly_aggregator
    df_temp = PDresample_by_hour(df_temp, function_dict, function_stats) # do the magic
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/aodntools/timeseries_products/hourly_timeseries.py", line 405, in PDresample_by_hour
    ds_var_mean = ds_var.resample('1H').apply(function_dict[variable]).astype(np.float32)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/pandas/core/generic.py", line 8155, in resample
    base=base, key=on, level=level)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/pandas/core/resample.py", line 1250, in resample
    return tg._get_resampler(obj, kind=kind)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/pandas/core/resample.py", line 1380, in _get_resampler
    "but got an instance of %r" % type(ax).__name__)
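For reference, pandas raises this second TypeError whenever resample() is called on an object whose index is not datetime-like. A minimal sketch that reproduces it and shows one possible guard is below. The helper name resample_hourly_or_none is hypothetical and not part of aodntools, and how the index lost its datetime type for these sites is not established here.

```python
import numpy as np
import pandas as pd


def resample_hourly_or_none(series):
    """Return the hourly mean of `series`, or None if it cannot be resampled.

    Hypothetical helper for illustration only; not part of aodntools.
    """
    # resample() only accepts datetime-like indexes, so an empty series or
    # a plain (non-datetime) index is reported to the caller, not raised.
    if series.empty or not isinstance(series.index, pd.DatetimeIndex):
        return None
    return series.resample("1h").mean().astype(np.float32)


# A plain float index triggers the same TypeError as in the traceback.
bad = pd.Series([1.0, 2.0], index=pd.Index([0.0, 1.0]))
try:
    bad.resample("1h").mean()
    raised = False
except TypeError:
    raised = True

# A proper DatetimeIndex resamples into hourly bins as expected.
good = pd.Series(
    [1.0, 3.0, 5.0],
    index=pd.to_datetime(
        ["2020-01-01 00:10", "2020-01-01 00:50", "2020-01-01 01:20"]
    ),
)
hourly = resample_hourly_or_none(good)
```

Returning None leaves it to the caller to decide whether to skip the variable or reject the whole file.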
Thought this might be a good place to mention that I ran into an error when creating hourly timeseries products locally. Using the latest code on GitHub, I had to change output_dir=args.output_path on line 579 of 'hourly_timeseries.py' to output_dir=args.output_dir for the code to work. Easy fix but worth mentioning.
I also get warnings for function stringtochar() on lines 298 and 299 of 'aggregated_timeseries.py'. The warning suggests using function tobytes() instead.
Thanks @mphemming - your feedback is welcome. However, these are unrelated to this thread, so I've moved them to separate issues: #135 and #136
The original errors reported above occur under Python 3.5. Under Python 3.8, running the code on the same data gives different errors from different parts of the code. E.g. for site SAM7DS:
test_aodntools/timeseries_products/test_hourly_timeseries.py:125 (TestHourlyTimeseriesDebugging.test_typeerror)
self = <xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x7f1ab30a1700>
key = (array([], dtype=int64), slice(None, None, None), slice(None, None, None))
    def _getitem(self, key):
        if self.datastore.is_remote: # pragma: no cover
            getitem = functools.partial(robust_getitem, catch=RuntimeError)
        else:
            getitem = operator.getitem
        try:
            with self.datastore.lock:
                original_array = self.get_array(needs_lock=False)
>               array = getitem(original_array, key)
../../python-aodntools-py38/lib/python3.8/site-packages/xarray/backends/netCDF4_.py:106:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ???
src/netCDF4/_netCDF4.pyx:4383:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
count = array([], shape=(0, 1, 1, 3), dtype=int64)
    def _out_array_shape(count):
        """Return the output array shape given the count array created by getStartCountStride"""
        s = list(count.shape[:-1])
        out = []
        for i, n in enumerate(s):
            if n == 1:
>               c = count[..., i].ravel()[0] # All elements should be identical.
E               IndexError: index 0 is out of bounds for axis 0 with size 0
../../python-aodntools-py38/lib/python3.8/site-packages/netCDF4/utils.py:458: IndexError
But also...
During handling of the above exception, another exception occurred:
self = <test_aodntools.timeseries_products.test_hourly_timeseries.TestHourlyTimeseriesDebugging testMethod=test_typeerror>
    def test_typeerror(self):
>       output_file, bad_files = hourly_aggregator(files_to_aggregate=SAM7_LIST,
                                                   site_code='SAM7DS',
                                                   qcflags=(1, 2),
                                                   input_dir=TEST_ROOT,
                                                   output_dir='/tmp'
                                                   )
test_hourly_timeseries.py:127:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../aodntools/timeseries_products/hourly_timeseries.py:413: in hourly_aggregator
nc_clean = in_water(nc) # in water only
../../aodntools/timeseries_products/hourly_timeseries.py:79: in in_water
return nc.where((TIME >= time_deployment_start) & (TIME <= time_deployment_end), drop=True)
...
IndexError: The indexing operation you are attempting to perform is not valid on netCDF4.Variable object. Try loading your data into memory first by calling .load().
../../python-aodntools-py38/lib/python3.8/site-packages/xarray/backends/netCDF4_.py:116: IndexError
The first error ("Operation sub between float64 and Timedelta is invalid" in Py3.5) only occurs for files where all the data are flagged as bad, which results in trying to process an empty array.
The error in Py3.8 happens for a similar reason: all the data are out-of-water, i.e. outside the range set by time_deployment_start and time_deployment_end.
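Both cases come down to an empty selection reaching code that expects data. One way to handle that is to test each file for in-water samples up front and set aside the ones that have none. The sketch below uses plain NumPy rather than xarray. The function names, the input mapping and the rejection message are all assumptions made for illustration, not the actual aodntools interface.

```python
import numpy as np


def in_water_mask(time, deployment_start, deployment_end):
    """Boolean mask of samples inside the deployment window (inclusive)."""
    time = np.asarray(time, dtype="datetime64[ns]")
    start = np.datetime64(deployment_start)
    end = np.datetime64(deployment_end)
    return (time >= start) & (time <= end)


def split_usable_files(files):
    """Separate files with in-water data from files without any.

    `files` maps a (hypothetical) file name to a tuple of
    (time values, deployment start, deployment end).
    Returns (usable, rejected); `rejected` maps file name -> reason.
    """
    usable, rejected = [], {}
    for name, (time, start, end) in files.items():
        if in_water_mask(time, start, end).any():
            usable.append(name)
        else:
            # Nothing left after the in-water selection: skip the file
            # instead of passing an empty array down the pipeline.
            rejected[name] = "no in-water data"
    return usable, rejected


files = {
    "deployed.nc": (
        ["2020-01-01T00:00", "2020-01-02T00:00"],
        "2020-01-01T12:00",
        "2020-01-03T00:00",
    ),
    "never_in_water.nc": (
        ["2019-12-01T00:00"],
        "2020-01-01T12:00",
        "2020-01-03T00:00",
    ),
}
usable, rejected = split_usable_files(files)
```

The same kind of check could be applied after the QC flag filter, so that files where all data are flagged bad are rejected in the same way.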
A couple of similar errors occur while trying to create the hourly products in the pipeline for some sites.
[x] TypeError: Operation sub between float64 and Timedelta is invalid
occurs for
ITFTIS (306 input files, task id e103219b-f5b1-4f4a-a7eb-0e4c4217a491)
PH100
SAM7DS
[x] TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'
for
SYD100
SYD140
[x] While fixing this, should also apply the work-around for the pandas.Timedelta units issue, as done in the velocity hourly code (https://github.com/aodn/python-aodntools/pull/99#discussion_r391440014)
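On the Timedelta point: the traceback shows pd.Timedelta(30, units='m'), whereas the documented keyword is unit (singular). Presumably that mismatch is the units issue referred to. The exact work-around used in the linked discussion is not reproduced here, so the sketch below only shows two documented ways to build a 30-minute offset and apply it to a DatetimeIndex.

```python
import pandas as pd

# Two documented ways to build a 30-minute Timedelta. The keyword is
# `unit` (singular); `units=` is not a documented parameter.
half_hour = pd.Timedelta(minutes=30)
same_half_hour = pd.Timedelta(30, unit="min")

# Shifting a DatetimeIndex back by the offset, as in PDresample_by_hour.
index = pd.to_datetime(["2020-01-01 00:30", "2020-01-01 01:15"])
shifted = index - half_hour
```

Using the descriptive keyword (minutes=30) avoids depending on a unit string at all.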