ermshaua / claspy

ClaSPy: A Python package for time series segmentation.
BSD 3-Clause "New" or "Revised" License

Issues while calling model.fit_predict #11

Open lucamcwood opened 2 months ago

lucamcwood commented 2 months ago

Hello, I am running a grid-search optimization, and ClaSPy is part of this search. For some combinations of hyperparameters, I get one of two errors:

  1. OverflowError: Python int too large to convert to C long
  2. ValueError: negative dimensions are not allowed

I am trying to collect the parameter combinations that lead to those errors. I will keep you informed.
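One way to collect the offending combinations is to wrap each grid-search trial in a try/except and record the parameters together with the exception type and message. This is a minimal sketch: `collect_failures` and `fake_trial` are hypothetical names, `fake_trial` only stands in for the code that builds the model and calls `fit_predict`, and the grid values are illustrative.

```python
import itertools
from typing import Any, Callable


def collect_failures(
    grid: dict[str, list[Any]],
    run_trial: Callable[[dict[str, Any]], Any],
) -> list[tuple[dict[str, Any], str, str]]:
    """Run every parameter combination in `grid`; return those that raised."""
    failures = []
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        try:
            run_trial(params)
        except (OverflowError, ValueError) as exc:
            # Record the parameters together with the exception type and message.
            failures.append((params, type(exc).__name__, str(exc)))
    return failures


# Hypothetical stand-in for "build the model and call fit_predict":
# it fails whenever the profile length n_timepoints - window_size + 1 is negative.
def fake_trial(params: dict[str, Any]) -> None:
    if params["n_timepoints"] - params["window_size"] + 1 < 0:
        raise ValueError("negative dimensions are not allowed")


grid = {"window_size": [5, 50, 500], "n_timepoints": [100, 1000]}
failures = collect_failures(grid, fake_trial)
for failure in failures:
    print(failure)
```

In a real run, `run_trial` would wrap the actual model construction and `fit_predict(batch)` call, so the failing parameter sets can be reported back in the issue.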

ERROR:root:Exception occurred: Python int too large to convert to C long
Traceback (most recent call last):
  File ".../cp_algorithms.py", line 468, in predict
    cps = self.clasp_model.fit_predict(batch)
  File ".../lib64/python3.11/site-packages/claspy/segmentation.py", line 315, in fit_predict
    return self.fit(time_series).predict(sparse)
  File ".../lib64/python3.11/site-packages/claspy/segmentation.py", line 201, in fit
    self.window_size = max(1, map_window_size_methods(self.window_size)(time_series) // 2)
  File ".../lib64/python3.11/site-packages/claspy/window_size.py", line 107, in suss
    score = 1 - (_suss_score(time_series, window_size, stats) - min_score) / (max_score - min_score)
  File ".../lib64/python3.11/site-packages/claspy/window_size.py", line 35, in _suss_score
    roll_mean = roll.mean().to_numpy()[window_size:]
  File ".../lib64/python3.11/site-packages/pandas/core/window/rolling.py", line 2223, in mean
    return super().mean(
  File ".../lib64/python3.11/site-packages/pandas/core/window/rolling.py", line 1551, in mean
    return self._apply(
  File ".../Dude_py311/lib64/python3.11/site-packages/pandas/core/window/rolling.py", line 663, in _apply
    return self._apply_blockwise(homogeneous_func, name, numeric_only)
  File ".../lib64/python3.11/site-packages/pandas/core/window/rolling.py", line 503, in _apply_blockwise
    return self._apply_series(homogeneous_func, name)
  File ".../lib64/python3.11/site-packages/pandas/core/window/rolling.py", line 487, in _apply_series
    result = homogeneous_func(values)
  File ".../lib64/python3.11/site-packages/pandas/core/window/rolling.py", line 658, in homogeneous_func
    result = calc(values)
  File ".../Dude_py311/lib64/python3.11/site-packages/pandas/core/window/rolling.py", line 655, in calc
    return func(x, start, end, min_periods, *numba_args)
  File "pandas/_libs/window/aggregations.pyx", line 255, in pandas._libs.window.aggregations.rollmean
OverflowError: Python int too large to convert to C long
ERROR:root:Exception occurred: Python int too large to convert to C long

ERROR:root:Exception occurred: negative dimensions are not allowed
Traceback (most recent call last):
  File ".../cp_algorithms.py", line 468, in predict
    cps = self.clasp_model.fit_predict(batch)
  File ".../lib64/python3.11/site-packages/claspy/segmentation.py", line 315, in fit_predict
    return self.fit(time_series).predict(sparse)
  File ".../lib64/python3.11/site-packages/claspy/segmentation.py", line 242, in fit
    profile = np.full(shape=self.n_timepoints - self.window_size + 1, fill_value=-np.inf, dtype=np.float64)
  File ".../lib64/python3.11/site-packages/numpy/core/numeric.py", line 344, in full
    a = empty(shape, dtype, order)
ValueError: negative dimensions are not allowed
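The second traceback comes down to simple arithmetic: the failing line in `fit` allocates a profile of length `n_timepoints - window_size + 1`, and NumPy refuses a negative shape. The sketch below reproduces that failure with plain NumPy. The helper `profile_length` and the sample sizes are illustrative assumptions, not ClaSPy code.

```python
import numpy as np


def profile_length(n_timepoints: int, window_size: int) -> int:
    # Number of sliding windows of length `window_size` in a series of length `n_timepoints`.
    return n_timepoints - window_size + 1


# A window that fits: 100 - 10 + 1 = 91 profile entries, all initialised to -inf.
ok = np.full(shape=profile_length(100, 10), fill_value=-np.inf, dtype=np.float64)
print(ok.shape)  # (91,)

# A window longer than the series: 100 - 150 + 1 = -49, so NumPy cannot allocate it.
error_message = None
try:
    np.full(shape=profile_length(100, 150), fill_value=-np.inf, dtype=np.float64)
except ValueError as exc:
    error_message = str(exc)
print(error_message)  # negative dimensions are not allowed
```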

ermshaua commented 2 months ago

It looks like the window size is causing the problems here. Make sure that the window size is always much smaller than the length of the time series; otherwise, the algorithm runs into problems.

As an upper bound for the window size, you could use roughly 0.1 × the time series length. The window size should also be at least 3 to 5 values, so use such a constant as your lower bound.
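These bounds can be enforced before each grid-search trial by clamping every candidate window size into the range [lower bound, 0.1 × series length]. This is a minimal sketch with a hypothetical helper `bounded_window_size`. It uses a lower bound of 5 and an upper fraction of 0.1 from the suggestion above, and both are tunable. It assumes your ClaSPy version accepts a fixed integer `window_size`. When the series is too short for any window in that range, the helper returns None so the trial can be skipped instead of failing inside `fit`.

```python
from typing import Optional


def bounded_window_size(
    candidate: int,
    n_timepoints: int,
    lower: int = 5,
    upper_fraction: float = 0.1,
) -> Optional[int]:
    """Clamp `candidate` into [lower, upper_fraction * n_timepoints].

    Returns None when the series is too short for any window in that range.
    """
    upper = int(upper_fraction * n_timepoints)
    if upper < lower:
        return None
    return max(lower, min(candidate, upper))


print(bounded_window_size(50, 1000))   # 50: already inside [5, 100]
print(bounded_window_size(500, 1000))  # 100: clamped to the upper bound
print(bounded_window_size(2, 1000))    # 5: raised to the lower bound
print(bounded_window_size(10, 30))     # None: 0.1 * 30 = 3 is below the lower bound
```

Clamping keeps every grid point usable. Skipping short series avoids both errors reported above rather than catching them after the fact.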

Let me know if this helps!