biolab / orange3-timeseries

🍊 :chart_with_upwards_trend: Orange add-on for analyzing, visualizing, manipulating, and forecasting time series data.
Other
62 stars 41 forks source link

Periodogram: Show meaningful tick labels #233

Closed janezd closed 2 years ago

janezd commented 2 years ago
Issue

Fixes #214.

Description of changes

Decide an appropriate unit and scale.

Includes
ajdapretnar commented 2 years ago

Shouldn't this return days for Yahoo Finance data? I am still getting 1e+07, but I might have misunderstood the issue.

janezd commented 2 years ago

I'm sorry, a part of commit was missing. Please try now.

ajdapretnar commented 2 years ago

This works beautifully, expect in one particular case. Open Datasets, load Traffic accidents - events, connect to Periodogram and select GeoKoordinata Y (or X). Both contain missing values. The widget crashes with:

---------------------------- ValueError Exception -----------------------------
Traceback (most recent call last):
  File "/Users/ajda/orange/orange3-timeseries/orangecontrib/timeseries/widgets/owperiodbase.py", line 135, in _selection_changed
    self.replot()
  File "/Users/ajda/orange/orange3-timeseries/orangecontrib/timeseries/widgets/owperiodogram.py", line 46, in replot
    max_time = max((np.max(periods) for periods, _ in datae), default=1)
  File "/Users/ajda/orange/orange3-timeseries/orangecontrib/timeseries/widgets/owperiodogram.py", line 46, in <genexpr>
    max_time = max((np.max(periods) for periods, _ in datae), default=1)
  File "<__array_function__ internals>", line 180, in amax
  File "/Users/ajda/.pyenv-x86/versions/py3.9/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 2791, in amax
    return _wrapreduction(a, np.maximum, 'max', axis, None, out,
  File "/Users/ajda/.pyenv-x86/versions/py3.9/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity
-------------------------------------------------------------------------------
codecov-commenter commented 2 years ago

Codecov Report

Merging #233 (fd31704) into master (949414d) will decrease coverage by 0.31%. The diff coverage is 36.84%.

@@            Coverage Diff             @@
##           master     #233      +/-   ##
==========================================
- Coverage   70.63%   70.32%   -0.32%     
==========================================
  Files          30       30              
  Lines        4383     4394      +11     
  Branches      664      669       +5     
==========================================
- Hits         3096     3090       -6     
- Misses       1167     1182      +15     
- Partials      120      122       +2     
Impacted Files Coverage Δ
orangecontrib/timeseries/widgets/owperiodogram.py 46.15% <25.00%> (-25.28%) :arrow_down:
orangecontrib/timeseries/widgets/owperiodbase.py 92.85% <33.33%> (-3.89%) :arrow_down:
orangecontrib/timeseries/functions.py 75.77% <60.00%> (-0.62%) :arrow_down:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

janezd commented 2 years ago

Bugs related to nans are caused by interpolation; changes from these PR may have only exposed them.

I fixed this twice. I changed the main commit, https://github.com/biolab/orange3-timeseries/pull/233/commits/ca2b1ef485dd6d20b7fa001d4c8e1ba06d123dfb to work with empty arrays (which it got above). I have also fixed interpolation: it assumed a sorted array, and also didn't use the scikit's existing extrapolation. Please let somebody check the second commit, https://github.com/biolab/orange3-timeseries/pull/233/commits/fd317049be513ebefe129589d6e16bc9f8849cc9.

This will not cover the case when time is missing. I would still consider this PR done (unless there's something else). If I remember correctly, @PrimozGodec said that he'd fix the problem with unknown time stamps, #165.

ajdapretnar commented 2 years ago

With the new version I get I crash on Traffic - events data. :( Not sure if that is a part that will be fixed elsewhere or not.

---------------------------- IndexError Exception -----------------------------
Traceback (most recent call last):
  File "/Users/ajda/orange/orange3-timeseries/orangecontrib/timeseries/widgets/owperiodbase.py", line 135, in _selection_changed
    self.replot()
  File "/Users/ajda/orange/orange3-timeseries/orangecontrib/timeseries/widgets/owperiodogram.py", line 45, in replot
    pers_grams = [self.periodogram(attr) for attr in self.selection]
  File "/Users/ajda/orange/orange3-timeseries/orangecontrib/timeseries/widgets/owperiodogram.py", line 45, in <listcomp>
    pers_grams = [self.periodogram(attr) for attr in self.selection]
  File "/Users/ajda/orange/orange3-timeseries/orangecontrib/timeseries/widgets/owperiodogram.py", line 26, in periodogram
    periods * self.data.time_delta.deltas[0][0],
IndexError: invalid index to scalar variable.
-------------------------------------------------------------------------------
ajdapretnar commented 2 years ago

I am sorry, but I still can't let this pass. 😢 It fails on iris (it doesn't on master).

-------------------------- AttributeError Exception ---------------------------
Traceback (most recent call last):
  File "/Users/ajda/orange/orange3-timeseries/orangecontrib/timeseries/widgets/owperiodbase.py", line 135, in _selection_changed
    self.replot()
  File "/Users/ajda/orange/orange3-timeseries/orangecontrib/timeseries/widgets/owperiodogram.py", line 45, in replot
    pers_grams = [self.periodogram(attr) for attr in self.selection]
  File "/Users/ajda/orange/orange3-timeseries/orangecontrib/timeseries/widgets/owperiodogram.py", line 45, in <listcomp>
    pers_grams = [self.periodogram(attr) for attr in self.selection]
  File "/Users/ajda/orange/orange3-timeseries/orangecontrib/timeseries/widgets/owperiodogram.py", line 26, in periodogram
    periods * self.data.time_delta.time_interval,
AttributeError: 'NoneType' object has no attribute 'time_interval'
-------------------------------------------------------------------------------
ajdapretnar commented 2 years ago

On that note, we should probably establish some general pipeline for Timeseries tests: i.e. works with Yahoo Finance, works with missing values (Traffic events), works with iris (generally, it should?), None, works with missing time (in the future)...

janezd commented 2 years ago

I fixed the problem with Iris. Kind of. Timeseries is a mess; it is a Table with some collection of random methods (other are implemented as proper functions) and some attributes which may exist or may be None. Like time_variable.

Timeseries should not be a class in the first place. time_variable should probably be stored in table's attributes, and all methods should be static functions.

You're right about testing procedure, but it's worse: most of these widgets do not have proper unit tests at all.

The basics of this add-on will have to be reimplemented, someday. But for this PR, this may be it.