TDAmeritrade / stumpy

STUMPY is a powerful and scalable Python library for modern time series analysis
https://stumpy.readthedocs.io/en/latest/
Other
3.68k stars 322 forks source link

Fixed #1014 Fix max_matches=None bug #1015

Closed ejorgensen-wl closed 4 months ago

ejorgensen-wl commented 4 months ago

This fix addresses issue #1014 by not slicing the query_matches if max_matches is set to None for motifs (which in turn calls _motifs with max_matches=np.inf.

I made an initial effort to add testing, but I'm sure there are better ways to do so.

Pull Request Checklist

codecov[bot] commented 4 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 97.42%. Comparing base (3077d0d) to head (9e0b2ce).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #1015 +/- ## ======================================= Coverage 97.41% 97.42% ======================================= Files 89 89 Lines 14922 14934 +12 ======================================= + Hits 14537 14549 +12 Misses 385 385 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

seanlaw commented 4 months ago

@ejorgensen-wl Thanks again for identifying this bug. The decision to do:

https://github.com/TDAmeritrade/stumpy/blob/3077d0ddfb315464321dc86f8ec3bf2cab9ce3b1/stumpy/motifs.py#L336-L337

was decided a long time ago and, in hindsight, that was a poor choice. Instead, I think we could've/should've been more specific and done:

if max_matches is None:  # pragma: no cover
    max_matches = P.shape[-1]

P.shape[-1] is an integer that would be the same as the maximum number of subsequences in T (i.e., the standard T - m + 1 number of subsequences) that are possible and is a more explicit number than np.inf, which isn't even an integer (it's a floating point value!).

What do you think about this remedy instead? Would this work?

ejorgensen-wl commented 4 months ago

@ejorgensen-wl Thanks again for identifying this bug. The decision to do:

https://github.com/TDAmeritrade/stumpy/blob/3077d0ddfb315464321dc86f8ec3bf2cab9ce3b1/stumpy/motifs.py#L336-L337

was decided a long time ago and, in hindsight, that was a poor choice. Instead, I think we could've/should've been more specific and done:

if max_matches is None:  # pragma: no cover
    max_matches = P.shape[-1]

P.shape[-1] is an integer that would be the same as the maximum number of subsequences in T (i.e., the standard T - m + 1 number of subsequences) that are possible and is a more explicit number than np.inf, which isn't even an integer (it's a floating point value!).

What do you think about this remedy instead? Would this work?

That's an excellent and much cleaner solution than what I was thinking, thanks! I implemented your suggestions in my latest commit.

seanlaw commented 4 months ago

That's an excellent and much cleaner solution than what I was thinking, thanks! I implemented your suggestions in my latest commit.

@ejorgensen-wl Thank you for your patience and for working through this with us! We really appreciate the PR. Once the tests all pass, I'll merge the PR (unless you had anything further to add?).

ejorgensen-wl commented 4 months ago

I like it as is - thanks!

seanlaw commented 4 months ago

@ejorgensen-wl Thank you again for the excellent PR! I hope you will consider contributing more!