NannyML / nannyml

nannyml: post-deployment data science in python
https://www.nannyml.com/
Apache License 2.0

Optimize filtering univariate result for period #391

Closed michael-nml closed 5 months ago

michael-nml commented 5 months ago

The `PerMetricPerColumnResult` filter function has high overhead when selecting a subset of columns or metrics. That overhead is also incurred, and is at its highest, when filtering only by period, because all columns and metrics are then selected.

This commit adds a short-circuit path that avoids the overhead when only the period needs filtering. For a result with 50 columns and 8 metrics, this yields a >100x speed-up when filtering only by period.
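To illustrate the idea, here is a minimal sketch of a period-only short-circuit. It is not NannyML's implementation: `ToyResult`, its row layout, and the example column and metric names are all hypothetical, chosen only to show why skipping the column/metric selection is cheaper when no subset is requested.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ToyResult:
    """Hypothetical stand-in for a per-metric, per-column result (not NannyML's class)."""

    rows: List[dict]  # each row: {"period": str, "values": {(column, metric): float}}
    column_names: List[str]
    metrics: List[str]

    def filter(
        self,
        period: str = "all",
        column_names: Optional[Sequence[str]] = None,
        metrics: Optional[Sequence[str]] = None,
    ) -> "ToyResult":
        # The period filter is a cheap row selection.
        if period == "all":
            rows = self.rows
        else:
            rows = [r for r in self.rows if r["period"] == period]

        # Short-circuit: no column or metric subset was requested, so every
        # (column, metric) pair is kept and nothing needs to be rebuilt.
        if column_names is None and metrics is None:
            return ToyResult(rows, self.column_names, self.metrics)

        # Slow path: rebuild each row, keeping only the requested pairs.
        cols = self.column_names if column_names is None else list(column_names)
        mets = self.metrics if metrics is None else list(metrics)
        keep = {(c, m) for c in cols for m in mets}
        rows = [
            {"period": r["period"],
             "values": {k: v for k, v in r["values"].items() if k in keep}}
            for r in rows
        ]
        return ToyResult(rows, cols, mets)


# Illustrative data: 2 columns x 2 metrics, two rows per period.
columns = ["age", "income"]
metric_names = ["jensen_shannon", "kolmogorov_smirnov"]
rows = [
    {"period": p, "values": {(c, m): 0.0 for c in columns for m in metric_names}}
    for p in ["reference", "reference", "analysis", "analysis"]
]
result = ToyResult(rows, columns, metric_names)

fast = result.filter(period="analysis")  # takes the short-circuit path
slow = result.filter(period="analysis", metrics=["jensen_shannon"])  # rebuilds rows
```

In this sketch the fast path reuses the existing row objects, while the slow path does work proportional to the number of columns times the number of metrics for every row, which is the kind of cost the short-circuit is meant to avoid.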

codecov[bot] commented 5 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 76.52%. Comparing base (da33807) to head (246ca35).

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main     #391   +/-   ##
=======================================
  Coverage   76.52%   76.52%
=======================================
  Files         110      110
  Lines        9242     9244    +2
  Branches     1658     1659    +1
=======================================
+ Hits         7072     7074    +2
  Misses       1703     1703
  Partials      467      467
```
