Feature Extraction Tsfresh Rewrite Quality Assurance

abstractqqq commented 11 months ago

First, thank you everybody for contributing to the rewrite.

We are planning to make this project more public, so we need to make sure the quality is good. For this round of review, we want to focus on the following items (ranked by importance):

- [x] [99%] Correctness. The implementation should match the feature definition, and the final numerical result should make sense. There are a few that I haven't reviewed, or that I'm not sure anybody has reviewed: cwt_coefficients, autoregressive_coefficients, augmented_dickey_fuller, and the newly added fft_coefficients. Please comment if you think there are others we need to review.
- [x] [99%] Tests. Building out more tests for all the features. Thank you @MathieuCayssol for building out more tests.
- [x] Delayed FFT features. Now that fft_coefficients is implemented, can we start using it?
- Ongoing: Performance. For some methods, there might be shortcuts available in eager mode but not in lazy mode, or vice versa. For eager methods that use NumPy under the hood, can we do better? Is there redundant computation? Are there more efficient methods?
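
To make the correctness and test items concrete, here is a minimal sketch of checking a feature against a plain reference implementation. It is not the functime test suite: the `*_ref` functions and `agrees_with_reference` are hypothetical names, the reference bodies follow the usual tsfresh definitions, and returning NaN for series with fewer than two points is an assumption.

```python
import math
from typing import Callable, Iterable, Sequence


def absolute_energy_ref(x: Sequence[float]) -> float:
    """Reference: sum of squared values."""
    return sum(v * v for v in x)


def absolute_sum_of_changes_ref(x: Sequence[float]) -> float:
    """Reference: sum of absolute differences between consecutive values."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))


def mean_abs_change_ref(x: Sequence[float]) -> float:
    """Reference: mean absolute consecutive difference.

    Returns NaN for fewer than two points (an assumption made for this sketch).
    """
    if len(x) < 2:
        return math.nan
    return absolute_sum_of_changes_ref(x) / (len(x) - 1)


def agrees_with_reference(
    candidate: Callable[[Sequence[float]], float],
    reference: Callable[[Sequence[float]], float],
    cases: Iterable[Sequence[float]],
    rel_tol: float = 1e-9,
) -> bool:
    """Return True if candidate matches reference on every input series."""
    for x in cases:
        got, want = candidate(x), reference(x)
        if math.isnan(want):
            # NaN never compares equal, so handle it explicitly.
            if not math.isnan(got):
                return False
        elif not math.isclose(got, want, rel_tol=rel_tol, abs_tol=1e-12):
            return False
    return True
```

In practice the `candidate` would be the eager or lazy implementation under review, and `cases` would include edge cases such as constant, very short, and negative-valued series.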

Feature Name | Implemented Lazy (Expr) | Implemented Eager (Series) | Need More Review
-- | -- | -- | --
absolute_energy | Y | Y |
absolute_maximum | Y | Y |
absolute_sum_of_changes | Y | Y |
approximate_entropy | N | Y |
augmented_dickey_fuller | N | Y | Y
autocorrelation | N | Y |
autoregressive_coefficients | N | Y | Y
benford_correlation | Y | Y |
binned_entropy | Y | Y |
c3 | Y | Y |
change_quantiles | Y | Y |
cid_ce | Y | Y |
count_above | Y | Y |
count_above_mean | Y | Y |
count_below | Y | Y |
count_below_mean | Y | Y |
cwt_coefficients | N | Y | Y
energy_ratios | Y | Y |
first_location_of_maximum | Y | Y |
first_location_of_minimum | Y | Y |
fourier_entropy | N | Y | Y
friedrich_coefficients | N | Y |
has_duplicate | Y | Y |
has_duplicate_max | Y | Y |
has_duplicate_min | Y | Y |
index_mass_quantile | Y | Y |
large_standard_deviation | Y | Y |
last_location_of_maximum | Y | Y |
last_location_of_minimum | Y | Y |
lempel_ziv_complexity | N | Y |
linear_trend | Y | Y |
longest_strike_above_mean | Y | Y |
longest_strike_below_mean | Y | Y |
mean_abs_change | Y | Y |
mean_change | Y | Y |
mean_n_absolute_max | Y | Y |
mean_second_derivative_central | Y | Y |
number_crossings | Y | Y |
number_cwt_peaks | N | Y |
number_peaks | Y | Y |
percent_reocurring_points | Y | Y |
percent_reoccuring_values | Y | Y |
permutation_entropy | Y | Y |
range_count | Y | Y |
ratio_beyond_r_sigma | Y | Y |
ratio_n_unique_to_length | Y | Y |
root_mean_square | Y | Y |
sample_entropy | N | Y |
spkt_welch_density | N | Y |
sum_reocurring_points | Y | Y |
sum_reocurring_values | Y | Y |
symmetry_looking | Y | Y |
time_reversal_asymmetry_statistic | Y | Y |
variation_coefficient | Y | Y |
harmonic_mean | Y | Y |
fft_coefficients | N | Y | Y

abstractqqq commented 11 months ago

@topher-lo How close are you to finishing FFT and feature bundles? Do you need some help?

topher-lo commented 11 months ago

FFT is done. Feature bundles haven't been started. If you have ideas for the latter, please take it on 🙏. I can help with docs.

abstractqqq commented 10 months ago

Closing this since we are mostly done.

Untested features whose validity no one knows how to check should be decorated with the UseAtOwnRisk decorator I introduced in this branch: https://github.com/neocortexdb/functime/tree/feat/decorator_loggin_infra .
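
For readers who have not looked at the branch, a decorator of this kind could look roughly like the sketch below. This is not the code from the branch: the names `use_at_own_risk` and `UseAtOwnRiskWarning` and the warning text are assumptions made for illustration, and the decorated function is a placeholder rather than a real feature.

```python
import functools
import warnings
from typing import Any, Callable


class UseAtOwnRiskWarning(UserWarning):
    """Hypothetical warning category for features with unverified output."""


def use_at_own_risk(func: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap func so that every call warns that its result is unvalidated."""

    @functools.wraps(func)  # keep the original name and docstring
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        warnings.warn(
            f"`{func.__name__}` has not been validated against a reference; "
            "use at your own risk.",
            UseAtOwnRiskWarning,
            stacklevel=2,  # point the warning at the caller, not the wrapper
        )
        return func(*args, **kwargs)

    return wrapper


@use_at_own_risk
def cwt_coefficients_placeholder(x: list) -> list:
    """Placeholder body for illustration only; not the real feature."""
    return list(x)
```

Using a dedicated warning category lets users silence or escalate these warnings with the standard `warnings` filters without affecting other warnings.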

functime-org / functime

Feature Extraction Tsfresh Rewrite Quality Assurance #52