functime-org / functime

Time-series machine learning at scale. Built with Polars for embarrassingly parallel feature extraction and forecasts on panel data.
https://docs.functime.ai
Apache License 2.0

Feature Extraction Tsfresh Rewrite Quality Assurance #52

Closed abstractqqq closed 10 months ago

abstractqqq commented 11 months ago

First, thank you everybody for contributing to the rewrite.

We are planning to make this project more public, which means we need to make sure that the quality is good. For this round of review, we want to focus on the following 3 items (ranked in terms of importance):

| Feature Name | Implemented Lazy (Expr) | Implemented Eager (Series) | Need More Review |
| -- | -- | -- | -- |
| absolute_energy | Y | Y | |
| absolute_maximum | Y | Y | |
| absolute_sum_of_changes | Y | Y | |
| approximate_entropy | N | Y | |
| augmented_dickey_fuller | N | Y | Y |
| autocorrelation | N | Y | |
| autoregressive_coefficients | N | Y | Y |
| benford_correlation | Y | Y | |
| binned_entropy | Y | Y | |
| c3 | Y | Y | |
| change_quantiles | Y | Y | |
| cid_ce | Y | Y | |
| count_above | Y | Y | |
| count_above_mean | Y | Y | |
| count_below | Y | Y | |
| count_below_mean | Y | Y | |
| cwt_coefficients | N | Y | Y |
| energy_ratios | Y | Y | |
| first_location_of_maximum | Y | Y | |
| first_location_of_minimum | Y | Y | |
| fourier_entropy | N | Y | Y |
| friedrich_coefficients | N | Y | |
| has_duplicate | Y | Y | |
| has_duplicate_max | Y | Y | |
| has_duplicate_min | Y | Y | |
| index_mass_quantile | Y | Y | |
| large_standard_deviation | Y | Y | |
| last_location_of_maximum | Y | Y | |
| last_location_of_minimum | Y | Y | |
| lempel_ziv_complexity | N | Y | |
| linear_trend | Y | Y | |
| longest_strike_above_mean | Y | Y | |
| longest_strike_below_mean | Y | Y | |
| mean_abs_change | Y | Y | |
| mean_change | Y | Y | |
| mean_n_absolute_max | Y | Y | |
| mean_second_derivative_central | Y | Y | |
| number_crossings | Y | Y | |
| number_cwt_peaks | N | Y | |
| number_peaks | Y | Y | |
| percent_reocurring_points | Y | Y | |
| percent_reoccuring_values | Y | Y | |
| permutation_entropy | Y | Y | |
| range_count | Y | Y | |
| ratio_beyond_r_sigma | Y | Y | |
| ratio_n_unique_to_length | Y | Y | |
| root_mean_square | Y | Y | |
| sample_entropy | N | Y | |
| spkt_welch_density | N | Y | |
| sum_reocurring_points | Y | Y | |
| sum_reocurring_values | Y | Y | |
| symmetry_looking | Y | Y | |
| time_reversal_asymmetry_statistic | Y | Y | |
| variation_coefficient | Y | Y | |
| harmonic_mean | Y | Y | |
| fft_coefficients | N | Y | Y |

abstractqqq commented 11 months ago

@topher-lo How close are you to finishing FFT and feature bundles? Do you need some help?

topher-lo commented 11 months ago

FFT is done. Feature bundles haven't started. If you have ideas for the latter, please take it on 🙏. I can help with docs

abstractqqq commented 10 months ago

Closing this since we are mostly done.

Untested features whose validity no one knows how to check should be decorated with the UseAtOwnRisk decorator I introduced in this branch: https://github.com/neocortexdb/functime/tree/feat/decorator_loggin_infra.
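For readers who have not opened the branch, here is a minimal sketch of what a decorator like this might look like. It is not the code from the branch: the names `use_at_own_risk`, `UseAtOwnRiskWarning`, and `toy_feature` are hypothetical, and it assumes the goal is simply to warn on every call and to tag the function so that flagged features can be listed later.

```python
import functools
import warnings


class UseAtOwnRiskWarning(UserWarning):
    """Hypothetical warning category for features without a validity check."""


def use_at_own_risk(func):
    """Mark a feature as unvalidated and warn each time it is called.

    Illustrative only; the actual UseAtOwnRisk decorator may behave differently.
    """

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{func.__name__} has not been validated against a reference "
            "implementation; use at your own risk.",
            category=UseAtOwnRiskWarning,
            stacklevel=2,  # point the warning at the caller, not the wrapper
        )
        return func(*args, **kwargs)

    # Attribute that tooling could inspect to list all flagged features.
    wrapper.use_at_own_risk = True
    return wrapper


@use_at_own_risk
def toy_feature(values):
    # Placeholder body (sum of squares) standing in for an unvalidated feature.
    return sum(v * v for v in values)
```

Using `warnings` rather than raising an exception keeps the feature callable while still making the lack of validation visible, and callers can silence or escalate the category with the standard warning filters.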