aimclub / FEDOT

Automated modeling and machine learning framework FEDOT
https://fedot.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Fix low speed of lagged implementation #1144

Closed kasyanovse closed 11 months ago

kasyanovse commented 11 months ago

Replace the chained pandas concatenation with a bulk NumPy implementation.
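To illustrate the kind of change described above, here is a minimal sketch and not the actual FEDOT code. It assumes the old approach built the lagged table by concatenating shifted copies of the series with pandas, and that the bulk version can be expressed with `numpy.lib.stride_tricks.sliding_window_view` (available in NumPy 1.20 and later). The function names `lagged_table_pandas` and `lagged_table_numpy` are hypothetical.

```python
import numpy as np
import pandas as pd


def lagged_table_pandas(series, window_size):
    """Hypothetical slow path: chain shifted copies and concatenate them."""
    s = pd.Series(series)
    # Column k is the series shifted so that row i holds series[i + k]
    columns = [s.shift(-k) for k in range(window_size)]
    # Rows that run past the end of the series contain NaN and are dropped
    return pd.concat(columns, axis=1).dropna().to_numpy()


def lagged_table_numpy(series, window_size):
    """Hypothetical fast path: build all windows in one bulk NumPy call."""
    # Returns a read-only view; copy it before modifying in place
    return np.lib.stride_tricks.sliding_window_view(np.asarray(series), window_size)


series = np.arange(6, dtype=float)
slow = lagged_table_pandas(series, 3)
fast = lagged_table_numpy(series, 3)
print(np.array_equal(slow, fast))
```

Both functions produce one row per window position, so the two tables can be compared directly. The NumPy version avoids creating one intermediate object per lag.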

aim-pep8-bot commented 11 months ago

Hello @kasyanovse! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2023-08-15 07:20:28 UTC

codecov[bot] commented 11 months ago

Codecov Report

Merging #1144 (6ae8f6c) into master (89ff552) will decrease coverage by 0.21%. Report is 2 commits behind head on master. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1144      +/-   ##
==========================================
- Coverage   78.67%   78.47%   -0.21%     
==========================================
  Files         131      130       -1     
  Lines        9362     9323      -39     
==========================================
- Hits         7366     7316      -50     
- Misses       1996     2007      +11     
| Files Changed | Coverage Δ |
|---|---|
| ...lementations/data_operations/ts_transformations.py | 77.65% <100.00%> (-0.77%) :arrow_down: |

... and 16 files with indirect coverage changes

kasyanovse commented 11 months ago

Speed test. Some models are also sped up, because they use the `ts_to_table` function, which is implemented in the same module as the lagged transformation.

Code (for IPython):

```
import numpy as np

from examples.simple.time_series_forecasting.ts_pipelines import ts_polyfit_pipeline, ts_complex_ridge_pipeline
from fedot.core.data.data import InputData
from fedot.core.repository.dataset_types import DataTypesEnum
from fedot.core.repository.tasks import Task, TaskTypesEnum, TsForecastingParams

# Random series of 100,000 points, used as both features and target
series = np.random.rand(100000)
data = InputData(idx=np.arange(series.shape[0]),
                 features=series,
                 target=series,
                 task=Task(TaskTypesEnum.ts_forecasting, TsForecastingParams(forecast_length=100)),
                 data_type=DataTypesEnum.ts)

pipeline = ts_polyfit_pipeline(2)
%timeit pipeline.fit(data)

pipeline = ts_complex_ridge_pipeline()
%timeit pipeline.fit(data)
```

Results.

  1. Old lagged
    1. 2.28 s ± 232 ms for polyfit
    2. 2.39 s ± 62.3 ms for lagged + ridge
  2. New lagged
    1. 48.3 ms ± 1.15 ms for polyfit
    2. 605 ms ± 11.3 ms for lagged + ridge

The pipeline with ridge spends about half of its total time fitting the ridge model itself, so its overall speedup is much smaller than for polyfit.
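For reference, the speedup factors implied by the timings above can be worked out as follows. This is only arithmetic on the reported mean values; it ignores the measurement spread, and the dictionary layout is just an illustration.

```python
# Mean fit times in seconds, copied from the results above: (old lagged, new lagged)
timings_s = {
    "polyfit": (2.28, 48.3e-3),
    "lagged + ridge": (2.39, 605e-3),
}

# Speedup factor = old time / new time
speedups = {name: old / new for name, (old, new) in timings_s.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x faster")
```

This gives roughly 47x for polyfit and roughly 4x for lagged + ridge.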