aimclub / FEDOT

Automated modeling and machine learning framework FEDOT
https://fedot.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Fast topological features #1252

Closed kasyanovse closed 5 months ago

kasyanovse commented 5 months ago

This is a 🙋 feature or enhancement.

Summary

A faster version of the topological features (about 30 times faster). It differs substantially from the regular topological features:

  1. Moved all the code into a single class.
  2. Dropped giotto-tda in favor of giotto-ph.
  3. Changed how the final features are computed from the topological features, to get the maximum speedup.

Context

Inspired by https://github.com/aimclub/FEDOT/pull/1241.

pep8speaks commented 5 months ago

Hello @kasyanovse! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2024-01-27 18:04:17 UTC
github-actions[bot] commented 5 months ago

Code in this pull request still contains PEP8 errors, please write the /fix-pep8 command in the comments below to create commit with automatic fixes.

Comment last updated at
codecov[bot] commented 5 months ago

Codecov Report

Attention: 30 lines in your changes are missing coverage. Please review.

Comparison is base (5e726e9) 80.05% compared to head (8f895c3) 79.84%.

Files Patch % Lines
...erations/topological/fast_topological_extractor.py 30.23% 30 Missing :warning:
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master    #1252      +/-   ##
==========================================
- Coverage   80.05%   79.84%   -0.21%
==========================================
  Files         149      150       +1
  Lines       10278    10322      +44
==========================================
+ Hits         8228     8242      +14
- Misses       2050     2080      +30
```


kasyanovse commented 5 months ago

/fix-pep8

valer1435 commented 5 months ago

I would like a test checking that the resulting features are roughly the same as those from the regular extractor.

kasyanovse commented 5 months ago

> I would like a test checking that the resulting features are roughly the same as those from the regular extractor.

Different features are generated from the topology here, so such a test makes no sense. The picture compares the predictions of the lagged-topo-ridge pipeline. I would not say there are fundamental differences, but fast_topo evidently did not capture the low-frequency components. That is a sacrifice made for speed; if needed, quality can be improved by adding statistical features to the quantiles.

image
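The comment above mentions quantile features, optionally extended with statistical features. The thread does not show the extractor's actual feature set, so the following is only a minimal sketch of the general idea, assuming a persistence diagram given as `(birth, death)` pairs. The function name `lifetime_quantile_features` and the parameter `n_bins` are hypothetical and do not come from the PR.

```python
from math import isfinite
from statistics import quantiles


def lifetime_quantile_features(diagram, n_bins=4):
    """Summarize a persistence diagram by quantiles of its lifetimes.

    `diagram` is an iterable of (birth, death) pairs. This is an
    illustrative assumption, not the feature set used in the PR.
    """
    # Lifetime of a topological feature is death - birth;
    # infinite and zero-length bars are skipped.
    lifetimes = [death - birth
                 for birth, death in diagram
                 if isfinite(death) and death > birth]
    if len(lifetimes) < 2:
        # statistics.quantiles needs at least two points;
        # fall back to a fixed-size zero vector.
        return [0.0] * (n_bins - 1)
    # n_bins - 1 cut points, e.g. quartiles for n_bins=4.
    return quantiles(lifetimes, n=n_bins, method='inclusive')


if __name__ == '__main__':
    toy_diagram = [(0.0, 1.0), (0.0, 2.0), (0.5, 3.5), (1.0, 5.0),
                   (0.0, float('inf'))]
    print(lifetime_quantile_features(toy_diagram))
```

A fixed-length output keeps the feature matrix rectangular regardless of how many bars each window produces. Statistical features such as the mean or maximum lifetime could be appended to the same vector.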

Code for generating the picture:

```py
import logging
from time import perf_counter
import pickle

import numpy as np
from matplotlib import pyplot as plt

from fedot.core.pipelines.node import PipelineNode
from fedot.core.pipelines.pipeline import Pipeline
from fedot.core.repository.tasks import Task, TaskTypesEnum, TsForecastingParams
from fedot.api.main import Fedot
from fedot.core.data.data import InputData
from fedot.core.repository.dataset_types import DataTypesEnum
from fedot.core.data.data_split import train_test_data_setup

RANDOM_SEED = 100


def get_data(data_length=500, test_length=100):
    # (amplitude, relative frequency) pairs of the synthetic signal
    harmonics = [(0.1, 0.9), (0.1, 1), (0.1, 1.1), (0.05, 2), (0.05, 5), (1, 0.02)]
    time = np.linspace(0, 100, data_length)
    data = time * 0
    for g in harmonics:
        data += g[0] * np.sin(g[1] * 2 * np.pi / time[-1] * 25 * time)
    data = InputData(idx=np.arange(0, data.shape[0]),
                     features=data,
                     target=data,
                     task=Task(TaskTypesEnum.ts_forecasting,
                               TsForecastingParams(forecast_length=test_length)),
                     data_type=DataTypesEnum.ts)
    return train_test_data_setup(data,
                                 validation_blocks=1,
                                 split_ratio=(data_length - test_length) / ((data_length - test_length) + test_length))


def plot_ppl(ppls, train, test, labels):
    _, ax = plt.subplots()
    limits = len(test.target)
    # show only the tail of the train part so the forecast stays readable
    ax.plot(train.idx[-limits:], train.target[-limits:], label='train')
    ax.plot(test.idx, test.target, label='test')
    for label, ppl in zip(labels, ppls):
        predict = ppl.predict(test).predict
        ax.plot(test.idx[-len(predict):], predict, label=label)
    ax.legend()


if __name__ == '__main__':
    # pipeline with the fast topological features
    train, test = get_data()
    node = PipelineNode('lagged')
    node = PipelineNode('fast_topological_features', nodes_from=[node])
    node = PipelineNode('ridge', nodes_from=[node])
    ppl1 = Pipeline(node)
    t0 = perf_counter()
    ppl1.fit(train)
    ppl1.predict(test)
    print(perf_counter() - t0)

    # pipeline with the regular topological features
    train, test = get_data()
    node = PipelineNode('lagged')
    node = PipelineNode('topological_features', nodes_from=[node])
    node = PipelineNode('ridge', nodes_from=[node])
    ppl2 = Pipeline(node)
    t0 = perf_counter()
    ppl2.fit(train)
    ppl2.predict(test)
    print(perf_counter() - t0)

    plot_ppl([ppl1, ppl2], train, test, ('fast_topo', 'topo'))
    plt.show()  # added so the figure is displayed when run as a script
```
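The script compares the two pipelines visually. To put a number on the difference, one option is to compute the RMSE of each forecast against the test target, and between the two forecasts. This is a minimal sketch that was not part of the PR; the helper name `rmse` is hypothetical, and the inputs are assumed to be flat sequences of equal length.

```python
from math import sqrt


def rmse(first, second):
    """Root mean squared error between two equal-length flat sequences."""
    if len(first) != len(second):
        raise ValueError('sequences must have the same length')
    if len(first) == 0:
        raise ValueError('sequences must not be empty')
    squared_errors = ((a - b) ** 2 for a, b in zip(first, second))
    return sqrt(sum(squared_errors) / len(first))


if __name__ == '__main__':
    # Toy values standing in for a forecast and the true target.
    forecast = [0.0, 0.0]
    target = [3.0, 4.0]
    print(rmse(forecast, target))
```

In the script above this could be applied to the flattened `ppl.predict(test).predict` arrays and `test.target`, assuming both are first reduced to one-dimensional sequences of the same length.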