etna-team / etna

ETNA – Time-Series Library
https://docs.etna.ai
Apache License 2.0

[BUG] Discrepancies between `Pipeline` and `AutoRegressivePipeline` in handling a given `ts` after transforms #440

Open d-a-bunin opened 3 months ago

d-a-bunin commented 3 months ago

🐛 Bug Report

`Pipeline` and `AutoRegressivePipeline` differ in how `forecast` handles a given `ts` with respect to transforms that were applied to it.

Expected behavior

I think the behavior should be the same. It isn't obvious which one is better, but probably `Pipeline`'s.

How To Reproduce

from copy import deepcopy

from etna.datasets import TSDataset
from etna.datasets import generate_ar_df
from etna.pipeline import AutoRegressivePipeline, Pipeline
from etna.transforms import LagTransform, AddConstTransform
from etna.models import LinearMultiSegmentModel

def main():
    df = generate_ar_df(n_segments=3, start_time="2020-01-01", periods=100, freq="D")
    ts = TSDataset(df=df, freq="D")

    model = LinearMultiSegmentModel()
    transforms = [
        LagTransform(in_column="target", lags=[7, 8, 9, 10])
    ]
    autoreg_pipeline = AutoRegressivePipeline(model=model, transforms=transforms, horizon=7)
    pipeline = Pipeline(model=model, transforms=transforms, horizon=7)

    # Apply an extra transform to the dataset directly, outside of either pipeline.
    additional_const_transform = AddConstTransform(in_column="target", value=10)
    ts.transform(transforms=[additional_const_transform])

    autoreg_pipeline.fit(deepcopy(ts))
    pipeline.fit(deepcopy(ts))

    autoreg_forecast = autoreg_pipeline.forecast()
    forecast = pipeline.forecast()

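    # Compare the overall mean of the target across all segments for both forecasts.
    mean_autoreg = autoreg_forecast.to_pandas(features=["target"]).mean().mean()
    mean = forecast.to_pandas(features=["target"]).mean().mean()

    # The two pipelines should agree; a gap close to the added constant (10)
    # would suggest that they treat the pre-applied transform differently.
    assert abs(mean_autoreg - mean) < 5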

if __name__ == "__main__":
    main()
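
The final assertion in the script compares the overall means of the two forecasts. As a minimal, library-free sketch of that check (the helper `mean_gap` and the numbers below are hypothetical and not part of ETNA), a forecast that is offset by the added constant 10 exceeds the threshold of 5:

```python
from statistics import fmean
from typing import Sequence


def mean_gap(forecast_a: Sequence[float], forecast_b: Sequence[float]) -> float:
    """Return the absolute difference between the mean values of two forecasts."""
    return abs(fmean(forecast_a) - fmean(forecast_b))


# Made-up values: the second forecast is the first one shifted by the constant 10,
# mimicking a forecast that differs by the value used in AddConstTransform.
base = [1.0, 2.0, 3.0, 4.0]
shifted = [value + 10 for value in base]

gap = mean_gap(base, shifted)
print(gap)      # 10.0
print(gap < 5)  # False, so an assertion like the one in the script would fail
```

The threshold of 5 sits halfway between "no offset" and "offset by the full constant", so the check tolerates ordinary differences between the two forecasting strategies while still catching a shift of the size added by the transform.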

Environment

No response

Additional context

No response

Checklist