amazon-science / chronos-forecasting

Chronos: Pretrained (Language) Models for Probabilistic Time Series Forecasting
https://arxiv.org/abs/2403.07815
Apache License 2.0

FineTuning input dimensions for clarity #105

CoCoNuTeK closed this issue 2 weeks ago

CoCoNuTeK commented 2 weeks ago

Hello there! To help me and others avoid passing wrongly formatted data to the fine-tuning script: what should the dimensions be when serializing the data with the following snippet?

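    from gluonts.dataset.arrow import ArrowWriter

    # time_series, start_times, compression and path are defined by the caller
    dataset = [
        {"start": start, "target": ts} for ts, start in zip(time_series, start_times)
    ]
    ArrowWriter(compression=compression).write_to_file(
        dataset,
        path=path,
    )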

So if I use contextlen=512 and pred_len=64 with numtimeseries=100: should the 'start' variable be an array of length 100, where each element is of type datetime64 and gives the starting point of the i-th sequence in the ts array? And should each element of the ts array be an array of length 'contextlen' + 'pred_len'?

My additional question: is there a way to set up the time step so the model knows the time gap between data points, e.g. whether one tick is 10 min or 5 min?

CoCoNuTeK commented 2 weeks ago

Okay, so it might be possible to set up the time step in train.py by changing the freq parameter:

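    from functools import partial
    from pathlib import Path

    from gluonts.dataset.common import FileDataset
    from gluonts.itertools import Filter

    # Excerpt from train.py: has_enough_observations, min_past, prediction_length,
    # max_missing_prop, frequency and training_data_paths are defined there.
    train_datasets = [
        Filter(
            partial(
                has_enough_observations,
                min_length=min_past + prediction_length,
                max_missing_prop=max_missing_prop,
            ),
            FileDataset(path=Path(data_path), freq=frequency),
        )
        for data_path in training_data_paths
    ]
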
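The min_length argument in this filter suggests that each training series only has to be at least min_past + prediction_length points long. Below is a sketch of the kind of check such a filter performs, assuming from the parameter names that it tests the series length and the proportion of missing (NaN) values. enough_observations is a hypothetical re-implementation for illustration, not the actual has_enough_observations from train.py, and the numbers are made up.

```python
import numpy as np


def enough_observations(
    entry: dict, min_length: int = 0, max_missing_prop: float = 1.0
) -> bool:
    """Hypothetical stand-in for a train.py-style filter.

    Keeps a series only if it has at least min_length points and its share of
    missing (NaN) values does not exceed max_missing_prop.
    """
    target = np.asarray(entry["target"], dtype=float)
    if len(target) < min_length:
        return False
    # Treat an empty series as entirely missing to avoid averaging an empty array.
    missing_prop = float(np.isnan(target).mean()) if len(target) else 1.0
    return missing_prop <= max_missing_prop


# Illustrative values only, not the defaults used by train.py.
min_past, prediction_length = 60, 64

entries = [
    {"target": np.ones(100)},  # dropped: 100 < 60 + 64
    {"target": np.ones(576)},  # kept: long enough, no NaNs
    {"target": np.r_[np.full(300, np.nan), np.ones(276)]},  # dropped: ~52% NaN
]

# A plain list comprehension stands in for gluonts' Filter here.
kept = [
    e
    for e in entries
    if enough_observations(
        e, min_length=min_past + prediction_length, max_missing_prop=0.5
    )
]
print(len(kept))  # 1
```

Under this reading, a series longer than the minimum is not rejected by the filter.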
lostella commented 2 weeks ago

@CoCoNuTeK I'm not sure I get your question. The dataset is a list of dictionaries: each dictionary has a "start" attribute (of type np.datetime64) and a "target" attribute (a np.ndarray object with a single axis).
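A minimal sketch of that layout using only NumPy, with the sizes from the question (100 series of 512 + 64 points) purely for illustration. make_dataset and check_entry are hypothetical helpers, not part of Chronos or gluonts; the resulting list is the kind of object that would be handed to ArrowWriter(...).write_to_file in the question's snippet.

```python
import numpy as np


def make_dataset(
    num_series: int, length: int, start: str = "2024-01-01T00:00", seed: int = 0
) -> list:
    """Build a toy dataset: one dict per series with a single "start"
    timestamp and a 1-D "target" array (random values, illustration only)."""
    rng = np.random.default_rng(seed)
    start_time = np.datetime64(start, "m")
    return [
        {"start": start_time, "target": rng.normal(size=length).astype(np.float32)}
        for _ in range(num_series)
    ]


def check_entry(entry: dict) -> None:
    """Raise if an entry does not match the list-of-dicts layout."""
    # One timestamp per series, not an array of timestamps.
    if not isinstance(entry["start"], np.datetime64):
        raise TypeError("'start' must be a single np.datetime64")
    # The target must be a NumPy array with a single axis.
    target = entry["target"]
    if not isinstance(target, np.ndarray) or target.ndim != 1:
        raise ValueError("'target' must be a 1-D np.ndarray")


dataset = make_dataset(num_series=100, length=512 + 64)
for entry in dataset:
    check_entry(entry)
print(len(dataset), dataset[0]["target"].shape)  # 100 (576,)
```

Because each dict carries its own "start", 100 series simply means 100 dicts, each with one timestamp, rather than a separate array of start times.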

> my additional question would be is there a way to setup the timestep so the model knows the timegaps between datapoints if its 10min one tick or 5min

You can store it in some attribute, but the model will not use it anywhere, at either training or prediction time. The code you found just supplies placeholder frequency information so that gluonts' FileDataset class can be used; the frequency value itself is not used.
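If you still want to keep the time step for your own bookkeeping (for example to attach timestamps to forecasts afterwards), one option is to store it under an extra key and rebuild the timestamps yourself. This is an assumption about user-side code, not a Chronos feature: step_minutes and timestamps are hypothetical names, and per the reply above the model never sees this information.

```python
import numpy as np


def timestamps(start: np.datetime64, length: int, step_minutes: int) -> np.ndarray:
    """Reconstruct the timestamp of every observation from a start time and a
    fixed step. This happens entirely in user code, outside the model."""
    return start + np.arange(length) * np.timedelta64(step_minutes, "m")


entry = {
    "start": np.datetime64("2024-01-01T00:00", "m"),
    "target": np.arange(6, dtype=np.float32),
    "step_minutes": 10,  # hypothetical extra key, kept for bookkeeping only
}
ts = timestamps(entry["start"], len(entry["target"]), entry["step_minutes"])
print(ts[:3])
```

Switching step_minutes from 10 to 5 changes only these reconstructed timestamps; the "target" values passed to the model stay the same.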