SalesforceAIResearch / uni2ts

[ICML2024] Unified Training of Universal Time Series Forecasting Transformers
Apache License 2.0

Questions about Finetuning LSF tasks #31

Open zqiao11 opened 5 months ago

zqiao11 commented 5 months ago

Hi. I'm working on improving long-sequence forecasting (LSF) performance through fine-tuning. I have successfully replicated the zero-shot results shown in Table 22 and will use them as the baseline for comparison.

For a fair comparison, I need to fine-tune under the same train-val-test setup as the zero-shot experiments in Table 22. However, I am unsure whether my approach is accurate. Below is a summary of my workflow to fine-tune Moirai-small on ETTh1 and evaluate it with a prediction length of 96:

  1. Following the LSF setup, I split the training data with the same offset as in _lsf_dataset.py (see the sanity check of these numbers after this list):
python -m uni2ts.data.builder.simple ETTh1 dataset/ETT-small/ETTh1.csv --offset 8640
  2. Accordingly, I revised conf/finetune/val_data/etth1.yaml as follows:
_target_: uni2ts.data.builder.ConcatDatasetBuilder
_args_:
  _target_: uni2ts.data.builder.simple.generate_eval_builders
  dataset: ETTh1_eval
  offset: 8640  # Same as _lsf_dataset.py
  eval_length: 2880  # Same as _lsf_dataset.py
  prediction_lengths: [96, 192, 336, 720]
  context_lengths: [1000, 2000, 3000, 4000, 5000]
  patch_sizes: [32, 64]
  3. Then I fine-tuned a Moirai model with the same command as in the example:

    python -m cli.finetune \
    run_name=my_lsf_run \
    model=moirai_1.0_R_small \
    data=etth1 \
    val_data=etth1
  4. Finally, I changed the checkpoint in the model's YAML and evaluated the fine-tuned model using the second approach from the example:

    python -m cli.eval \
    run_name=my_lsf_run \
    model=moirai_1.0_R_small \
    model.patch_size=64 \
    model.context_length=5000 \
    data=lsf_test \
    data.dataset_name=ETTh1 \
    data.mode=M \
    data.prediction_length=96
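
As a side note on the numbers above: if I follow the LSF convention of 12 months of hourly data for training and 4 months for validation, with each month counted as 30 days, the offset and eval_length check out:

    # Sanity check of the LSF split arithmetic for ETTh1 (hourly data),
    # assuming the usual 30-day months of the LSF convention.
    train_offset = 12 * 30 * 24  # 8640 points, matches --offset above
    eval_length = 4 * 30 * 24    # 2880 points, matches eval_length above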

Despite following these steps, the fine-tuning results underperform the zero-shot baseline (MSE 0.375 and MAE 0.402 in the original results). [screenshot of fine-tuning metrics]

I have a few questions:

  1. Is the workflow above correct? Does it use the same train-val-test split as the original experiments?
  2. Given data.mode=M during testing, do I need to build the dataset with wide_multivariate for fine-tuning?
  3. If the workflow is correct, do you have any suggestions for improving the fine-tuning performance?

Thank you for your assistance.

gorold commented 5 months ago

I believe it uses the same train/val/test split as the LSF setting. However, it doesn't perform normalization based on train-set statistics, which the LSF setting uses, so there is a mismatch between fine-tuning and evaluation. If you want to fine-tune in a multivariate fashion, then yes, process it as a multivariate dataset, and also remove the SampleDimension transformation.
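
A rough sketch of the normalization I mean, standardizing each column with statistics computed on the train split only (the CSV path, the 8640 offset, and the output filename are just taken from this thread, not from any official preprocessing script):

    import pandas as pd

    # Standardize ETTh1 with train-split statistics before building the
    # fine-tuning dataset. Offset 8640 is the LSF train split used above.
    df = pd.read_csv("dataset/ETT-small/ETTh1.csv", index_col=0)
    train = df.iloc[:8640]
    df_norm = (df - train.mean()) / train.std()  # per-column statistics
    df_norm.to_csv("dataset/ETT-small/ETTh1_normalized.csv")

You can then build the fine-tuning dataset from the normalized CSV.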

zqiao11 commented 5 months ago

Thanks for your reply. Following your suggestions, I normalized the data for fine-tuning, built the dataset in wide_multivariate format, and removed the SampleDimension transformation.

However, when I ran the experiment, an error occurred:

...
AssertionError: Caught AssertionError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/eee/qzz/uni2ts/venv/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/eee/qzz/uni2ts/venv/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "/home/eee/qzz/uni2ts/src/uni2ts/data/loader.py", line 106, in __call__
    assert all(
AssertionError: Sample length must be less than or equal to max_length (512)

I think this error is caused by the dataset being built in wide_multivariate mode. How should I handle this issue? Do I need to modify max_length here, and if so, how do I calculate the right value?

gorold commented 4 months ago

Hi @zqiao11, sorry for the late response. Have you managed to resolve this issue? If not, could you provide more details?

zqiao11 commented 4 months ago

Hi. I haven't resolved this issue, but I have tracked down the cause. It occurs when a flattened, patchified sequence exceeds Moirai's max_length=512. I think this can be common when processing data built in wide_multivariate format.

For example, ETTh1 has 7 variates. If I use context_length=5000, prediction_length=96, and patch_size=64 (the same config used to reproduce the LSF results), there are 81 patches per variate. After flattening the 7 variates, there are 567 patches (equal to target.size(1)), exceeding the max_seq_length of 512.

The assertion error is raised by the sequence packing function, which is only used in training, not in forecasting. That is why one can evaluate the model with mode=M without error.

BTW, is it safe to modify max_seq_length? Besides sequence packing, I notice it is also used in the code related to self-attention.

zqiao11 commented 4 months ago

FYI, you can reproduce this issue by running the example fine-tuning code: just build the ETTh1 dataset with wide_multivariate and set context_length=5000, prediction_length=96, and patch_size=64.

thisthq commented 4 months ago

@zqiao11 Have you resolved this issue? I'm experiencing the same situation as you.

wyhzunzun123123 commented 4 months ago

@zqiao11 Hello, I also fine-tuned the model on ETTh1 and its performance decreased significantly. Have you solved this issue after normalizing ETTh1?

zqiao11 commented 4 months ago

@wyhzunzun123123 Hi, I haven't solved this issue for ETTh1. Since the reproduction config uses mode='M' for ETTh1, I think one needs to fine-tune it with a dataset built in the multivariate time series format. But I cannot get past the error caused by max_seq_len and need to wait for the author's reply.

You may consider fine-tuning the model on the ETTm1 dataset instead, which is evaluated in mode='S' (build the dataset in 'wide' format).

gorold commented 4 months ago

So sorry for the delayed response. For the max_seq_len issue, you can use one of the following options:

  1. increase the max_seq_len parameter
  2. use a shorter context length
  3. add back the SampleDimension transformation with the max_dim parameter set appropriately.

The idea is that we set a maximum number of tokens, max_seq_len. This is calculated as (context_len + prediction_len) / patch_size * dim, rounding each patch count up.
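
Plugging in the numbers from this thread (the hypothetical helper below just restates the formula, and the rounding matches the 81 patches per variate computed above):

    import math

    # Rough token count for one flattened multivariate sample,
    # following the formula above with each patch count rounded up.
    def num_tokens(context_len, prediction_len, patch_size, dim):
        patches = (math.ceil(context_len / patch_size)
                   + math.ceil(prediction_len / patch_size))
        return patches * dim

    print(num_tokens(5000, 96, 64, 7))  # 567 > 512 -> assertion error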

gorold commented 4 months ago

Regarding the difference in performance on ETTh1: if you want to evaluate in the LSF setting, you will have to first normalize the data using train-set statistics.

zqiao11 commented 4 months ago

Thanks. Can you briefly explain the role of the SampleDimension feature? Does it sample as many dimensions/variates as possible from an MTS given the limit of max_seq_len?

gorold commented 3 months ago

It subsamples the variates according to the max_dim parameter; max_seq_len is not passed to SampleDimension as a parameter.
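
Conceptually it does something like the following (a sketch only, not the actual uni2ts implementation):

    import numpy as np

    # SampleDimension-style subsampling: keep at most max_dim variates
    # from a (variates, time) array, which indirectly caps the flattened
    # token count.
    def sample_dimension(target: np.ndarray, max_dim: int) -> np.ndarray:
        dim = target.shape[0]
        if dim <= max_dim:
            return target
        keep = np.random.choice(dim, size=max_dim, replace=False)
        return target[keep]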

DongChen06 commented 3 months ago

@zqiao11 Hi, have you solved the max_seq_len issue? Any experience with this error? Thank you so much!