Nixtla / nixtla

TimeGPT-1: production-ready pre-trained Time Series Foundation Model for forecasting and anomaly detection. A generative pretrained transformer for time series, trained on over 100B data points, capable of accurate forecasts across domains such as retail, electricity, finance, and IoT with just a few lines of code 🚀.
https://docs.nixtla.io

FEAT: Add finetune_depth parameter #471

Closed: marcopeix closed this pull request 1 month ago

marcopeix commented 2 months ago

Add the finetune_depth parameter to control how many layers are finetuned. Adjust tutorials and capabilities with new parameter.
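
For orientation, a minimal usage sketch of the new parameter through the Python SDK; the dataset, horizon, and parameter values here are illustrative rather than taken from the PR:

```python
import pandas as pd
from nixtla import NixtlaClient

# Illustrative data: the air passengers dataset used throughout the Nixtla docs.
df = pd.read_csv(
    "https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv"
)

client = NixtlaClient()  # assumes NIXTLA_API_KEY is set in the environment

# finetune_steps controls how many finetuning iterations run;
# finetune_depth (added in this PR) controls how many layers are finetuned.
fcst = client.forecast(
    df=df,
    h=12,
    time_col="timestamp",
    target_col="value",
    finetune_steps=10,
    finetune_depth=2,
)
```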

review-notebook-app[bot] commented 2 months ago

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

github-actions[bot] commented 2 months ago
# Experiment Results

## Experiment 1: air-passengers

### Description:

| variable      | experiment |
|:--------------|:-----------|
| h             | 12         |
| season_length | 12         |
| freq          | MS         |
| level         | None       |
| n_windows     | 1          |

### Results:

| metric     | timegpt-1 | timegpt-1-long-horizon | SeasonalNaive | Naive   |
|:-----------|----------:|-----------------------:|--------------:|--------:|
| mae        | 12.6793   | 11.0623                | 47.8333       | 76      |
| mape       | 0.027     | 0.0232                 | 0.0999        | 0.1425  |
| mse        | 213.936   | 199.132                | 2571.33       | 10604.2 |
| total_time | 1.8765    | 1.8137                 | 0.0055        | 0.004   |

### Plot:

![](https://github.com/Nixtla/nixtla/blob/docs-figs-model-performance//action_files/models_performance/plots/plot_air-passengers_12_12_MS_None_1.png?raw=true)

## Experiment 2: air-passengers

### Description:

| variable      | experiment |
|:--------------|:-----------|
| h             | 24         |
| season_length | 12         |
| freq          | MS         |
| level         | None       |
| n_windows     | 1          |

### Results:

| metric     | timegpt-1 | timegpt-1-long-horizon | SeasonalNaive | Naive   |
|:-----------|----------:|-----------------------:|--------------:|--------:|
| mae        | 58.1031   | 58.4587                | 71.25         | 115.25  |
| mape       | 0.1257    | 0.1267                 | 0.1552        | 0.2358  |
| mse        | 4040.21   | 4110.79                | 5928.17       | 18859.2 |
| total_time | 0.5784    | 1.0226                 | 0.0045        | 0.004   |

### Plot:

![](https://github.com/Nixtla/nixtla/blob/docs-figs-model-performance//action_files/models_performance/plots/plot_air-passengers_24_12_MS_None_1.png?raw=true)

## Experiment 3: electricity-multiple-series

### Description:

| variable      | experiment |
|:--------------|:-----------|
| h             | 24         |
| season_length | 24         |
| freq          | H          |
| level         | None       |
| n_windows     | 1          |

### Results:

| metric     | timegpt-1 | timegpt-1-long-horizon | SeasonalNaive | Naive       |
|:-----------|----------:|-----------------------:|--------------:|------------:|
| mae        | 178.293   | 268.121                | 269.23        | 1331.02     |
| mape       | 0.0234    | 0.0311                 | 0.0304        | 0.1692      |
| mse        | 121588    | 219457                 | 213677        | 4.68961e+06 |
| total_time | 0.5359    | 3.2004                 | 0.0055        | 0.0051      |

### Plot:

![](https://github.com/Nixtla/nixtla/blob/docs-figs-model-performance//action_files/models_performance/plots/plot_electricity-multiple-series_24_24_H_None_1.png?raw=true)

## Experiment 4: electricity-multiple-series

### Description:

| variable      | experiment |
|:--------------|:-----------|
| h             | 168        |
| season_length | 24         |
| freq          | H          |
| level         | None       |
| n_windows     | 1          |

### Results:

| metric     | timegpt-1 | timegpt-1-long-horizon | SeasonalNaive | Naive       |
|:-----------|----------:|-----------------------:|--------------:|------------:|
| mae        | 465.532   | 346.984                | 398.956       | 1119.26     |
| mape       | 0.062     | 0.0437                 | 0.0512        | 0.1583      |
| mse        | 835120    | 403787                 | 656723        | 3.17316e+06 |
| total_time | 0.5502    | 1.1355                 | 0.0059        | 0.0053      |

### Plot:

![](https://github.com/Nixtla/nixtla/blob/docs-figs-model-performance//action_files/models_performance/plots/plot_electricity-multiple-series_168_24_H_None_1.png?raw=true)

## Experiment 5: electricity-multiple-series

### Description:

| variable      | experiment |
|:--------------|:-----------|
| h             | 336        |
| season_length | 24         |
| freq          | H          |
| level         | None       |
| n_windows     | 1          |

### Results:

| metric     | timegpt-1   | timegpt-1-long-horizon | SeasonalNaive | Naive       |
|:-----------|------------:|-----------------------:|--------------:|------------:|
| mae        | 558.649     | 459.769                | 602.926       | 1340.95     |
| mape       | 0.0697      | 0.0566                 | 0.0787        | 0.17        |
| mse        | 1.22721e+06 | 739135                 | 1.61572e+06   | 6.04619e+06 |
| total_time | 0.6212      | 0.7403                 | 0.006         | 0.0053      |

### Plot:

![](https://github.com/Nixtla/nixtla/blob/docs-figs-model-performance//action_files/models_performance/plots/plot_electricity-multiple-series_336_24_H_None_1.png?raw=true)
elephaint commented 1 month ago

> Please also add a test verifying that as the finetune depth increases the loss is lower in the nbs/docs/tutorials/06_finetuning.ipynb notebook.

Such a test doesn't always work; i.e., it's not always the case that finetuning improves the model. So I'm removing it again.
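
For reference, the removed check might have looked roughly like the sketch below, reusing `client` and `df` from the sketch above (the split, horizon, and loss are assumptions, not the notebook's actual code):

```python
import numpy as np

# Hold out the last 12 observations as a validation window (assumed split).
train, valid = df.iloc[:-12], df.iloc[-12:]

# Compare validation MAE across finetune_depth values.
losses = {}
for depth in (1, 2, 3):
    fcst = client.forecast(
        df=train,
        h=12,
        time_col="timestamp",
        target_col="value",
        finetune_steps=10,
        finetune_depth=depth,
    )
    losses[depth] = np.mean(np.abs(valid["value"].to_numpy() - fcst["TimeGPT"].to_numpy()))

# The assertion in question: deeper finetuning is not guaranteed to lower
# the loss, so this can fail even when nothing is broken, which is why it
# was removed.
# assert losses[1] > losses[2] > losses[3]
```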

jmoralez commented 1 month ago

What changed between now and 7bc1b5d1e554b1ad8fc7c420bcad1b87151b9048? That one has very different results (nb link). This is supposed to be deterministic, isn't it? I'd expect to be able to reproduce the metrics from that commit every time, especially the monotonic part; right now depths 2 and 3 yield the same result, which is highly suspicious.

elephaint commented 1 month ago

> What changed between now and 7bc1b5d? That one has very different results (nb link). This is supposed to be deterministic, isn't it? I'd expect to be able to reproduce the metrics from that commit every time, especially the monotonic part; right now depths 2 and 3 yield the same result, which is highly suspicious.

Nothing really changed; the issue is that it doesn't hold in general that:

> as the finetune depth increases the loss is lower

I tweaked the parameters so that the results are not good but monotonic (however, as said before, finetuning isn't guaranteed to provide strictly better results when increasing this parameter).

jmoralez commented 1 month ago

Thanks! So the test would pass now? It'd be great having loss_depth1 > loss_depth2 > loss_depth3 to detect possible regressions or the parameter not being passed through correctly.

elephaint commented 1 month ago

> Thanks! So the test would pass now? It'd be great having loss_depth1 > loss_depth2 > loss_depth3 to detect possible regressions or the parameter not being passed through correctly.

No, because:

> it's not always the case that finetuning improves the model

so we shouldn't market that view either. And a test on that is useless too: if it fails, the results might still be better than before.

I've also updated the example to explain this, so users see that increasing the depth can also worsen performance, and that finding the right value is a bit of trial and error.

The tutorial fails if the parameter isn't passed through correctly, so we're covered there anyway.
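
A weaker check could still catch the pass-through failure mode without asserting any ordering of the losses: under the determinism assumption discussed above, different `finetune_depth` values should produce different forecasts. A hypothetical sketch, again reusing `client` and `df` from above:

```python
# Hypothetical smoke test: if the API is deterministic, identical outputs
# for different finetune_depth values would suggest the parameter is not
# being passed through.
fcst_d1 = client.forecast(df=df, h=12, time_col="timestamp", target_col="value",
                          finetune_steps=10, finetune_depth=1)
fcst_d2 = client.forecast(df=df, h=12, time_col="timestamp", target_col="value",
                          finetune_steps=10, finetune_depth=2)
assert not fcst_d1["TimeGPT"].equals(fcst_d2["TimeGPT"])
```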