Nixtla / nixtla

TimeGPT-1: production-ready pre-trained Time Series Foundation Model for forecasting and anomaly detection. A generative pretrained transformer for time series, trained on over 100B data points, capable of accurate forecasts across domains such as retail, electricity, finance, and IoT with just a few lines of code 🚀.
https://docs.nixtla.io

FEAT: Add finetune_depth parameter #471

Closed: marcopeix closed this pull request 1 month ago

marcopeix commented 2 months ago

Add the finetune_depth parameter to control how many layers are finetuned. Adjust tutorials and capabilities with new parameter.
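
For orientation, a minimal usage sketch of the new parameter through the Python SDK; the dataset, horizon, and parameter values here are illustrative rather than taken from the PR:

```python
import pandas as pd
from nixtla import NixtlaClient

# Illustrative data: the air passengers dataset used throughout the Nixtla docs.
df = pd.read_csv(
    "https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv"
)

client = NixtlaClient()  # assumes NIXTLA_API_KEY is set in the environment

# finetune_steps controls how many finetuning iterations run;
# finetune_depth (added in this PR) controls how many layers are finetuned.
fcst = client.forecast(
    df=df,
    h=12,
    time_col="timestamp",
    target_col="value",
    finetune_steps=10,
    finetune_depth=2,
)
```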

review-notebook-app[bot] commented 2 months ago

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

github-actions[bot] commented 2 months ago
# Experiment Results

## Experiment 1: air-passengers

### Description:

| variable      | experiment |
|:--------------|:-----------|
| h             | 12         |
| season_length | 12         |
| freq          | MS         |
| level         | None       |
| n_windows     | 1          |

### Results:

| metric     | timegpt-1 | timegpt-1-long-horizon | SeasonalNaive | Naive   |
|:-----------|----------:|-----------------------:|--------------:|--------:|
| mae        | 12.6793   | 11.0623                | 47.8333       | 76      |
| mape       | 0.027     | 0.0232                 | 0.0999        | 0.1425  |
| mse        | 213.936   | 199.132                | 2571.33       | 10604.2 |
| total_time | 1.8765    | 1.8137                 | 0.0055        | 0.004   |

### Plot:

![](https://github.com/Nixtla/nixtla/blob/docs-figs-model-performance//action_files/models_performance/plots/plot_air-passengers_12_12_MS_None_1.png?raw=true)

## Experiment 2: air-passengers

### Description:

| variable      | experiment |
|:--------------|:-----------|
| h             | 24         |
| season_length | 12         |
| freq          | MS         |
| level         | None       |
| n_windows     | 1          |

### Results:

| metric     | timegpt-1 | timegpt-1-long-horizon | SeasonalNaive | Naive   |
|:-----------|----------:|-----------------------:|--------------:|--------:|
| mae        | 58.1031   | 58.4587                | 71.25         | 115.25  |
| mape       | 0.1257    | 0.1267                 | 0.1552        | 0.2358  |
| mse        | 4040.21   | 4110.79                | 5928.17       | 18859.2 |
| total_time | 0.5784    | 1.0226                 | 0.0045        | 0.004   |

### Plot:

![](https://github.com/Nixtla/nixtla/blob/docs-figs-model-performance//action_files/models_performance/plots/plot_air-passengers_24_12_MS_None_1.png?raw=true)

## Experiment 3: electricity-multiple-series

### Description:

| variable      | experiment |
|:--------------|:-----------|
| h             | 24         |
| season_length | 24         |
| freq          | H          |
| level         | None       |
| n_windows     | 1          |

### Results:

| metric     | timegpt-1 | timegpt-1-long-horizon | SeasonalNaive | Naive       |
|:-----------|----------:|-----------------------:|--------------:|------------:|
| mae        | 178.293   | 268.121                | 269.23        | 1331.02     |
| mape       | 0.0234    | 0.0311                 | 0.0304        | 0.1692      |
| mse        | 121588    | 219457                 | 213677        | 4.68961e+06 |
| total_time | 0.5359    | 3.2004                 | 0.0055        | 0.0051      |

### Plot:

![](https://github.com/Nixtla/nixtla/blob/docs-figs-model-performance//action_files/models_performance/plots/plot_electricity-multiple-series_24_24_H_None_1.png?raw=true)

## Experiment 4: electricity-multiple-series

### Description:

| variable      | experiment |
|:--------------|:-----------|
| h             | 168        |
| season_length | 24         |
| freq          | H          |
| level         | None       |
| n_windows     | 1          |

### Results:

| metric     | timegpt-1 | timegpt-1-long-horizon | SeasonalNaive | Naive       |
|:-----------|----------:|-----------------------:|--------------:|------------:|
| mae        | 465.532   | 346.984                | 398.956       | 1119.26     |
| mape       | 0.062     | 0.0437                 | 0.0512        | 0.1583      |
| mse        | 835120    | 403787                 | 656723        | 3.17316e+06 |
| total_time | 0.5502    | 1.1355                 | 0.0059        | 0.0053      |

### Plot:

![](https://github.com/Nixtla/nixtla/blob/docs-figs-model-performance//action_files/models_performance/plots/plot_electricity-multiple-series_168_24_H_None_1.png?raw=true)

## Experiment 5: electricity-multiple-series

### Description:

| variable      | experiment |
|:--------------|:-----------|
| h             | 336        |
| season_length | 24         |
| freq          | H          |
| level         | None       |
| n_windows     | 1          |

### Results:

| metric     | timegpt-1   | timegpt-1-long-horizon | SeasonalNaive | Naive       |
|:-----------|------------:|-----------------------:|--------------:|------------:|
| mae        | 558.649     | 459.769                | 602.926       | 1340.95     |
| mape       | 0.0697      | 0.0566                 | 0.0787        | 0.17        |
| mse        | 1.22721e+06 | 739135                 | 1.61572e+06   | 6.04619e+06 |
| total_time | 0.6212      | 0.7403                 | 0.006         | 0.0053      |

### Plot:

![](https://github.com/Nixtla/nixtla/blob/docs-figs-model-performance//action_files/models_performance/plots/plot_electricity-multiple-series_336_24_H_None_1.png?raw=true)
elephaint commented 1 month ago

> Please also add a test verifying that as the finetune depth increases the loss is lower in the nbs/docs/tutorials/06_finetuning.ipynb notebook.

Such a test doesn't always work; i.e., it's not always the case that finetuning improves the model. So I'm removing it again.
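
For reference, the removed check might have looked roughly like the sketch below, reusing `client` and `df` from the sketch above (the split, horizon, and loss are assumptions, not the notebook's actual code):

```python
import numpy as np

# Hold out the last 12 observations as a validation window (assumed split).
train, valid = df.iloc[:-12], df.iloc[-12:]

# Compare validation MAE across finetune_depth values.
losses = {}
for depth in (1, 2, 3):
    fcst = client.forecast(
        df=train,
        h=12,
        time_col="timestamp",
        target_col="value",
        finetune_steps=10,
        finetune_depth=depth,
    )
    losses[depth] = np.mean(np.abs(valid["value"].to_numpy() - fcst["TimeGPT"].to_numpy()))

# The assertion in question: deeper finetuning is not guaranteed to lower
# the loss, so this can fail even when nothing is broken, which is why it
# was removed.
# assert losses[1] > losses[2] > losses[3]
```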

jmoralez commented 1 month ago

What changed between now and 7bc1b5d1e554b1ad8fc7c420bcad1b87151b9048? That one has very different results (nb link). This is supposed to be deterministic, isn't it? I'd expect to be able to reproduce the metrics from that commit every time, especially the monotonic part; right now depths 2 and 3 yield the same result, which is highly suspicious.

elephaint commented 1 month ago

> What changed between now and 7bc1b5d? That one has very different results (nb link). This is supposed to be deterministic, isn't it? I'd expect to be able to reproduce the metrics from that commit every time, especially the monotonic part; right now depths 2 and 3 yield the same result, which is highly suspicious.

Nothing really changed; the issue is that it doesn't hold in general that:

> as the finetune depth increases the loss is lower

I tweaked the parameters so that the results are not good but monotonic (however, as said before, finetuning isn't guaranteed to provide strictly better results when increasing this parameter).

jmoralez commented 1 month ago

Thanks! So the test would pass now? It'd be great having loss_depth1 > loss_depth2 > loss_depth3 to detect possible regressions or the parameter not being passed through correctly.

elephaint commented 1 month ago

> Thanks! So the test would pass now? It'd be great having loss_depth1 > loss_depth2 > loss_depth3 to detect possible regressions or the parameter not being passed through correctly.

No, because:

> it's not always the case that finetuning improves the model

so we shouldn't market that view either. And a test on that is useless too: if it fails, the results might still be better than before.

I've also updated the example to explain this, so users see that increasing the depth can also worsen performance, and that finding the right value is a bit of trial and error.

The tutorial fails if the parameter isn't passed through correctly, so we're covered there anyway.
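
A weaker check could still catch the pass-through failure mode without asserting any ordering of the losses: under the determinism assumption discussed above, different `finetune_depth` values should produce different forecasts. A hypothetical sketch, again reusing `client` and `df` from above:

```python
# Hypothetical smoke test: if the API is deterministic, identical outputs
# for different finetune_depth values would suggest the parameter is not
# being passed through.
fcst_d1 = client.forecast(df=df, h=12, time_col="timestamp", target_col="value",
                          finetune_steps=10, finetune_depth=1)
fcst_d2 = client.forecast(df=df, h=12, time_col="timestamp", target_col="value",
                          finetune_steps=10, finetune_depth=2)
assert not fcst_d1["TimeGPT"].equals(fcst_d2["TimeGPT"])
```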