KimMeen / Time-LLM

[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"
https://arxiv.org/abs/2310.01728
Apache License 2.0

The problem of loss reduction in the training process #84

Closed: JackKoLing closed this issue 2 weeks ago

JackKoLing commented 1 month ago

Very nice work. Due to limited equipment, I ran LLaMA on a single NVIDIA 3090 with the batch size set to 4. I used only 4000 rows of data from ETTh1, with the input sequence length set to 256 and the output length to 48. The learning_rate is 0.001, and the other params follow the script.

During training, I found that the first epoch was the best; the later losses went up and training stopped after 10 early-stopping rounds. Running BERT and GPT-2 gives a similar situation (I set 100 epochs, but the best epoch is usually around 10). The final MAE is 0.6039629. After visualizing the results, I found that the predictions for fairly regular sequences are good, while the predictions for less regular sequences are poor.

I am not sure whether the training process is correct, or whether it converges within very few epochs because of the power of the LLM in Time-LLM. Do you have any suggestions? Thanks.
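
As a minimal sketch of the early-stopping behaviour described above (100 epochs with patience 10, best validation loss reached early): training halts once the validation loss has failed to improve for `patience` consecutive epochs. This is a generic illustration, not necessarily the exact early-stopping utility used in this repository.

```python
# Minimal sketch of patience-based early stopping (patience = 10), assuming a
# generic validation-loss criterion; the repo's own utility may differ in detail.
class EarlyStopping:
    def __init__(self, patience: int = 10):
        self.patience = patience
        self.counter = 0
        self.best_loss = float("inf")

    def step(self, val_loss: float) -> bool:
        """Return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss  # new best epoch: reset the counter
            self.counter = 0
        else:
            self.counter += 1          # no improvement this epoch
        # e.g. best loss at epoch 1 -> stop around epoch 11, well before 100
        return self.counter >= self.patience
```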

well0203 commented 1 month ago

Hi, I am also interested in this question and would like to read the answers (the authors have probably experienced this as well).

dreamerforwuyang commented 1 month ago

Hello, could I take a look at your code? Mine keeps failing with errors and I cannot find a solution online.

JackKoLing commented 1 month ago

It is just the script provided by the authors. My GPU is relatively old, so my environment is still on the older Python 3.8.5 rather than 3.11. For the data I used the first 2000 rows of ETTh1; since the data changed, the dataset split ranges in data_loader need to be modified accordingly.
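
For illustration, here is a hedged sketch of the kind of change mentioned above. Time-Series-Library-style ETT loaders hard-code the train/val/test borders for the full hourly file (12/4/4 months), so a 2000-row subset needs rescaled borders; the variable names (`border1s`, `border2s`, `seq_len`) are assumptions based on that convention, not a guaranteed match for this repository's data_loader.

```python
# Hypothetical adjustment of the dataset split ranges for a truncated ETTh1 file.
# The default ETT-hour borders assume the full dataset (12 * 30 * 24 rows of
# training data, etc.), which exceeds a 2000-row subset, so they must be rescaled.
seq_len = 256                 # input sequence length used in the experiment above
n_rows = 2000                 # only the first 2000 rows of ETTh1 are kept
n_train = int(n_rows * 0.6)   # assumed 60/20/20 train/val/test split
n_val = int(n_rows * 0.2)

border1s = [0, n_train - seq_len, n_train + n_val - seq_len]
border2s = [n_train, n_train + n_val, n_rows]
```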

dreamerforwuyang commented 1 month ago

Thanks for the reply. I am a beginner; I am using PyCharm on Windows with Python 3.11. I have been struggling with this for about a week without getting it to run successfully. If possible, could I take a look at the code you ran? My graduation is at risk of being delayed, and I would be very grateful.

JackKoLing commented 1 month ago

You should post some of the error messages so that anyone who happens to have hit the same issue can discuss it with you. What I ran is just the authors' code; once the pretrained weights are downloaded it runs, and I did not change anything else. The authors' README and scripts are already very clear.

dreamerforwuyang commented 1 month ago

Thank you. Since I am a beginner, the errors are quite basic; I asked GPT as well but did not get a good solution.

dreamerforwuyang commented 1 month ago

Hello, sorry to bother you again. My coding skills are limited and I still cannot resolve the errors. I would love to take a look at the code you reproduced successfully (even just a part of it). I would be extremely grateful, thank you very much.

kwuking commented 1 month ago

Your observation is correct. Due to the large learning rate set in our demo script, and considering the type of devices and the running environment, it is indeed possible to achieve good convergence with fewer epochs. Given the large parameter space and generalization capabilities of LLMs, this phenomenon is to be expected. We are currently conducting more explorations to better leverage the potential of LLMs for time series data. We greatly appreciate your attention to our work.
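
As a toy illustration of this point (not Time-LLM code, with made-up numbers): on a simple regression problem, a larger learning rate reaches a low loss in far fewer steps, which is why early stopping can trigger after only a handful of epochs.

```python
# Toy example only: compare how far Adam gets with a large vs. small learning
# rate on a synthetic linear-regression task within a fixed step budget.
import torch

torch.manual_seed(0)
x = torch.randn(256, 8)
y = x @ torch.randn(8, 1)

for lr in (1e-3, 1e-4):
    model = torch.nn.Linear(8, 1)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(1000):
        loss = torch.nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(f"lr={lr}: loss after 1000 steps = {loss.item():.4f}")
```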

I believe that the exploration of LLMs is still in its early stages, particularly due to the lack of theoretical foundations. Many of our findings are often based on empirical results. We are very keen to collaborate with you in exploring LLMs for time series data. We believe that this exploration can effectively advance the time series community.