KimMeen / Time-LLM

[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"
https://arxiv.org/abs/2310.01728
Apache License 2.0

Script file request consistent with the paper #81

Closed: zhangtianhong-1998 closed this issue 4 months ago

zhangtianhong-1998 commented 4 months ago

I would very much appreciate it if the authors could provide script files consistent with the paper. I tried setting the parameters myself as described in the paper, but I could not replicate the reported results.

kwuking commented 4 months ago

> I would very much appreciate it if the authors could provide script files consistent with the paper. I tried setting the parameters myself as described in the paper, but I could not replicate the reported results.

Thank you very much for your interest in our work and the effort you have put into replicating our experimental results. We understand the challenges and difficulties that can arise when trying to replicate the results of deep learning models, especially when multiple GPUs and advanced optimization techniques such as DeepSpeed's ZeRO are involved.

Our training process uses the ZeRO-2 optimizer under the DeepSpeed framework, a technique that optimizes and accelerates large-scale training. During training we apply gradient accumulation to simulate larger batch sizes, which helps us manage limited hardware resources but also introduces randomness, since gradient accumulation changes how weight updates are applied.

In addition, we adopt mixed-precision training, specifically bfloat16 (bf16), which significantly reduces memory usage and speeds up training. However, the lower-precision floating-point representation also introduces numerical error, another source of variation in the results. These techniques are very common, and effectively necessary, for training large models in modern deep learning frameworks, even though they can cause slight fluctuations in results. Finally, the runtime environment, framework versions, CUDA versions, and similar factors may introduce a further degree of randomness.
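For readers unfamiliar with how these pieces fit together, below is a minimal sketch of a DeepSpeed setup that combines ZeRO-2, bf16 mixed precision, and gradient accumulation. The field names follow DeepSpeed's public config schema, but the concrete values (batch sizes, learning rate, the stand-in model) are placeholders for illustration and are not the exact settings used in the paper or in this repository's scripts.

```python
# Sketch only: ZeRO-2 + bf16 + gradient accumulation via a DeepSpeed config dict.
# Placeholder values; run under the `deepspeed` launcher in a real setup.
import deepspeed
import torch

ds_config = {
    # Effective batch size = micro_batch_per_gpu * gradient_accumulation_steps * num_gpus
    "train_micro_batch_size_per_gpu": 8,   # placeholder
    "gradient_accumulation_steps": 4,      # simulates a larger batch on limited hardware
    "bf16": {"enabled": True},             # bfloat16 mixed precision
    "zero_optimization": {
        "stage": 2,                        # ZeRO-2: shard optimizer states and gradients
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 1e-3},            # placeholder learning rate
    },
}

model = torch.nn.Linear(16, 1)  # stand-in for the actual forecasting model

# deepspeed.initialize wraps the model in an engine that handles ZeRO sharding,
# bf16 casting, and gradient accumulation internally during engine.backward()/step().
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

Note that gradient accumulation and bf16 both change the exact floating-point arithmetic of each update, which is why two runs with nominally identical hyperparameters can still differ slightly in their final metrics.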

If you have more specific questions or need assistance, please feel free to contact us by email for further discussion.