KimMeen / Time-LLM

[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"
https://arxiv.org/abs/2310.01728
Apache License 2.0

How to select an LLM? #125

Closed jexterliangsufe closed 3 months ago

jexterliangsufe commented 4 months ago

Great work! I'm confused about which LLM gives the best MAE. I tried GPT-2, Llama, Llama 2, Llama 3, and Qwen2. Limited by GPU resources, I had to reduce the batch size to 4 when using the Llama-family models, and they gave worse MAE than GPT-2. Going by the open LLM leaderboard on Hugging Face, I also tried Qwen2-7B and got the worst MAE of all. So, is there any criterion for selecting LLMs?
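(Side note on the batch-size limit: gradient accumulation is one common way to keep the effective batch size while only a small micro-batch fits in GPU memory. Below is a minimal, generic PyTorch sketch, not Time-LLM code; the model, dimensions, and hyperparameters are placeholders.)

```python
# Generic gradient-accumulation sketch (placeholder model/data, not Time-LLM code):
# accumulate gradients over 8 micro-batches of 4 to emulate an effective batch of 32.
import torch
from torch import nn

model = nn.Linear(512, 96)                               # stand-in for the forecasting model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.L1Loss()                                   # MAE objective

micro_batch, accum_steps = 4, 8                           # 4 x 8 = effective batch size 32
optimizer.zero_grad()
for step in range(96):
    x = torch.randn(micro_batch, 512)                     # placeholder inputs
    y = torch.randn(micro_batch, 96)                      # placeholder targets
    loss = criterion(model(x), y) / accum_steps           # scale so accumulated grads average correctly
    loss.backward()
    if (step + 1) % accum_steps == 0:                     # update once per accumulation window
        optimizer.step()
        optimizer.zero_grad()
```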

kwuking commented 3 months ago


Thank you for the constructive question; how to choose an LLM is an excellent point to raise. So far we have mainly tested Llama, GPT-2, and BERT. Our framework is compatible with other LLMs as well, but training and fine-tuning LLMs is genuinely challenging: it may require careful handling and a significant amount of compute. We are considering releasing pre-trained models for everyone to use, but current resource limitations make that difficult. In any case, this is a very good question. My view is that across LLMs with similar architectures, model size may have the larger impact on the results, and getting good results may come down to fine-tuning the LLM more carefully.
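A minimal sketch, assuming the Hugging Face transformers API, for inspecting candidate backbones before plugging one in: the reprogramming layer has to match the backbone's hidden dimension (the llm_dim setting in the run scripts, if I recall them correctly), so checking it up front avoids shape mismatches. The model IDs below are examples, and gated checkpoints such as Llama-2 require an access token.

```python
# Sketch (not Time-LLM code) for comparing candidate backbones via Hugging Face transformers.
from transformers import AutoConfig

# Example model IDs; adjust to whatever checkpoints you actually have access to.
candidates = ["gpt2", "bert-base-uncased", "meta-llama/Llama-2-7b-hf", "Qwen/Qwen2-7B"]

for name in candidates:
    cfg = AutoConfig.from_pretrained(name)                # gated repos need a HF access token
    hidden = getattr(cfg, "hidden_size", None)            # dimension the reprogramming layer must match
    layers = getattr(cfg, "num_hidden_layers", None)      # rough proxy for model capacity
    print(f"{name}: hidden_size={hidden}, num_hidden_layers={layers}")
```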