-
Hi! I tried LoRA fine-tuning with a Llama 2 model. However, when I use peft==0.9.0 the loss is always NaN, while with peft==0.3.0 the loss is normal. I'm curious whether there are significant differences in LoRA betwee…
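For context, a minimal sketch of the setup being described, not the poster's actual script (model name, target modules, and hyperparameters are assumptions):

```python
# Minimal LoRA setup with peft; a sketch of the reported scenario.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"  # assumed model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float32,  # fp32 here; half precision is a common source of NaN losses
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # assumed target projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```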
-
Hello! I have a problem starting the PPO_finetuning example with only 1 machine and 1 GPU.
However, I successfully started the examples in https://github.com/flowersteam/Grounding_LLMs_with_online_RL with lamorel…
-
Hi, I wonder how to load a trained T5 policy and then fine-tune it on unseen tasks. I can find the parameters of llm_module in PPOUpdater, but I can't find how to call load_state_dict() on _llm_module. …
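A generic PyTorch pattern for this kind of save/restore, as a sketch (the helper names are hypothetical; how PPOUpdater exposes `_llm_module` is an assumption):

```python
# Generic save/restore of policy weights via PyTorch state dicts.
# save_policy/load_policy are hypothetical helpers, not part of any library.
import torch
from torch import nn

def save_policy(llm_module: nn.Module, path: str) -> None:
    # Persist only the model weights (no optimizer state).
    torch.save(llm_module.state_dict(), path)

def load_policy(llm_module: nn.Module, path: str) -> None:
    # Restore the weights in place before fine-tuning on unseen tasks.
    llm_module.load_state_dict(torch.load(path, map_location="cpu"))
```

Here `llm_module` would be whatever the updater holds, e.g. its `_llm_module` attribute, assuming it is a plain `nn.Module`.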
-
### Is there an existing issue / discussion for this?
- [X] I have searched the existing issues / discussions
### Is there an existing answer for this in the FAQ? …
-
**Describe the bug**
Hi everybody, I'm training a LLaMA model in step 3 using DeepSpeed-Chat. In version 0.10.1, it raised the following error ([see the logs below](https://github.com/microsoft/DeepSp…
-
Weibo content highlights
-
### Model/Pipeline/Scheduler description
The amazing @dome272 published a new diffusion model called Wuerstchen: https://github.com/dome272/Wuerstchen
I think it would be a very nice addition t…
-
https://huggingface.co/blog/rlhf
### Background
In the section on the third step of the process, it is written:
- What multiple organizations seem to have gotten to work is **fine-tuning some…
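For concreteness, the kind of PPO fine-tuning step the post describes can be sketched with trl's classic `PPOTrainer` API (model choice, hyperparameters, and the constant reward are assumptions for illustration):

```python
# Sketch of one RLHF PPO step with trl's classic PPOTrainer API.
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

# Tiny model and hyperparameters chosen for illustration only.
config = PPOConfig(model_name="gpt2", learning_rate=1.41e-5, batch_size=1, mini_batch_size=1)
model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)  # frozen copy, anchors the KL penalty
tokenizer = AutoTokenizer.from_pretrained(config.model_name)
tokenizer.pad_token = tokenizer.eos_token

ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)

query = tokenizer("How do I sort a list in Python?", return_tensors="pt").input_ids[0]
response = ppo_trainer.generate(query, max_new_tokens=32, return_prompt=False)[0]

# In real RLHF the reward comes from a trained reward model; a constant stands in here.
stats = ppo_trainer.step([query], [response], [torch.tensor(1.0)])
```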
-
Hello, I am currently trying to reproduce the evaluation results in the [original StackLLaMA blog post](https://huggingface.co/blog/stackllama) after fine-tuning LLaMA with PPO, but am struggling to d…
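For anyone in the same spot, one way to load the tuned weights back for evaluation, as a sketch (paths are placeholders; the blog's recipe trains LoRA adapters, so the adapter layout is an assumption):

```python
# Load a PPO-tuned LoRA adapter on top of the base model for evaluation.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("path/to/llama-base")    # placeholder path
model = PeftModel.from_pretrained(base, "path/to/ppo-lora-adapter")  # placeholder path
model = model.merge_and_unload()  # fold LoRA weights into the base for plain generation
tokenizer = AutoTokenizer.from_pretrained("path/to/llama-base")

inputs = tokenizer("Question: How do I reverse a string in Python?\n\nAnswer: ", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```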