-
Hi, I ran into the same issue when running `bash run_finetune_with_lora.sh` with `LLAMA-7b`. Here are my script and log:
```
#!/bin/bash
# Please run this script under ${project…
-
### 🐛 Describe the bug
Please correct me if I'm wrong, but it looks like SFT for Anthropic simply maximizes log p(x) on the entire dialogue history, rather than only maximizing log p(y|x), where x is…
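If the goal is to optimize only log p(y|x), the usual fix is to mask the prompt tokens out of the loss. A minimal sketch of that masking, assuming the `-100` ignore index used by the PyTorch/Hugging Face cross-entropy convention (`build_labels` is a hypothetical helper, not code from this repo):

```python
IGNORE_INDEX = -100  # tokens with this label are excluded from the loss

def build_labels(prompt_ids, response_ids):
    # Mask the prompt (x) so the loss is computed only on the response (y):
    # this trains log p(y|x) instead of log p(x, y) over the whole dialogue.
    return [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)

labels = build_labels([101, 102, 103], [201, 202])
print(labels)  # [-100, -100, -100, 201, 202]
```

Training on the full history without this mask is what makes the objective log p(x) over the entire dialogue.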
-
Dear Rob,
After `Loading checkpoint shards: 100%` completes successfully, my computer returns this:
Traceback (most recent call last):
File "C:\Users\erwinnella\Desktop\h2ogpt\generate.py", line 16, in
…
-
**Describe the bug**
Hi everybody, I'm training a llama model in step 3 using deepspeed-chat. In version 0.10.1, it raised the following error ([see the logs below](https://github.com/microsoft/DeepSp…
-
## **Overview**
dMeter aims to revolutionize the way decentralized Measurement, Reporting, and Verification (dMRV) protocols operate in the field of regenerative action and ecosystem regeneration. …
-
Hello, thanks for this really cool repository. I've recently been learning about `pjit`, and your repo is a valuable reference resource.
I've had some issues running the example code. In particular, i…
-
I've recently discovered that reward modeling plays a crucial role in the third step of PPO training. In my previous reward model, the score would increase with more "thanks" tokens, regardless of t…
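A minimal sketch of the kind of probe that exposes this failure mode; `count_thanks_reward` is a hypothetical stand-in for a biased reward model, not any real scoring code:

```python
def count_thanks_reward(response: str) -> float:
    # Hypothetical stand-in for a biased reward model whose score grows
    # with the number of "thanks" tokens, regardless of answer quality.
    return float(response.split().count("thanks"))

base = "thanks for the detailed answer"
padded = base + " thanks thanks thanks"

# Padding the response with extra "thanks" tokens inflates the score --
# exactly the reward-hacking behavior described above.
print(count_thanks_reward(base), count_thanks_reward(padded))  # 1.0 4.0
```

Comparing scores on a response before and after appending filler tokens is a cheap sanity check to run before using a reward model in PPO.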
-
After running supervised_finetuning.py, I got:
```
(gh_trl) amd00@MZ32-00:~/llm_dev/trl$ ll llama-se/final_checkpoint/
total 32828
drwxrwxr-x 2 amd00 amd00 4096 Jul 9 11:13 ./
drwxrwxr-x 3 am…
-
- [ ] scoring class for alert (based on AlertReward)
- [ ] ~~scoring class for counting alert~~
- [x] add the possibility to return the number of "accurate simulator" used in the scoring (see the ex…
-
### Required prerequisites
- [X] I have read the documentation.
- [X] I have searched the [Issue Tracker](https://github.com/PKU-Alignment/safe-rlhf/issues) and [Discussions](https://github.com/PKU-…