-
> [rank0]: Traceback (most recent call last):
> [rank0]: File "/opt/tmp/nlp/wzh/LLM-Dojo/rlhf/rloo_train.py", line 167, in
> [rank0]: trainer.train()
> [rank0]: File "/home/nlp/miniconda3/…
-
Run step 3 with:
deepspeed --master_port 12346 DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/main.py \
--data_path wangrui6/Zhihu-KOL \
--data_split 2,4,4 \
…
-
I want to save intermediate checkpoints during training after a specific number of steps, but the job keeps hanging. How can I fix this?
Torch 1.14 + CUDA 12.0, Transformer Engine 0.6
Code
```
for step, batch in …
-
### Required prerequisites
- [X] I have read the documentation.
- [X] I have searched the [Issue Tracker](https://github.com/PKU-Alignment/safe-rlhf/issues) and [Discussions](https://github.com/PKU-…
-
File "/home/ma-user/anaconda3/envs/dpo/lib/python3.10/site-packages/transformers/configuration_utils.py", line 264, in __getattribute__
return super().__getattribute__(key)
AttributeError: 'In…
-
### Describe the bug
When using the Dataset.to_json() function, an unexpected error occurs if the parameter is set to lines=False. The stored data should be in the form of a list, but it actually tur…
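
A possible workaround, sketched under the assumption that the goal is a single JSON array on disk: collect the rows and write them with the standard-library `json` module instead. The helper name `write_json_list` and the sample rows are hypothetical; plain dicts stand in for the dataset here, since iterating a `datasets.Dataset` also yields one dict per row.

```python
import json
import os
import tempfile


def write_json_list(rows, path):
    """Write an iterable of dict-like rows to `path` as one JSON array."""
    # Materialize every row first so the file holds a single top-level list.
    records = [dict(row) for row in rows]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False)
    return len(records)


# Hypothetical stand-in rows (not taken from the original report).
rows = [{"id": 0, "text": "a"}, {"id": 1, "text": "b"}]
out_path = os.path.join(tempfile.mkdtemp(), "data.json")
n_written = write_json_list(rows, out_path)

# Read the file back to check that it parses as one list.
with open(out_path, encoding="utf-8") as f:
    loaded = json.load(f)
```

This holds all rows in memory at once, so it may only suit datasets that fit comfortably in RAM.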
-
Hi, great repo!
Would it be possible to release a JAX-based version of the code?
Best
-
To the chatLLaMA team,
Thank you very much for this nice project.
I looked at the model file and saw the comment about compatibility with training, so I thought it would be possible to train with …
-
https://arxiv.org/abs/2203.02155