-
Installed version:
![image](https://github.com/user-attachments/assets/9f1c8790-11e2-4452-ad4c-382f72d87cfb)
![image](https://github.com/user-attachments/assets/13d5c64a-e45b-4e82-a6cc-cd2beb709e92)
Problem description:
…
-
Bad documentation; the errors are not very long.
Detecting toxicity in outputs generated by Large Language Models (LLMs) is crucial for ensuring that these models produce safe, respectful, and appropriate con…
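As a concrete illustration (not from the original issue), toxicity scoring of LLM outputs can be sketched with an off-the-shelf text classifier; the model name below is one publicly available choice, assumed purely for the example:

```python
# Minimal sketch: score LLM outputs for toxicity with a HuggingFace classifier.
# "unitary/toxic-bert" is an illustrative, publicly available choice; any
# text-classification model trained for toxicity works the same way.
from transformers import pipeline

toxicity = pipeline("text-classification", model="unitary/toxic-bert")

outputs = ["You are wonderful.", "I hate you, you idiot."]
for text in outputs:
    result = toxicity(text)[0]  # dict with 'label' and 'score'
    print(f"{text!r} -> {result['label']} ({result['score']:.3f})")
```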
-
When initializing the reward and reference models in step 3 of DeepSpeed-Chat, two kinds of DeepSpeed config files are used, i.e. ds_config and ds_eval_config. May I ask why we need to use two configs…
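A hedged guess at the answer, based on how frozen models are normally handled: in step 3 only the actor and critic are trained, while the reward and reference models run inference only, so an eval-only config can drop optimizer-related settings entirely. A minimal sketch of what the two configs might look like (all keys and values below are illustrative, not copied from DeepSpeed-Chat):

```python
# Hypothetical sketch: training config for models that take gradient steps.
ds_config = {
    "train_batch_size": 32,
    "zero_optimization": {"stage": 2},  # partitions optimizer state + gradients
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
    "fp16": {"enabled": True},
}

# Hypothetical sketch: eval-only config for the frozen reward/reference models.
# No optimizer section is needed because these models never update weights;
# ZeRO stage 3 can still be used purely to shard the (large) parameters.
ds_eval_config = {
    "train_batch_size": 32,  # DeepSpeed still requires batch-size fields at init
    "zero_optimization": {"stage": 3},
    "fp16": {"enabled": True},
}
```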
-
### Required prerequisites
- [X] I have read the documentation .
- [X] I have searched the [Issue Tracker](https://github.com/PKU-Alignment/safe-rlhf/issues) and [Discussions](https://github.com/PKU-…
-
**Describe the bug**
I am trying to fine-tune tiiuae/falcon-7b-instruct and I am getting this error.
`TypeError: where(): argument 'condition' (position 1) must be Tensor, not bool`
**To Reproduce**…
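For context, this is the error PyTorch raises when a plain Python bool is passed as the condition to torch.where, which expects a tensor. A minimal, self-contained repro (independent of Falcon) looks like this:

```python
import torch

x = torch.ones(3)
y = torch.zeros(3)

# Passing a plain Python bool raises:
# TypeError: where(): argument 'condition' (position 1) must be Tensor, not bool
# torch.where(True, x, y)

# The condition must be a boolean tensor instead:
mask = torch.tensor([True, False, True])
print(torch.where(mask, x, y))  # tensor([1., 0., 1.])
```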
-
When running step 3 with ZeRO stage 3 enabled for both the actor and critic models,
I get the following error (line numbers may be offset due to debug statements I've added):
```
File "/path/DeepSp…
```
-
### SFT data
1. Started the SFT stage with publicly available instruction tuning data ([Chung et al., 2022](https://arxiv.org/pdf/2210.11416))
2. Fewer but high-quality data > millions of data but low …
-
### Bug Report
I have tried to reproduce the results on my own using Llama 3.1 8B.
I can successfully run the SFT and reward model trainers, but the cost model trainer consistently crashes.
…
-
This issue serves to inform about and discuss the next major release of Tianshou, after which the library can be considered mature and stable from our perspective. The progress and the related …