-
# Dataset
1. Refactor the self-cognition dataset to support multilingual QAs.
# Megatron PreTrain
1. Support more Megatron models
2. Support dataset split
# Fine-tuning
1. RAG LLM training …
-
## New items
- [ ] Tutorial on using a Diffgram workflow to train a custom LLM with a third-party (or open-source) training tool
## Past context
Details in the internal Slack discussion - creating ticket a…
-
Hi all,
I see no benefit from the CLVP module; even with some timbre mixture, the best-scoring AR-generated mel codes may not be that good. Should we put the speaker conditioning into the text tokens during tr…
-
I am getting the following error during RLHF training. I decreased max_sequence_length in my actor configuration to 1024 because training errored for me when it was set to 2048. Is my …
-
Hi, thanks for uploading the code for pair_pm! According to the blog, it seems you are using SLiC for the pair_pm models, but in the pair_pm directory I can't find the code for the SLiC method.
-
My launch script is as follows:
CUDA_VISIBLE_DEVICES=0,1,2,3 deepspeed /data/bill.bi/alpaca-rlhf/alpaca_rlhf/deepspeed_chat/training/step3_rlhf_finetuning/main.py --data_path /data/bill.bi/RLHFDataset --data_output_path /…
-
I'm trying to figure out how to retrieve user feedback submitted via the thumbs up/thumbs down interface in the Web UI. Specifically, I need to know how to access this feedback data through the pipeli…
-
```
[rank1]: Traceback (most recent call last):
[rank1]: File "/home/nfs04/chengkz/VL-RLHF/src/vlrlhf/dpo.py", line 146, in
[rank1]: dpo_trainer.train(resume_from_checkpoint=training_args.re…
```
-
Increase the training iterations: Train the PPO model for more iterations, as the model might not have converged yet.
Adjust the PPO hyperparameters: Experiment with different hyperparameters such …
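The advice above (more iterations, different hyperparameters) can be sketched as a minimal, library-agnostic config. The parameter names here are illustrative only, not tied to any specific RLHF framework; map them onto whatever your trainer's config actually exposes.

```python
from dataclasses import dataclass, replace

# Illustrative PPO hyperparameters -- generic names, not a real library API.
@dataclass(frozen=True)
class PPOHyperparams:
    total_iterations: int = 1000   # more iterations -> more chance to converge
    learning_rate: float = 1e-5
    clip_range: float = 0.2        # PPO policy-ratio clipping epsilon
    kl_coef: float = 0.1           # penalty keeping the policy near the reference model

base = PPOHyperparams()

# "Increase the training iterations" and "adjust the PPO hyperparameters":
# double the iteration budget and try a lower LR / weaker KL penalty.
tuned = replace(base, total_iterations=base.total_iterations * 2,
                learning_rate=5e-6, kl_coef=0.05)

print(tuned.total_iterations)  # 2000
```

Sweeping a few such variants (e.g. `clip_range` in {0.1, 0.2, 0.3}) and comparing reward curves is usually cheaper than guessing a single setting.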
-
In which training step do you use the HH-RLHF and SHP datasets?
Thanks for your help.