CarperAI / trlx

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

PPO training hangs #319

Closed Jiaxin-Wen closed 1 year ago

Jiaxin-Wen commented 1 year ago

๐Ÿ› Describe the bug

I am running examples/summarize_rlhf.
I successfully ran the code a few days ago, when the make_experience function was still in orchestrator/ppo_orchestrator. However, after syncing with the latest version (main branch), I find that PPO training hangs and raises the following timeout error:

[rollout 0 / 128]:   0%|          | 0/128 [00:00<?, ?it/s]
You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
Using /home/wenjiaxin/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0006618499755859375 seconds
You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
Using /home/wenjiaxin/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0005791187286376953 seconds
You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
Using /home/wenjiaxin/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0007822513580322266 seconds
You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
/data/wenjiaxin/home/trlx/trlx/trainer/accelerate_ppo_trainer.py:314: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  all_scores = torch.tensor(
[rollout 128 / 128]: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 128/128 [06:23<00:00,  3.00s/it]
[RANK 0] Starting training
[RANK 0] Evaluating model
[generation sweep 1/1 | eval batch 2/2]: 100%|███████████████████████████████████████████████| 2/2 [00:25<00:00, 12.98s/it]
[RANK 0] Computing rewards
/data/wenjiaxin/home/trlx/trlx/trainer/accelerate_base_trainer.py:364: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  rewards = torch.tensor(
[RANK 0] Summarizing evaluation                                                                                            
                                             Evaluation #0 reward/mean: 0.105                                              
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ prompt                                                โ”ƒ output                                                 โ”ƒ reward โ”ƒ
โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ SUBREDDIT: r/AskReddit                                โ”‚  I have feelings for someone else, and I want them to  โ”‚ -0.77  โ”‚
โ”‚ TITLE: How do you get someone out of your head?       โ”‚ go away. I don't know how to do it.                    โ”‚        โ”‚
โ”‚ POST: Hi,                                             โ”‚                                                        โ”‚        โ”‚
โ”‚ I'm 22, and I have been with my girlfriend for 5      โ”‚                                                        โ”‚        โ”‚
โ”‚ years now. We recently moved together. We've always   โ”‚                                                        โ”‚        โ”‚
โ”‚ loved each other intensely.                           โ”‚                                                        โ”‚        โ”‚
โ”‚                                                       โ”‚                                                        โ”‚        โ”‚
โ”‚ Problem, I recently started to have feelings for an   โ”‚                                                        โ”‚        โ”‚
โ”‚ other person (a friend). This person has had a        โ”‚                                                        โ”‚        โ”‚
โ”‚ boyfriend for now 3 years, and has absolutely no      โ”‚                                                        โ”‚        โ”‚
โ”‚ ideas. Those feelings were so strong, it was hard to  โ”‚                                                        โ”‚        โ”‚
โ”‚ hide them. After 2 months of me being distant and     โ”‚                                                        โ”‚        โ”‚
โ”‚ really sad, my girlfriend forced me to say what was   โ”‚                                                        โ”‚        โ”‚
โ”‚ bothering me. I'm not a good liar, and now she knows. โ”‚                                                        โ”‚        โ”‚
โ”‚                                                       โ”‚                                                        โ”‚        โ”‚
โ”‚ We decided to give us a week alone, I went to my      โ”‚                                                        โ”‚        โ”‚
โ”‚ parents.                                              โ”‚                                                        โ”‚        โ”‚
โ”‚                                                       โ”‚                                                        โ”‚        โ”‚
โ”‚ Now, I'm completely lost. I keep on thinking about    โ”‚                                                        โ”‚        โ”‚
โ”‚ this person, and I hate that. I would like for those  โ”‚                                                        โ”‚        โ”‚
โ”‚ feelings to go away, to leave me alone. But I can't.  โ”‚                                                        โ”‚        โ”‚
โ”‚                                                       โ”‚                                                        โ”‚        โ”‚
โ”‚ What do I do? It's been 3 months now, and I'm just    โ”‚                                                        โ”‚        โ”‚
โ”‚ desperate.                                            โ”‚                                                        โ”‚        โ”‚
โ”‚ TL;DR:                                                โ”‚                                                        โ”‚        โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ SUBREDDIT: r/pettyrevenge                             โ”‚  My mom woke me up with loud TV, I turned my speakers  โ”‚ 0.55   โ”‚
โ”‚ TITLE: So, my mom woke me up with a loud TV.          โ”‚ up really loud and blasted Gangnam Style on repeat,    โ”‚        โ”‚
โ”‚ POST: She was in her living room, watching TV. This   โ”‚ making a lot of noise.                                 โ”‚        โ”‚
โ”‚ was at about 8:30 in the morning, and she was         โ”‚                                                        โ”‚        โ”‚
โ”‚ exercising. She turned the TV up extra loud to hear   โ”‚                                                        โ”‚        โ”‚
โ”‚ it over her excercycle, and woke me up. I went in     โ”‚                                                        โ”‚        โ”‚
โ”‚ there asking for her to turn it down. She said she    โ”‚                                                        โ”‚        โ”‚
โ”‚ didn't have to; I explained that I always used        โ”‚                                                        โ”‚        โ”‚
โ”‚ headphones so she didn't have to deal with my noise   โ”‚                                                        โ”‚        โ”‚
โ”‚ and that she should give me a little more respect,    โ”‚                                                        โ”‚        โ”‚
โ”‚ given that I paid rent at the time.                   โ”‚                                                        โ”‚        โ”‚
โ”‚                                                       โ”‚                                                        โ”‚        โ”‚
โ”‚ She disagreed. I went back to my room, rather pissed  โ”‚                                                        โ”‚        โ”‚
โ”‚ off at the lack of equality. I had no lock on my      โ”‚                                                        โ”‚        โ”‚
โ”‚ door; but I had a dresser right next to it, so I      โ”‚                                                        โ”‚        โ”‚
โ”‚ pulled one of the drawers out enough so that it       โ”‚                                                        โ”‚        โ”‚
โ”‚ caused the door to not be openable. Then, I turned my โ”‚                                                        โ”‚        โ”‚
โ”‚ speakers up really loud and blasted Gangnam Style on  โ”‚                                                        โ”‚        โ”‚
โ”‚ repeat, with the bass cranked up as high as it could  โ”‚                                                        โ”‚        โ”‚
โ”‚ go.                                                   โ”‚                                                        โ”‚        โ”‚
โ”‚                                                       โ”‚                                                        โ”‚        โ”‚
โ”‚ If you hate Gangnam Style for being overplayed, you   โ”‚                                                        โ”‚        โ”‚
โ”‚ will see why I chose that particular song. I          โ”‚                                                        โ”‚        โ”‚
โ”‚ personally don't mind it. But here's the thing about  โ”‚                                                        โ”‚        โ”‚
โ”‚ my bass; it vibrates the walls, making one hell of a  โ”‚                                                        โ”‚        โ”‚
โ”‚ lot of noise. Needless to say, my mom was not pleased โ”‚                                                        โ”‚        โ”‚
โ”‚ and shut off the internet. But it was oh so worth it. โ”‚                                                        โ”‚        โ”‚
โ”‚ TL;DR:                                                โ”‚                                                        โ”‚        โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ SUBREDDIT: r/relationships                            โ”‚  Girlfriend cheated on me by kissing two guys at a     โ”‚ -0.104 โ”‚
โ”‚ TITLE: My girlfriend (20f) of two years cheated on me โ”‚ party. We both want to fix things but I'm not sure if  โ”‚        โ”‚
โ”‚ (20m) by kissing two guys at a Halloween party.       โ”‚ I should.                                              โ”‚        โ”‚
โ”‚ POST: Lately her and I have been having a few         โ”‚                                                        โ”‚        โ”‚
โ”‚ problems, and these problems have been brought up     โ”‚                                                        โ”‚        โ”‚
โ”‚ before a few times. One problem being that I don't    โ”‚                                                        โ”‚        โ”‚
โ”‚ show enough affection. I don't tell her she's pretty  โ”‚                                                        โ”‚        โ”‚
โ”‚ very often or don't compliment her much. I feel       โ”‚                                                        โ”‚        โ”‚
โ”‚ terrible about it, but this time I was really trying  โ”‚                                                        โ”‚        โ”‚
โ”‚ to change for her.                                    โ”‚                                                        โ”‚        โ”‚
โ”‚                                                       โ”‚                                                        โ”‚        โ”‚
โ”‚ For Halloween she went to visit her step brother at a โ”‚                                                        โ”‚        โ”‚
โ”‚ college and I got drunk with my friends and watched   โ”‚                                                        โ”‚        โ”‚
โ”‚ movies. Last night (11/1) we got in a huge fight      โ”‚                                                        โ”‚        โ”‚
โ”‚ about me not changing and how our relationship won't  โ”‚                                                        โ”‚        โ”‚
โ”‚ work out and basically broke up over the phone. So in โ”‚                                                        โ”‚        โ”‚
โ”‚ an effort to try and fix it I drove to her house. She โ”‚                                                        โ”‚        โ”‚
โ”‚ told me how at the parties she went to that two guys  โ”‚                                                        โ”‚        โ”‚
โ”‚ kissed her. The first one she pushed away, but the    โ”‚                                                        โ”‚        โ”‚
โ”‚ second one I asked her if she kissed him back and she โ”‚                                                        โ”‚        โ”‚
โ”‚ said yes and that she did it because it made her feel โ”‚                                                        โ”‚        โ”‚
โ”‚ wanted, which I guess I haven't been making her feel  โ”‚                                                        โ”‚        โ”‚
โ”‚ that way lately. We cried, we talked about            โ”‚                                                        โ”‚        โ”‚
โ”‚ everything, we had great sex, and I stayed over at    โ”‚                                                        โ”‚        โ”‚
โ”‚ her house just to sleep with her and then snuck out   โ”‚                                                        โ”‚        โ”‚
โ”‚ in the morning so her parents wouldn't know.          โ”‚                                                        โ”‚        โ”‚
โ”‚                                                       โ”‚                                                        โ”‚        โ”‚
โ”‚ We both obviously want to work things out but aren't  โ”‚                                                        โ”‚        โ”‚
โ”‚ sure if we should. I love this girl, but the more I   โ”‚                                                        โ”‚        โ”‚
โ”‚ think about it, all I can think about is her cheating โ”‚                                                        โ”‚        โ”‚
โ”‚ on me, and more importantly, liking it. It makes me   โ”‚                                                        โ”‚        โ”‚
โ”‚ sick to my stomach. Should I even try to fix it or    โ”‚                                                        โ”‚        โ”‚
โ”‚ would I be better off cutting all ties.               โ”‚                                                        โ”‚        โ”‚
โ”‚ TL;DR:                                                โ”‚                                                        โ”‚        โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
  0%|                                                                                             | 0/3200 [00:00<?, ?it/s]
[2023-02-21 21:58:49,898] [INFO] [stage_1_and_2.py:1762:step] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 65536
[losses/total_loss: -0.31 | losses/policy_loss: -0.33 | losses/value_loss: 0.08]:   0%| | 1/3200 [00:01<1:21:59,  1.54s/it]
[2023-02-21 21:58:51,427] [INFO] [stage_1_and_2.py:1762:step] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768.0
[losses/total_loss: 0.14 | losses/policy_loss: 0.13 | losses/value_loss: 0.06]:   2%|  | 64/3200 [06:50<5:38:48,  6.48s/it]
[RANK 0] Collecting rollouts
[rollout 128 / 128]: 100%|███████████████████████████████████████████████████████████████| 128/128 [07:03<00:00,  3.30s/it]
[2023-02-21 22:12:43,999] [INFO] [stage_1_and_2.py:1762:step] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768.0, reducing to 16384.0
[losses/total_loss: -0.28 | losses/policy_loss: -0.33 | losses/value_loss: 0.23]:   2%| | 65/3200 [13:55<114:53:35, 131.93s
[2023-02-21 22:12:45,560] [INFO] [stage_1_and_2.py:1762:step] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384.0, reducing to 8192.0
[losses/total_loss: -0.28 | losses/policy_loss: -0.33 | losses/value_loss: 0.23]:   2%| | 66/3200 [13:57<80:48:25, 92.82s/i
[2023-02-21 22:12:47,125] [INFO] [stage_1_and_2.py:1762:step] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192.0, reducing to 4096.0
[losses/total_loss: -0.28 | losses/policy_loss: -0.33 | losses/value_loss: 0.23]:   2%| | 67/3200 [13:58<56:57:20, 65.45s/i
[2023-02-21 22:12:48,690] [INFO] [stage_1_and_2.py:1762:step] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096.0, reducing to 2048.0
[losses/total_loss: 0.22 | losses/policy_loss: 0.13 | losses/value_loss: 0.42]:   2%|  | 76/3200 [14:53<7:36:46,  8.77s/it]
[2023-02-21 22:13:43,188] [INFO] [stage_1_and_2.py:1762:step] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2048.0, reducing to 1024.0
[losses/total_loss: 0.80 | losses/policy_loss: 0.73 | losses/value_loss: 0.34]:   2%|  | 77/3200 [14:54<5:44:06,  6.61s/it]
[2023-02-21 22:13:44,752] [INFO] [stage_1_and_2.py:1762:step] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1024.0, reducing to 512.0
[losses/total_loss: 0.80 | losses/policy_loss: 0.73 | losses/value_loss: 0.34]:   2%|  | 78/3200 [14:56<4:25:12,  5.10s/it]
[2023-02-21 22:13:46,317] [INFO] [stage_1_and_2.py:1762:step] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 512.0, reducing to 256.0
[losses/total_loss: 0.80 | losses/policy_loss: 0.73 | losses/value_loss: 0.34]:   2%|  | 79/3200 [14:57<3:30:00,  4.04s/it]
[2023-02-21 22:13:47,882] [INFO] [stage_1_and_2.py:1762:step] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 256.0, reducing to 128.0
[losses/total_loss: -0.27 | losses/policy_loss: -0.30 | losses/value_loss: 0.18]:   4%| | 128/3200 [20:11<5:32:27,  6.49s/i
[RANK 0] Collecting rollouts
[rollout 128 / 128]: 100%|███████████████████████████████████████████████████████████████| 128/128 [06:48<00:00,  3.19s/it]
[losses/total_loss: 0.38 | losses/policy_loss: 0.18 | losses/value_loss: 1.02]:   6%| | 192/3200 [33:59<5:29:37,  6.58s/it]
[RANK 0] Collecting rollouts
[rollout 134 / 128]: : 134it [08:45,  3.92s/it]
[E ProcessGroupNCCL.cpp:821] [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=696, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1802485 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:821] [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=696, OpType=BROADCAST, Timeout(ms)=1800000) ran for 1802763 milliseconds before timing out.
Traceback (most recent call last):
/data/wenjiaxin/home/trlx/trlx/trainer/accelerate_ppo_trainer.py:277 in make_experience

    274 │   │   │   # TOOD (jon-tow): Make `prompt_dataloader` a cyclic/infinite DataLoader to n
    275 │   │   │   # "refreshing" the contents of the `prompt_iterator`
    276 │   │   │   try:
  ❱ 277 │   │   │   │   batch: PromptBatch = next(self.prompt_iterator)
    278 │   │   │   except StopIteration:
    279 │   │   │   │   self.prompt_iterator = iter(self.prompt_dataloader)
    280 │   │   │   │   batch = next(self.prompt_iterator)
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
/data/wenjiaxin/home/trlx/examples/summarize_rlhf/trlx_gptj_text_summarization.py:142 in <module>

    139 │   #         import ipdb; ipdb.set_trace()
    140 │   # exit(0)
    141 │
  ❱ 142 │   trainer = trlx.train(
    143 │   │   reward_fn=reward_fn,
    144 │   │   prompts=train_prompts,
    145 │   │   eval_prompts=val_prompts[0:1000],  # sampling 1000 validation prompts for evalua

/data/wenjiaxin/home/trlx/trlx/trlx.py:119 in train

    116 │   eval_pipeline = get_pipeline(config.train.pipeline)(eval_prompts, max_prompt_length,
    117 │   trainer.add_eval_pipeline(eval_pipeline)
    118 │
  ❱ 119 │   trainer.learn()
    120 │   return trainer
    121

/data/wenjiaxin/home/trlx/trlx/trainer/accelerate_base_trainer.py:550 in learn

    547 │   │   │   │
    548 │   │   │   │   self.post_backward_callback()
    549 │   │   │
  ❱ 550 │   │   │   self.post_epoch_callback()  # re-run rollouts
    551 │   │   tbar.close()
    552 │
    553 │   @abstractmethod

/data/wenjiaxin/home/trlx/trlx/trainer/accelerate_ppo_trainer.py:216 in post_epoch_callback

    213 │   │   │   self.store.export_history(location=self.rollout_logging_dir)
    214 │   │   self.store.clear_history()
    215 │   │   # Collect more rollouts for training
  ❱ 216 │   │   self.make_experience(self.config.method.num_rollouts, self.iter_count)
    217 │
    218 │   def post_backward_callback(self):
    219 │   │   self.kl_ctl.update(self.approx_kl, n_steps=self.config.train.batch_size)

/data/wenjiaxin/home/trlx/trlx/trainer/accelerate_ppo_trainer.py:280 in make_experience

    277 │   │   │   │   batch: PromptBatch = next(self.prompt_iterator)
    278 │   │   │   except StopIteration:
    279 │   │   │   │   self.prompt_iterator = iter(self.prompt_dataloader)
  ❱ 280 │   │   │   │   batch = next(self.prompt_iterator)
    281 │   │   │
    282 │   │   │   exp_generate_time = time()
    283

/data/wenjiaxin/anaconda3/envs/rl/lib/python3.8/site-packages/accelerate/data_loader.py:369 in __iter__

    366 │
    367 │   def __iter__(self):
    368 │   │   if self.rng_types is not None:
  ❱ 369 │   │   │   synchronize_rng_states(self.rng_types, self.synchronized_generator)
    370 │   │   self.gradient_state._set_end_of_dataloader(False)
    371 │   │   # We can safely pass because the default is -1
    372 │   │   with suppress(Exception):

/data/wenjiaxin/anaconda3/envs/rl/lib/python3.8/site-packages/accelerate/utils/random.py:89 in synchronize_rng_states

    86
    87 def synchronize_rng_states(rng_types: List[Union[str, RNGType]], generator: Optional[tor
    88 │   for rng_type in rng_types:
  ❱ 89 │   │   synchronize_rng_state(RNGType(rng_type), generator=generator)
    90

/data/wenjiaxin/anaconda3/envs/rl/lib/python3.8/site-packages/accelerate/utils/random.py:84 in synchronize_rng_state

    81 │   elif rng_type == RNGType.XLA:
    82 │   │   xm.set_rng_state(rng_state.item())
    83 │   elif rng_type == RNGType.GENERATOR:
  ❱ 84 │   │   generator.set_state(rng_state)
    85
    86
    87 def synchronize_rng_states(rng_types: List[Union[str, RNGType]], generator: Optional[tor
โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ
RuntimeError: Invalid mt19937 state
[E ProcessGroupNCCL.cpp:456] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[E ProcessGroupNCCL.cpp:461] To avoid data inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
  what():  [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=696, OpType=BROADCAST, Timeout(ms)=1800000) ran for 1802763 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:456] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[E ProcessGroupNCCL.cpp:461] To avoid data inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
  what():  [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=696, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1802485 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:821] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=697, OpType=ALLREDUCE,
Timeout(ms)=1800000) ran for 1803843 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:456] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[E ProcessGroupNCCL.cpp:461] To avoid data inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
  what():  [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=697, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1803843 milliseconds before timing out.
[23:11:39] WARNING  Sending process 17109 closing signal SIGTERM                                                 api.py:699
[23:12:09] WARNING  Unable to shutdown process 17109 via 15, forcefully exitting via 9                           api.py:716
[23:12:10] ERROR    failed (exitcode: -6) local_rank: 1 (pid: 17110) of binary:                                  api.py:673
                    /data/wenjiaxin/anaconda3/envs/rl/bin/python                 

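For context on the `RuntimeError: Invalid mt19937 state` above: `torch.Generator.set_state` raises this error when the byte tensor it receives is not a valid CPU RNG state, which is consistent with the ranks receiving a corrupted or mismatched RNG state during `synchronize_rng_states`. A minimal single-process sketch that triggers the same exception (assumes only a working `torch` install; this is an illustration, not the trlx code path):

```python
import torch

g = torch.Generator()
ok_state = g.get_state()  # a uint8 tensor of the exact size set_state expects
bad_state = torch.zeros(10, dtype=torch.uint8)  # wrong size: not a valid mt19937 state

try:
    g.set_state(bad_state)
    raised = None
except RuntimeError as e:
    raised = str(e)

print(raised)
```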
I haven't found the root cause of this issue yet, but here is one modification I am aware of:

  1. I updated torch from 1.10.1+cu113 to 1.13.1. My CUDA version is 11.2.

Which trlX version are you using?

main (latest)

Additional system and package information

torch 1.13.1

Jiaxin-Wen commented 1 year ago

I see `[rollout 134 / 128]: : 134it [08:45, 3.92s/it]` in the logging output, which seems strange since the rollout counter exceeds its target of 128. Is this expected?
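One way such an overshoot can arise (a hypothetical sketch, not trlx's actual loop): if samples are appended a whole chunk at a time and the stop condition is only checked between chunks, a chunk that contributes more samples than usual can push the counter past `num_rollouts`:

```python
# Hypothetical illustration of a rollout counter overshooting its target:
# the stop condition is only checked between chunks, so the final chunk
# can push the total past num_rollouts.
num_rollouts = 128
samples = []
chunk_sizes = [16, 16, 16, 16, 16, 16, 16, 22]  # last chunk larger, e.g. after filtering/retries

for n in chunk_sizes:
    if len(samples) >= num_rollouts:
        break
    samples.extend([None] * n)

print(len(samples))  # 134
```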

maxreciprocate commented 1 year ago

Which accelerate version and config did you use here? I'd like to reproduce this.

Jiaxin-Wen commented 1 year ago

accelerate version: 0.16.0

accelerate config:

command_file: null
commands: null
compute_environment: LOCAL_MACHINE
deepspeed_config:
  deepspeed_config_file: configs/ds_config_trlx_gptj_summarize.json
  zero3_init_flag: false
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
gpu_ids: null
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
megatron_lm_config: {}
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_name: null
tpu_zone: null
use_cpu: false

deepspeed config:

{
  "train_micro_batch_size_per_gpu": 2,
  "gradient_accumulation_steps": 4,
  "fp16": {
    "enabled": true,
    "min_loss_scale": 0.5,
    "fp16_scale_tolerance": 0.25,
    "opt_level": "O2"
  },
  "zero_optimization": {
    "stage": 2,
    "offload_param": {
      "device": "cpu"
    },
    "offload_optimizer": {
      "device": "cpu"
    },
    "allgather_partitions": true,
    "allgather_bucket_size": 5e8,
    "contiguous_gradients": true
  }
}
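For reference, the effective global batch size implied by this DeepSpeed config (assuming the 4 processes from the accelerate config above) works out as:

```python
# Effective global batch = micro-batch per GPU * grad accumulation steps * num GPUs
micro_batch_per_gpu = 2  # train_micro_batch_size_per_gpu
grad_accum_steps = 4     # gradient_accumulation_steps
num_gpus = 4             # num_processes from the accelerate config

effective_batch = micro_batch_per_gpu * grad_accum_steps * num_gpus
print(effective_batch)  # 32
```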

ppo config

train:
  seq_length: 550
  epochs: 50
  total_steps: 100000
  batch_size: 8

  checkpoint_interval: 10000
  eval_interval: 200

  pipeline: "PromptPipeline"
  trainer: "AcceleratePPOTrainer"

model:
  model_path: "sft/gptj-supervised-summarize-checkpoint"
  num_layers_unfrozen: 8

tokenizer:
  tokenizer_path: "gpt2"
  truncation_side: "right"

optimizer:
  name: "adamw"
  kwargs:
    lr: 5.0e-6
    betas: [0.9, 0.999]
    eps: 1.0e-8
    weight_decay: 0.01

scheduler:
  name: "cosine_annealing"
  kwargs:
    T_max: 100000
    eta_min: 5.0e-6

method:
  name: "ppoconfig"
  num_rollouts: 128
  chunk_size: 16
  ppo_epochs: 4
  init_kl_coef: 0.1
  target: 6
  horizon: 10000
  gamma: 1
  lam: 0.95
  cliprange: 0.2
  cliprange_value: 0.2
  vf_coef: 0.2
  scale_reward: False
  ref_mean: null
  ref_std: null
  cliprange_reward: 10
  gen_kwargs:
    max_new_tokens: 50
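For what it's worth, the `init_kl_coef` / `target` / `horizon` settings above drive an adaptive KL controller in the style of Ziegler et al.; a minimal sketch of that update rule (an illustration under those assumptions, not trlx's exact implementation):

```python
def update_kl_coef(kl_coef, current_kl, target=6.0, horizon=10000, n_steps=1):
    """Nudge the KL coefficient so the observed KL drifts toward the target."""
    # Clip the proportional error to [-0.2, 0.2] so one bad batch
    # cannot blow up or collapse the coefficient.
    proportional_error = max(min(current_kl / target - 1.0, 0.2), -0.2)
    multiplier = 1.0 + proportional_error * n_steps / horizon
    return kl_coef * multiplier

# Starting from init_kl_coef=0.1: KL above target pushes the coefficient up,
# KL below target pushes it down.
print(update_kl_coef(0.1, current_kl=12.0) > 0.1)  # True
print(update_kl_coef(0.1, current_kl=1.0) < 0.1)   # True
```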
Jiaxin-Wen commented 1 year ago

Oops, I think I found the cause: I had not updated accelerate_base_trainer.py to the latest version (according to #315).