OpenLLMAI / OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
https://openrlhf.readthedocs.io/
Apache License 2.0

[question] long context for single model ppo training #289

Closed yananchen1989 closed 1 month ago

yananchen1989 commented 1 month ago

Hello. Is it feasible to PPO-train a 7B-level model on a single GPU when each sample has a prompt of around 6k tokens and a completion of around 2k tokens?

Any advice for this scenario? Thanks.

hijkzzz commented 1 month ago

A single GPU is not enough to run PPO on a 7B model with such long sequences.
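To see why, a rough back-of-envelope estimate helps. PPO keeps four models in memory (actor, critic, reward, reference), and the trained ones carry Adam optimizer state on top of weights and gradients. The figures below are standard rule-of-thumb assumptions (not measured numbers from OpenRLHF), and they ignore activations and KV cache, which an 8k-token sequence would add on top:

```python
# Back-of-envelope memory estimate for PPO-tuning a 7B model.
# All byte counts are rule-of-thumb assumptions, not measured values.

def gib(n_bytes: float) -> float:
    """Convert bytes to GiB."""
    return n_bytes / 2**30

n_params = 7e9   # 7B parameters
bf16 = 2         # bytes per parameter for bf16 weights

# PPO holds four models: actor, critic, reward, reference.
# Actor and critic are trained: with Adam in mixed precision, each
# needs roughly 16 bytes/param (weights + grads + fp32 optimizer states).
trained = 2 * n_params * 16
# Reward and reference models are frozen: weights only.
frozen = 2 * n_params * bf16

total_gib = gib(trained + frozen)
print(f"~{total_gib:.0f} GiB before activations and KV cache")
```

Even before the activation memory for a 6k-token prompt plus 2k-token completion, this lands far beyond a single 80 GB GPU, which is why multi-GPU setups (with offloading or model parallelism) are needed for 7B full-parameter PPO.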