flowersteam/lamorel
Lamorel is a Python library designed for RL practitioners eager to use Large Language Models (LLMs).
MIT License
Inefficient PPO example
#5
Closed
ClementRomac
closed
1 year ago
ClementRomac
commented
1 year ago
The PPO example would be more efficient with:
- log-probabilities from the LLM instead of normalized probabilities
- value loss clipping
- gradient clipping
- minibatches (and gradient accumulation for low-memory setups)
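For reference, the four points above can be sketched roughly as follows. This is a dependency-free illustration in plain Python, not the code of the Lamorel example: every function name and default value here (`ppo_policy_loss`, `clipped_value_loss`, `clip_grad_norm`, `minibatches`, `accumulate`, `clip_eps=0.2`) is hypothetical, and a real setup would run the same arithmetic on tensors with autograd.

```python
import math
import random


def ppo_policy_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Clipped surrogate loss; the ratio comes straight from log-probabilities."""
    losses = []
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        losses.append(-min(ratio * adv, clipped * adv))
    return sum(losses) / len(losses)


def clipped_value_loss(values, old_values, returns, clip_eps=0.2):
    """Value loss where the new estimate may move at most clip_eps from the old one."""
    losses = []
    for v, old_v, ret in zip(values, old_values, returns):
        v_clipped = old_v + max(-clip_eps, min(v - old_v, clip_eps))
        # Pessimistic choice: keep the larger of the two squared errors.
        losses.append(max((v - ret) ** 2, (v_clipped - ret) ** 2))
    return 0.5 * sum(losses) / len(losses)


def clip_grad_norm(grads, max_norm):
    """Rescale a flat list of gradients so its L2 norm does not exceed max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads]


def minibatches(n_samples, batch_size, rng):
    """Yield shuffled index lists that together cover every sample once."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]


def accumulate(grad_chunks, sizes):
    """Size-weighted average of per-chunk mean gradients (gradient accumulation)."""
    total = sum(sizes)
    acc = [0.0] * len(grad_chunks[0])
    for grads, n in zip(grad_chunks, sizes):
        for i, g in enumerate(grads):
            acc[i] += g * n / total
    return acc
```

Under these assumptions, an update step would loop over `minibatches(...)`, compute both losses on each batch, accumulate the gradients of several small batches with `accumulate` when memory is tight, and pass the result through `clip_grad_norm` before applying it.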
ClementRomac
commented
1 year ago
6 Fixed it