OpenLLMAI / OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
https://openrlhf.readthedocs.io/
Apache License 2.0
1.72k stars 160 forks source link

Custom ExperienceMaker #285

Open mgerstgrasser opened 2 months ago

mgerstgrasser commented 2 months ago

I have a use case where I'd like to use a custom ExperienceMaker class instead of either of the provided ones. As far as I can tell, there isn't currently a way to configure the experience maker class used by PPOTrainer / ActorPPOTrainer.

Would you be open to a PR that makes this configurable?

And if yes, would the best way of doing this be to put it into the DeepspeedStrategy?

hijkzzz commented 2 months ago

I think ExperienceMaker has nothing to do with DeepSpeed I would like to know what kind of ExperienceMaker you need at first

mgerstgrasser commented 2 months ago

I would like to know what kind of ExperienceMaker you need at first

I had multiple different ones in mind, actually, for different projects. For instance:

Basically, anything that changes what the model does / how it interacts with the outside world, or how it is evaluated, would be implemented in the experience maker.

I think ExperienceMaker has nothing to do with DeepSpeed

I agree! I only suggest it because the DeepspeedStrategy is a convenient configuration-like object that's already passed to the Trainer anyway. The alternative would be to e.g. pass the experience maker class as an optional argument the the Trainer constructor, but in the Ray case that would require passing it through several other objects as well, i.e. more changes.

I could of course just monkey-patch the experience maker class, or maintain my own fork, so all of this isn't absolutely essential. But I think it might be useful for others to have this configurable as well.

hijkzzz commented 2 months ago

The alternative would be to e.g. pass the experience maker class as an optional argument the the Trainer constructor, but in the Ray case that would require passing it through several other objects as well, i.e. more changes.

I think it's probably better this way

Welcome to have related MR contributions~

mgerstgrasser commented 1 month ago

OK! I'll test this on my side for a bit to make sure it covers all the use cases I have in mind. I'll open a PR in a couple of weeks.