wrong action_log_probs returned?

OpenLLMAI / OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)

https://openrlhf.readthedocs.io/

Apache License 2.0

1.71k stars, 160 forks

Closed: thirteenflt closed this issue 1 month ago

thirteenflt commented 1 month ago

In https://github.com/OpenLLMAI/OpenRLHF/blob/main/openrlhf/models/actor.py#L176, log_probs[:, -num_actions:] is returned. But seq should contain both left padding (added before generation) and right padding (added after generation). num_actions is the number of tokens in the output, which is fine.

The problem is that the output tokens are not necessarily the last num_actions positions of log_probs (i.e. of seq), so log_probs[:, -num_actions:] may not select them.
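A minimal pure-Python sketch of the layout in question, assuming prompts are left-padded to a common length and responses are right-padded to a common length before the two are concatenated. build_batch, predicted_targets and the token ids are hypothetical, and plain lists stand in for tensors. It checks whether, when num_actions is the full width of the generated segment, the last num_actions positions of the shifted log-prob row line up with the (padded) response tokens in every row.

```python
# Toy illustration with plain lists instead of tensors; all names are hypothetical.
PAD = 0


def build_batch(prompts, responses, pad=PAD):
    """Left-pad prompts and right-pad responses, then concatenate each pair."""
    input_len = max(len(p) for p in prompts)
    gen_len = max(len(r) for r in responses)
    seqs = []
    for p, r in zip(prompts, responses):
        left_padded = [pad] * (input_len - len(p)) + p
        right_padded = r + [pad] * (gen_len - len(r))
        seqs.append(left_padded + right_padded)
    return seqs, input_len, gen_len


def predicted_targets(seqs):
    """Stand-in for per-token log_probs: a causal LM's position t scores seq[t + 1],
    so each row has len(seq) - 1 entries. We store the target id instead of a log-prob."""
    return [row[1:] for row in seqs]


prompts = [[11, 12, 13], [21]]
responses = [[31], [41, 42, 43]]

seqs, input_len, gen_len = build_batch(prompts, responses)
log_probs = predicted_targets(seqs)
num_actions = gen_len  # full width of the generated segment, right padding included

for seq, row in zip(seqs, log_probs):
    # The last num_actions entries score exactly the generated segment.
    print(row[-num_actions:], seq[input_len:])
```

In this toy layout, left padding only shifts where each prompt starts; it never moves the response region away from the end of the row, so slicing from the right picks out the generated segment in every row.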

Am I getting it wrong?

thirteenflt commented 1 month ago

Oh, num_actions is the size of action_mask, whose width covers the whole generated segment (right padding included), so the slice is correct. I did get it wrong.
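A minimal sketch of why right padding is harmless once num_actions is taken from the width of action_mask: the slice keeps the whole generated segment, and the mask zeroes out padded positions in any per-token reduction. The helpers make_action_mask and masked_mean, the pad id 0 and the log-prob values are all hypothetical; any EOS-specific handling in the real mask construction is not modeled here.

```python
# Toy illustration with plain lists; helper names, pad id and values are hypothetical.
PAD = 0


def make_action_mask(padded_responses, pad=PAD):
    """1 for real response tokens, 0 for right padding (EOS handling omitted)."""
    return [[0 if tok == pad else 1 for tok in row] for row in padded_responses]


def masked_mean(values, mask):
    """Average only the positions where mask is 1."""
    count = sum(mask)
    if count == 0:
        return 0.0
    return sum(v * m for v, m in zip(values, mask)) / count


padded_responses = [[31, PAD, PAD], [41, 42, 43]]
action_mask = make_action_mask(padded_responses)
num_actions = len(action_mask[0])  # width of action_mask, i.e. size(1) in tensor terms

# Per-token log-probs for the last num_actions positions; padded slots hold junk values.
action_log_probs = [[-0.5, -9.0, -9.0], [-1.0, -2.0, -3.0]]

per_seq = [masked_mean(lp, m) for lp, m in zip(action_log_probs, action_mask)]
print(num_actions, per_seq)
```

The junk values in the padded slots of the first row never reach the result, because their mask entries are 0.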