mgerstgrasser opened 4 months ago
Yes, this is necessary to reduce the padding of the training samples.
Left-padding the prompt samples allows us to remove the PAD tokens on both sides and then dynamically re-pad depending on the training samples in the batch.
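For illustration, a minimal sketch of that trim-and-re-pad idea (this is only the concept, not the repo's actual implementation):

```python
import torch

# Minimal sketch of the "remove PAD on both sides" idea: drop every column
# that is padding for all sequences in the batch, so the batch shrinks to
# the span covered by actual tokens. Illustrative only, not the repo's code.
def trim_all_pad_columns(sequences: torch.Tensor, pad_token_id: int) -> torch.Tensor:
    has_token = (sequences != pad_token_id).any(dim=0)   # (T,): column holds a real token
    keep = has_token.nonzero(as_tuple=True)[0]
    return sequences[:, keep.min() : keep.max() + 1]
```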
Ah, you mean `remove_padding_in_sequences()`? Wouldn't that still work with only right-padding?
This will lead to a lot of PAD tokens in the middle of the sequences.
Ah, no, to be clear, what I mean is the following. Right now, the padding is done like this (`promp` = a prompt token, `respo` = a response token):
```
| [PAD] [PAD] promp promp promp | respo respo [EOS] [PAD] |
| promp promp promp promp promp | respo respo [EOS] [PAD] |
| [PAD] [PAD] [PAD] promp promp | respo respo respo [EOS] |
```
What I have in mind is instead to do this:
```
| promp promp promp | respo respo [EOS] [PAD] [PAD] |
| promp promp promp promp promp | respo respo [EOS] |
| promp promp | respo respo respo [EOS] [PAD] [PAD] |
```
So, less padding overall, and no padding in the middle. The only thing that is now a little different is that the index where the prompt stops and the response starts isn't the same for each sequence - will that break anything?
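Concretely, a minimal sketch of that layout (the names here are illustrative, not the repo's API):

```python
import torch

# Sketch of the proposed layout: concatenate prompt + response per sample,
# right-pad to the batch max, and keep each prompt's length so the
# prompt/response boundary is known per sequence. Illustrative only.
def pack_right_padded(prompts, responses, pad_token_id):
    lengths = [p.numel() + r.numel() for p, r in zip(prompts, responses)]
    sequences = torch.full((len(prompts), max(lengths)), pad_token_id, dtype=torch.long)
    for i, (p, r) in enumerate(zip(prompts, responses)):
        sequences[i, : p.numel()] = p
        sequences[i, p.numel() : p.numel() + r.numel()] = r
    prompt_lens = torch.tensor([p.numel() for p in prompts])
    return sequences, prompt_lens, torch.tensor(lengths)
```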
They are:

```
| promp promp promp [PAD] [PAD] | respo respo [EOS] [PAD] |
| promp promp promp promp promp | respo respo [EOS] [PAD] |
| promp promp [PAD] [PAD] [PAD] | respo respo respo [EOS] |
```
That's not what I am proposing though! What I mean is, if I return it without the pads in the middle from `_generate_vllm()`, would that break anything? (No worries if I'm not making sense though, I can just try it and see what happens.)
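For concreteness, a rough sketch of what returning mid-pad-free sequences could look like, assuming vLLM's `LLM.generate(prompt_token_ids=...)` interface; this is not OpenRLHF's actual `_generate_vllm()`:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Rough sketch (not OpenRLHF's actual _generate_vllm): vLLM returns un-padded
# token id lists per request, so prompt + response can be concatenated
# directly and right-padded only once at the end.
def generate_without_mid_padding(llm, all_prompt_token_ids, sampling_params, pad_token_id):
    outputs = llm.generate(prompt_token_ids=all_prompt_token_ids,
                           sampling_params=sampling_params)
    sequences, prompt_lens = [], []
    for out in outputs:
        ids = list(out.prompt_token_ids) + list(out.outputs[0].token_ids)
        sequences.append(torch.tensor(ids, dtype=torch.long))
        prompt_lens.append(len(out.prompt_token_ids))
    # right-pad once, to the longest prompt+response in the batch
    sequences = pad_sequence(sequences, batch_first=True, padding_value=pad_token_id)
    return sequences, torch.tensor(prompt_lens)
```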
I noticed that `RemoteExperienceMaker` left-pads the input sequences even when using vLLM for generation: https://github.com/OpenLLMAI/OpenRLHF/blob/dcd379a44eea56625626d1a0832cd3eeda048b21/openrlhf/trainer/ppo_utils/experience_maker.py#L346
I can see that a few lines down, `self.actor.process_sequences()` assumes this left-padding, as it calculates the action mask in a way that hinges on all the inputs terminating at the same index. Other than that, are there any other parts of the code that assume that inputs are left-padded, i.e. that all prompts end at the same index?
If not, I'd like to open a PR that skips the left-padding and calculates the action mask directly: the left-padding can be inefficient, and computing the action mask this way doesn't generalise to multi-turn conversations.
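For reference, a per-sequence action mask could be computed along these lines (a minimal sketch with illustrative names, assuming right-padded sequences and known per-sample prompt lengths):

```python
import torch

# Sketch: mark response-token positions per sequence, without assuming all
# prompts end at the same index. Names are illustrative, not OpenRLHF's API.
def make_action_mask(seq_len: int, prompt_lens: torch.Tensor,
                     total_lens: torch.Tensor) -> torch.Tensor:
    positions = torch.arange(seq_len).unsqueeze(0)        # (1, T)
    after_prompt = positions >= prompt_lens.unsqueeze(1)  # past each prompt
    before_pad = positions < total_lens.unsqueeze(1)      # before right padding
    return after_prompt & before_pad                      # (B, T) bool
```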