Open infinitylogesh opened 1 year ago
Added `num_return_sequences` as an argument; `batch_size` acting as `num_return_sequences` was confusing. Now `num_return_sequences` holds the number of generations per input in the batch, and `batch_size` the number of inputs in the batch. I hope this change is fine? Updated the docs and examples with the new argument.
@loubnabnl @Muennighoff Please review and let me know your comments. Thanks. (I don't seem to have access to request a review.)
Thank you so much for the detailed review and for catching this issue. I will look into it further and update!
My updates on further analysis. I found the following to be influencing the variations in the scores (apart from the task id repetition issue):

1. The `device_specific` parameter in `set_seed` is set to `True`. For the cases where `num_return_sequences=n_samples`, changing the batch size might place a given task on a different GPU at runtime, which could introduce variation in the results due to the variation of the seed. I have now set the `device_specific` flag to `False` when the `num_return_sequences=n_samples` condition holds (see the sketch right after this list).
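A sketch of the seeding behaviour in question, assuming accelerate's `set_seed` (with `device_specific=True`, each process offsets the seed by its process index, so a task landing on a different GPU sees a different seed):

```python
from accelerate.utils import set_seed

# device_specific=True  -> effective seed = seed + process_index (varies per GPU)
# device_specific=False -> every process uses the same seed
set_seed(0, device_specific=False)  # the change described above, applied when
                                    # num_return_sequences == n_samples
```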
2. `torch.multinomial`, used for sampling the next token, returns a different next token for the same input as the batch size changes: the same input can land at a different index in the batch, which is expected when the batch size changes (see the sketch after this list). I am afraid that only if this variation is handled in the transformers repo will our scores be stable across batch sizes. Please let me know if there is any workaround or suggestion.
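A minimal sketch reproducing the `torch.multinomial` behaviour described in item 2: with a fixed seed, the draw for a given distribution depends on its index in the batch, because all rows share one RNG stream.

```python
import torch

p_a = torch.tensor([0.25, 0.25, 0.25, 0.25])
p_b = torch.tensor([0.10, 0.20, 0.30, 0.40])

torch.manual_seed(0)
s1 = torch.multinomial(torch.stack([p_a, p_b]), num_samples=1)  # p_b at index 1

torch.manual_seed(0)
s2 = torch.multinomial(torch.stack([p_b, p_a]), num_samples=1)  # p_b at index 0

# The same distribution p_b is sampled at a different batch index, so the
# draws s1[1] and s2[0] generally differ despite the identical seed.
print(s1[1].item(), s2[0].item())
```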
Update! I was able to replicate this behaviour of varying generations for different batch sizes using an external repo:
I used the batch generation script from the incoder repo (as suggested by Daniel Fried on Slack) and was able to replicate this behaviour (as shown in the screenshot below; full colab here). For the same set of inputs, the generations vary based on the batch size.
So I believe this is a global behaviour, and it is probably expected to happen based on my analysis in the previous comments.
That's very odd. Does it also happen for non-code models using the built-in transformers generate function with a batch, e.g. generating with https://huggingface.co/gpt2?
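A minimal way to check this with a non-code model (a sketch, not from the thread; the prompt is a placeholder): sample from gpt2 with the same seed and the same prompt placed in batches of different sizes, then compare the generations for that prompt.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

def generate_first(prompts):
    # Fix the seed, then sample; return only the generation for prompt 0.
    torch.manual_seed(0)
    inputs = tokenizer(prompts, return_tensors="pt", padding=True)
    out = model.generate(**inputs, do_sample=True, max_new_tokens=20,
                         pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(out[0], skip_special_tokens=True)

# Same first prompt, different batch sizes; if its generation differs,
# the batch-size sensitivity is not specific to code models.
print(generate_first(["The weather today is"]))
print(generate_first(["The weather today is"] * 4))
```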
Any new progress? Everyone needs it. 😁
Fixes #23