BlackSamorez / tensor_parallel

Automatically split your PyTorch models on multiple GPUs for training & inference

_reorder_cache fix for generation utils #56

Closed · BlackSamorez closed this 1 year ago

BlackSamorez commented 1 year ago

It's unclear on what device the auxiliary tensor `beam_idx` is located during beam-search generation, and this has caused some issues. This PR explicitly puts `beam_idx` on the correct device for each model shard.
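For context, a minimal sketch of the pattern this kind of fix follows (not the exact diff in this PR): in `transformers`-style models, `_reorder_cache` reindexes the cached key/value states after each beam-search step, and under tensor parallelism each shard's cache can live on a different GPU, so `beam_idx` has to be moved to each cached tensor's device before indexing. The nested-tuple cache layout below is an assumption for illustration.

```python
import torch

def _reorder_cache(past_key_values, beam_idx):
    # During beam search, generate() reorders the cached key/value states
    # to follow the newly selected beams. With the model split across GPUs,
    # each shard's cache tensors may sit on a different device, so beam_idx
    # (which may arrive on an arbitrary device) is explicitly moved to each
    # cached tensor's device before index_select.
    return tuple(
        tuple(
            past_state.index_select(0, beam_idx.to(past_state.device))
            for past_state in layer_past
        )
        for layer_past in past_key_values
    )
```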