Closed: BlackSamorez closed this 1 year ago
It is unclear which device the auxiliary tensor `beam_idx` is located on during beam-search generation, and this has caused some issues. This PR explicitly puts `beam_idx` on the correct device for each model shard.
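To illustrate the idea, here is a minimal sketch of a cache-reordering step that moves `beam_idx` to the device of each cached tensor before indexing. It is not the actual diff from this PR: the helper name `reorder_cache` and the cache layout (a tuple of per-layer tuples of tensors with the beam dimension first) are assumptions made for the example.

```python
import torch


def reorder_cache(past_key_values, beam_idx):
    """Hypothetical helper: reorder cached tensors along dim 0 (the beam dimension).

    beam_idx is copied to the device of each cached tensor before indexing,
    so layers living on different shards/devices do not hit a device mismatch.
    """
    return tuple(
        tuple(
            # .to() is a no-op when beam_idx is already on the right device
            past_state.index_select(0, beam_idx.to(past_state.device))
            for past_state in layer_past
        )
        for layer_past in past_key_values
    )


# Toy cache: 2 layers, each with a (key, value) pair of shape (4, 1, 1, 1)
past = tuple(
    (torch.arange(4.0).view(4, 1, 1, 1), torch.arange(4.0).view(4, 1, 1, 1) + 10)
    for _ in range(2)
)
beam_idx = torch.tensor([2, 0, 0, 3])  # which beam each new hypothesis continues
reordered = reorder_cache(past, beam_idx)
```

In this sketch every tensor is on the CPU, so the `.to(...)` call changes nothing; in a sharded model the same call would place `beam_idx` next to each layer's cache, wherever that layer happens to live.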