Closed · Vinkle-hzt closed this 7 months ago
Issue

The following code takes about 500 ms to create a GPU tensor, which blocks the forward and backward passes and leads to low GPU utilization:

tokens = torch.tensor(train_input.batch_tokens_, dtype=torch.int64, device=self.device_)
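For context, here is a minimal timing sketch (not part of the PR) of where the delay comes from: `torch.tensor(python_list, device=...)` first converts the Python list to a CPU tensor and then copies it to the GPU inside the same blocking call, so the training loop stalls for the whole conversion. The `batch_tokens` list below is a hypothetical stand-in for `train_input.batch_tokens_`, and the sizes are made up.

```python
import time
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
batch_tokens = [[1] * 4096 for _ in range(64)]  # hypothetical stand-in for train_input.batch_tokens_

def sync():
    # Make timings meaningful: wait for queued GPU work to finish.
    if device.type == "cuda":
        torch.cuda.synchronize()

# 1) The pattern from the issue: Python list -> GPU tensor in one blocking call.
start = time.perf_counter()
tokens = torch.tensor(batch_tokens, dtype=torch.int64, device=device)
sync()
print(f"list -> GPU in one call: {(time.perf_counter() - start) * 1e3:.1f} ms")

# 2) Split the work: the list -> CPU-tensor conversion is the expensive, CPU-bound part;
#    the host-to-device copy can overlap with other GPU work via pinned memory.
start = time.perf_counter()
cpu_tokens = torch.tensor(batch_tokens, dtype=torch.int64)
print(f"list -> CPU tensor: {(time.perf_counter() - start) * 1e3:.1f} ms")

start = time.perf_counter()
if device.type == "cuda":
    tokens = cpu_tokens.pin_memory().to(device, non_blocking=True)
else:
    tokens = cpu_tokens
sync()
print(f"pinned H2D copy: {(time.perf_counter() - start) * 1e3:.1f} ms")
```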
Changes Proposed in This PR
Pass `batch_tokens_` in the backward message to speed up data transfer.
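Below is a minimal sketch of the idea behind the change, under the assumption that the pipeline exchanges a message object during the backward pass: the token tensor is built once and travels with that message, so later steps reuse it instead of repeating the slow list-to-tensor conversion. `BackwardMessage`, `build_backward_message`, and `backward_step` are hypothetical names invented for illustration; only `batch_tokens_` comes from the codebase.

```python
from dataclasses import dataclass
from typing import List, Optional

import torch


@dataclass
class BackwardMessage:  # hypothetical stand-in for the pipeline's backward message type
    grad: torch.Tensor
    batch_tokens_: Optional[torch.Tensor] = None  # ready-made tensor travels with the message


def build_backward_message(grad: torch.Tensor,
                           batch_tokens: List[List[int]],
                           device: torch.device) -> BackwardMessage:
    # Convert the Python list exactly once, when the message is created.
    tokens = torch.tensor(batch_tokens, dtype=torch.int64, device=device)
    return BackwardMessage(grad=grad, batch_tokens_=tokens)


def backward_step(msg: BackwardMessage) -> None:
    # Consumers read the tensor from the message; there is no per-step
    # torch.tensor(...) call, so the GPU is not left idle waiting for it.
    tokens = msg.batch_tokens_
    assert tokens is not None and tokens.dtype == torch.int64
    # ... run the backward pass using `tokens` ...
```

In this sketch the tensor is created once per batch on the producer side and only read by later steps, which is where the data-transfer speedup would come from.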