optim: async data fetch and speedup backward data transfer

TUDB-Labs / mLoRA

An Efficient "Factory" to Build Multiple LoRA Adapters

Apache License 2.0

277 stars, 53 forks

optim: async data fetch and speedup backward data transfer #197

Closed: Vinkle-hzt closed this 7 months ago

Vinkle-hzt commented 7 months ago

Issue

The following line takes about 500 ms to create a GPU tensor. Because it runs synchronously, it blocks the forward and backward passes and leads to low GPU utilization.

tokens = torch.tensor(train_input.batch_tokens_, dtype=torch.int64, device=self.device_)
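One way to keep this call off the critical path is to build each input ahead of time on a background thread and hand it over through a bounded queue. The sketch below assumes that approach; `Prefetcher` and `build_input` are hypothetical names, not mLoRA's API, and the actual PR may implement asynchronous creation differently. A Python thread only helps to the extent the slow conversion can overlap with work in the main thread that does not hold the GIL, so treat this as an illustration of the pattern rather than a measured fix.

```python
import queue
import threading
from typing import Any, Callable, Iterable


class Prefetcher:
    """Hypothetical helper: builds training inputs on a background thread.

    `build` runs on each raw batch outside the training loop, and at most
    `depth` finished inputs are buffered ahead of the consumer.
    """

    def __init__(self, batches: Iterable[Any], build: Callable[[Any], Any], depth: int = 2):
        # Bounded queue: the producer blocks once `depth` inputs are waiting,
        # so it cannot run arbitrarily far ahead of training.
        self._queue: queue.Queue = queue.Queue(maxsize=depth)
        worker = threading.Thread(target=self._run, args=(batches, build), daemon=True)
        worker.start()

    def _run(self, batches: Iterable[Any], build: Callable[[Any], Any]) -> None:
        try:
            for batch in batches:
                # In a real trainer, the expensive call such as
                # torch.tensor(tokens, dtype=torch.int64, device=...) would go here.
                self._queue.put(("ok", build(batch)))
        except Exception as exc:
            # Forward the error to the consumer instead of leaving it waiting forever.
            self._queue.put(("err", exc))
        finally:
            self._queue.put(("done", None))

    def __iter__(self) -> "Prefetcher":
        return self

    def __next__(self) -> Any:
        # Blocks only when the producer has fallen behind the training loop.
        kind, value = self._queue.get()
        if kind == "err":
            raise value
        if kind == "done":
            raise StopIteration
        return value


def build_input(tokens):
    # Stand-in for the real tensor construction.
    return [t * 2 for t in tokens]


for x in Prefetcher([[1, 2], [3]], build_input):
    print(x)  # prints [2, 4], then [6]
```

The bounded queue is the key design choice: it lets input creation overlap with the previous step while capping how many prepared batches (and, on a GPU, how much memory) can pile up.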

Changes Proposed in This PR

- Create training inputs asynchronously, so tensor creation no longer blocks the forward and backward passes
- Remove batch_tokens_ from the backward message to speed up data transfer
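The second change drops the token ids from the message sent for the backward pass, presumably because the backward stage does not need them. The sketch below illustrates the idea with a hypothetical `TrainInput` dataclass (only `batch_tokens_` comes from the original; `seq_len_` and `strip_for_backward` are made up for illustration), and uses `pickle` merely as a proxy for whatever serialization the real transport uses.

```python
import pickle
from dataclasses import asdict, dataclass, replace
from typing import List, Optional


@dataclass
class TrainInput:
    # Hypothetical stand-in for the per-batch message passed between stages.
    batch_tokens_: Optional[List[List[int]]]
    seq_len_: int


def strip_for_backward(msg: TrainInput) -> TrainInput:
    """Return a copy without token ids (assumed unused in the backward pass)."""
    return replace(msg, batch_tokens_=None)


def payload_size(msg: TrainInput) -> int:
    # Serialize a plain dict so the size reflects the field contents only.
    return len(pickle.dumps(asdict(msg)))


full = TrainInput(batch_tokens_=[[1] * 512 for _ in range(8)], seq_len_=512)
slim = strip_for_backward(full)
print(payload_size(full), payload_size(slim))  # the slim message is much smaller
```

Using `dataclasses.replace` leaves the original message untouched, so the forward side can keep its tokens while the backward message carries only what that stage needs.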