TUDB-Labs / mLoRA

An Efficient "Factory" to Build Multiple LoRA Adapters
Apache License 2.0
277 stars, 53 forks

optim: async data fetch and speedup backward data transfer #197

Closed. Vinkle-hzt closed this 7 months ago

Vinkle-hzt commented 7 months ago

Issue

The following code takes about 500 ms to create a GPU tensor, which blocks the forward and backward passes and leads to low GPU utilization.

tokens = torch.tensor(train_input.batch_tokens_, dtype=torch.int64, device=self.device_)
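One common way to hide this kind of latency is to prepare the next batch in a background thread while the current one is being processed. The sketch below is only an illustration of that idea, not the code in this PR: `prefetch`, `build`, and `depth` are hypothetical names, and it uses only the standard library so the pattern is easy to see in isolation.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")

_DONE = object()  # sentinel marking the end of the input


def prefetch(items: Iterable[T], build: Callable[[T], U], depth: int = 2) -> Iterator[U]:
    """Yield build(item) for each item, computing results in a background thread.

    Up to `depth` results are buffered ahead of the consumer, so the consumer
    waits only when the buffer is empty.
    """
    buf: queue.Queue = queue.Queue(maxsize=depth)

    def worker() -> None:
        try:
            for item in items:
                buf.put((True, build(item)))  # blocks while the buffer is full
        except Exception as exc:
            buf.put((False, exc))  # hand the error over to the consumer
            return
        buf.put((True, _DONE))

    # Daemon thread: it will not keep the process alive if the consumer stops early.
    threading.Thread(target=worker, daemon=True).start()

    while True:
        ok, value = buf.get()
        if not ok:
            raise value
        if value is _DONE:
            return
        yield value


if __name__ == "__main__":
    batches = [[1, 2, 3], [4, 5], [6]]
    # Stand-in for the expensive tensor creation step.
    for tokens in prefetch(batches, lambda b: [t + 1 for t in b]):
        print(tokens)
```

In a PyTorch setting, `build` could be something like `lambda b: torch.tensor(b, dtype=torch.int64, pin_memory=True).to(device, non_blocking=True)`, assuming a CUDA device is available. A Python thread only overlaps work that runs outside the interpreter lock, so whether this recovers the full 500 ms would need to be measured.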

Changes Proposed in This PR