DeepLink-org / deeplink.framework

BSD 3-Clause "New" or "Revised" License

Zgc/dipu fix unnecessary large memory tensor allocate and operation #871

Closed. zhaoguochun1995 closed this 2 months ago

zhaoguochun1995 commented 2 months ago
  1. Fixed an issue where torch.add(1, vary_large_tensor, alpha = 0.5) and torch.div(2, vary_large_tensor) allocated a temporary tensor the same size as vary_large_tensor and then filled it with the scalar value. The fix reduces wasted device memory and unnecessary memory-access overhead.
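The difference between the two strategies can be sketched at the Python level. This is only an illustration, not the DIPU C++ change itself: it assumes a stock PyTorch install, `big` stands in for vary_large_tensor, and the helper names `with_filled_temporary` and `with_scalar_tensor` are hypothetical.

```python
import torch

def with_filled_temporary(op, scalar, big, **kwargs):
    # Wasteful pattern: materialize a temporary as large as `big`,
    # fill it with the scalar, then run the element-wise op.
    tmp = torch.full_like(big, scalar)
    return op(tmp, big, **kwargs)

def with_scalar_tensor(op, scalar, big, **kwargs):
    # Cheaper pattern: wrap the scalar in a 0-dim tensor and rely on
    # broadcasting, so no full-size temporary is allocated or written.
    s = torch.tensor(scalar, dtype=big.dtype, device=big.device)
    return op(s, big, **kwargs)

# Small stand-in for the very large tensor from the description.
big = torch.arange(1, 9, dtype=torch.float32)

add_slow = with_filled_temporary(torch.add, 1.0, big, alpha=0.5)
add_fast = with_scalar_tensor(torch.add, 1.0, big, alpha=0.5)

div_slow = with_filled_temporary(torch.div, 2.0, big)
div_fast = with_scalar_tensor(torch.div, 2.0, big)
```

Both paths produce the same values; the second only needs storage for one extra element instead of a second tensor the size of `big`.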