Zgc/dipu fix unnecessary large memory tensor allocate and operation

Fixes an issue where torch.add(1, vary_large_tensor, alpha = 0.5) and torch.div(2, vary_large_tensor) allocated a temporary tensor of the same size as vary_large_tensor and then filled it with the scalar value. Avoiding that temporary reduces wasted device memory and removes unnecessary memory-access overhead.
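To give a rough sense of the cost involved, the sketch below estimates how many bytes a filled temporary of the same size as the large operand would occupy. The function name `filled_temp_bytes`, the shape, and the float32 element size are assumptions chosen for illustration; they are not taken from this PR or from the DIPU code.

```python
import math


def filled_temp_bytes(shape, itemsize):
    """Bytes occupied by a temporary tensor of `shape` that is filled with a scalar.

    Hypothetical helper for illustration only; not part of DIPU or PyTorch.
    """
    return math.prod(shape) * itemsize


# Assumed example operand: a float32 tensor of shape (1024, 1024, 256).
shape = (1024, 1024, 256)
itemsize = 4  # bytes per float32 element

temp_bytes = filled_temp_bytes(shape, itemsize)  # full-size filled temporary
scalar_bytes = itemsize                          # a single scalar element

print(f"filled temporary: {temp_bytes / 2**30:.1f} GiB")
print(f"scalar operand:   {scalar_bytes} bytes")
```

Under these assumptions, the temporary costs as much device memory as the large operand itself, and it must be written once (the fill) and read once (the add or div), whereas the scalar needs only a few bytes.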

DeepLink-org / deeplink.framework

Zgc/dipu fix unnecessary large memory tensor allocate and operation #871