Robotwithsoul closed this issue 5 years ago.
How odd - do you have any idea why this is the case? Thanks a lot for the script for quick testing - can confirm that I am indeed seeing a speedup (not so significant for smaller tensors, but sure). Feel free to submit a PR for this change!
I'm not sure, but I guess this is because the following code performs the data type conversion on the CPU:

`.to(dtype=torch.float32, device=device)`

while the suggested code performs the data type conversion on the GPU:

`.to(device=device).to(dtype=torch.float32)`
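One way to check this guess (not from the original thread) is to profile both orderings and see where the cast kernel runs. `torch.profiler` postdates this issue, so treat this as a rough sketch under that assumption:

```python
import torch
from torch.profiler import profile, ProfilerActivity

x = torch.zeros((4, 100, 100, 100), dtype=torch.int32)

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    y = x.to(dtype=torch.float32, device='cuda')     # cast and copy in a single call
    z = x.to(device='cuda').to(dtype=torch.float32)  # copy first, then cast separately
    torch.cuda.synchronize()

# If the hypothesis holds, the first cast shows up as CPU-side work
# (e.g. aten::to / aten::copy_), while the second appears as a CUDA kernel.
print(prof.key_averages().table(sort_by="cuda_time_total"))
```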
Hi, in memory.py I would suggest a small change to your code, which should help improve the training speed (especially when the input image data is 3D).
Here is the code for testing:
```python
import timeit

import numpy as np
import torch

T, T1 = [], []
device = torch.device('cuda')
for i in range(0, 4):
    A = np.zeros((100, 100, 100), dtype=np.int64)  # np.int is deprecated in newer NumPy
    B = torch.tensor(A, dtype=torch.int)
    T.append(B)
for i in range(0, 4):
    A = np.zeros((100, 100, 100), dtype=np.int64)
    B = torch.tensor(A, dtype=torch.int)
    T1.append(B)
```
This line is used for initialization (it warms up the CUDA context so that one-time start-up cost doesn't skew the timings):

```python
M = torch.stack(T).to(dtype=torch.float32, device=device).div_(255)
torch.cuda.synchronize()  # make sure the warm-up work has finished before timing
```
Comparison:

```python
time_a = timeit.default_timer()
M = torch.stack(T).to(dtype=torch.float32, device=device).div_(255)     # original version
torch.cuda.synchronize()  # wait for the GPU so the measurement is accurate
time_b = timeit.default_timer()
N = torch.stack(T1).to(device=device).to(dtype=torch.float32).div_(255)  # suggested version
torch.cuda.synchronize()
time_c = timeit.default_timer()

print("time1 is: {}\ntime2 is: {}".format(time_b - time_a, time_c - time_b))
```