Robotwithsoul closed this issue 5 years ago.
How odd - do you have any idea why this is the case? Thanks a lot for the script for quick testing - can confirm that I am indeed seeing a speedup (not so significant for smaller tensors, but sure). Feel free to submit a PR for this change!
I'm not sure, but I guess this is because the following code performs the data type conversion on the CPU:

`.to(dtype=torch.float32, device=device)`

while the suggested code performs the data type conversion on the GPU:

`.to(device=device).to(dtype=torch.float32)`
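One way to check this guess (not from the original thread) is to profile both orderings and see where the cast kernel runs. `torch.profiler` postdates this issue, so treat this as a rough sketch under that assumption:

```python
import torch
from torch.profiler import profile, ProfilerActivity

x = torch.zeros((4, 100, 100, 100), dtype=torch.int32)

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    y = x.to(dtype=torch.float32, device='cuda')     # cast and copy in a single call
    z = x.to(device='cuda').to(dtype=torch.float32)  # copy first, then cast separately
    torch.cuda.synchronize()

# If the hypothesis holds, the first cast shows up as CPU-side work
# (e.g. aten::to / aten::copy_), while the second appears as a CUDA kernel.
print(prof.key_averages().table(sort_by="cuda_time_total"))
```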
Hi, in memory.py I would suggest a small change to your code, which should help improve the training speed (especially when the input image data is 3D).
Here is the code for testing:
```python
import timeit

import numpy as np
import torch

T, T1 = [], []
device = torch.device('cuda')
for i in range(0, 4):
    A = np.zeros((100, 100, 100), dtype=np.int64)  # np.int is deprecated in newer NumPy
    B = torch.tensor(A, dtype=torch.int)
    T.append(B)
for i in range(0, 4):
    A = np.zeros((100, 100, 100), dtype=np.int64)
    B = torch.tensor(A, dtype=torch.int)
    T1.append(B)
```
This line is used for initialization (it warms up the CUDA context so that one-time start-up cost doesn't skew the timings):

```python
M = torch.stack(T).to(dtype=torch.float32, device=device).div_(255)
torch.cuda.synchronize()  # make sure the warm-up work has finished before timing
```
Comparison:

```python
time_a = timeit.default_timer()
M = torch.stack(T).to(dtype=torch.float32, device=device).div_(255)     # original version
torch.cuda.synchronize()  # wait for the GPU so the measurement is accurate
time_b = timeit.default_timer()
N = torch.stack(T1).to(device=device).to(dtype=torch.float32).div_(255)  # suggested version
torch.cuda.synchronize()
time_c = timeit.default_timer()

print("time1 is: {}\ntime2 is: {}".format(time_b - time_a, time_c - time_b))
```