Time for each epoch increases after changing input type

lxguan1 commented 2 years ago

Hi,

I have been trying to get pix2pixHD to work on NumPy arrays rather than PIL images. After the conversion, each epoch takes progressively longer than the one before (e.g. epoch 1 takes 126 seconds, epoch 2 takes 160 seconds, ...). Has anyone else run into this problem, and if so, was there a way to fix it?

zhangxiaojuan66 commented 2 years ago

Hello, your email has been received. Wishing you all the best and a happy life!

lxguan1 commented 2 years ago

To clarify, we are doing a research project that requires data with higher per-pixel precision than PIL images can store. So far, our modifications are: loading the data as NumPy arrays, normalizing it to lie between -1 and 1 by dividing each image by the maximum value over all images, cropping images with array slicing, and resizing images to powers of 2 with scipy's interp2d.

The current issue is that each epoch takes more time than the previous one, and the increase is linear in the number of epochs. The increase starts over with each independent run of pix2pixHD, irrespective of which epoch the run starts at. Using memory profiling tools, we found that pix2pixHD appears to be keeping additional items in memory each epoch, which may be the reason for the slowdown, but we have not found any place in the code where additional data would be kept between epochs. Any help would be appreciated.
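For anyone trying to reproduce the measurement described above, here is a minimal sketch of timing each epoch and diffing memory snapshots with the standard library's `tracemalloc`. The names `profile_epochs`, `run_epoch`, and `fake_epoch` are hypothetical and not part of pix2pixHD; `run_epoch` stands in for whatever runs one training epoch. Note that `tracemalloc` only sees allocations made through Python's allocators, so it would not show GPU memory.

```python
import time
import tracemalloc


def profile_epochs(run_epoch, n_epochs, top=3):
    """Time each epoch and list the source lines whose memory grew most.

    run_epoch is a hypothetical callable taking the epoch number.
    Returns a list of (epoch, seconds, top_growth) tuples.
    """
    tracemalloc.start()
    previous = tracemalloc.take_snapshot()
    report = []
    for epoch in range(1, n_epochs + 1):
        start = time.perf_counter()
        run_epoch(epoch)
        seconds = time.perf_counter() - start
        current = tracemalloc.take_snapshot()
        # Differences are sorted by absolute size change, largest first.
        growth = current.compare_to(previous, "lineno")[:top]
        report.append((epoch, seconds, growth))
        previous = current
    tracemalloc.stop()
    return report


if __name__ == "__main__":
    leaked = []  # stand-in for state that accidentally survives each epoch

    def fake_epoch(epoch):
        leaked.append([0.0] * 100_000)

    for epoch, seconds, growth in profile_epochs(fake_epoch, 3):
        print(f"epoch {epoch}: {seconds:.3f}s")
        for stat in growth:
            print("   ", stat)
```

If the same source line shows up at the top of the growth list every epoch, that line is a reasonable first place to look for objects that are being retained.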

lxguan1 commented 2 years ago

To add further detail, we modified train.py and noticed some peculiar behavior. The increase in time per epoch happens even if the model is saved and reloaded each epoch inside a for loop in the main function, and the increase in time and memory is identical to what we see when running the model normally. We were finally able to suppress the behavior by using a loop in the bash script that runs the model one epoch at a time, so that each epoch runs in a fresh process. This is not ideal, so any help would be appreciated.
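The one-epoch-per-process workaround can also be sketched in Python rather than bash. This is only an illustration under assumptions: `run_one_epoch_per_process`, `build_command`, and `example_command` are hypothetical names, and the example command just prints a message. A real command would have to invoke the training script and resume from the checkpoint saved by the previous epoch.

```python
import subprocess
import sys


def run_one_epoch_per_process(build_command, first_epoch, last_epoch):
    """Run every epoch in a fresh process so no in-memory state carries over.

    build_command is a hypothetical callable that maps an epoch number to
    the argument list for that epoch. Returns the epochs that completed.
    """
    completed = []
    for epoch in range(first_epoch, last_epoch + 1):
        # check=True raises CalledProcessError and stops the loop
        # if any epoch exits with a non-zero status.
        subprocess.run(build_command(epoch), check=True)
        completed.append(epoch)
    return completed


def example_command(epoch):
    # Placeholder command: a real one would launch the training script
    # and tell it to continue from the previous epoch's checkpoint.
    return [sys.executable, "-c", f"print('training epoch {epoch}')"]


if __name__ == "__main__":
    print(run_one_epoch_per_process(example_command, 1, 3))
```

Because every epoch starts a new interpreter, anything accumulated in memory during an epoch is released when that process exits. The cost is the repeated startup and checkpoint loading.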

NVIDIA / pix2pixHD

Time for each epoch increases after changing input type #284