ainazHjm opened 5 years ago
Instead of writing the images to disk, I'm currently using the HDF5 data format, which lets me partially load the data and read it as numpy arrays. I'm also using workers with the PyTorch DataLoader to speed things up. However, it's still pretty slow.
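A minimal sketch of the partial-loading idea with h5py (the file path, dataset name, and shapes here are made up for illustration; the real data would be the (94, 200, 200) samples described below). Chunking along the first axis is one common way to make per-sample reads cheap:

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical layout: one HDF5 dataset holding N samples, chunked so that
# slicing out a single sample only touches that sample's chunk on disk.
path = os.path.join(tempfile.mkdtemp(), "train.h5")
with h5py.File(path, "w") as f:
    f.create_dataset(
        "patches",
        shape=(8, 4, 10, 10),        # tiny stand-in shapes for the sketch
        dtype="float32",
        chunks=(1, 4, 10, 10),       # one chunk per sample
    )
    f["patches"][...] = np.random.rand(8, 4, 10, 10).astype("float32")

with h5py.File(path, "r") as f:
    # Partial load: only sample 3 is read from disk, returned as numpy.
    sample = f["patches"][3]

print(sample.shape)  # (4, 10, 10)
```

A `__getitem__` in a PyTorch `Dataset` can do the same slice per index, so each worker only reads the samples it serves.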
Alright, to solve the loading part I created the DataLoader iterator outside my iteration loop and moved my data into the /tmp folder! This improved the loading speed a lot, but it's still not great: each epoch takes 1.5 hours with a batch size of 9 and 4 workers. I guess that's not really bad, though, since each sample I load is (94, 200, 200).
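A rough sketch of that pattern (the dataset here is a toy `TensorDataset` standing in for the HDF5-backed one, and the shapes mirror the batch size mentioned above). Note that newer PyTorch versions (>= 1.7) also offer `persistent_workers=True` on `DataLoader`, which keeps worker processes alive across epochs and gets at the same startup cost:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for the real dataset: 36 samples of shape (4, 10, 10).
data = TensorDataset(torch.randn(36, 4, 10, 10), torch.zeros(36))

# num_workers=0 here just to keep the sketch lightweight; with workers,
# persistent_workers=True avoids re-spawning them every epoch.
loader = DataLoader(data, batch_size=9, num_workers=0)

for epoch in range(2):
    it = iter(loader)          # iterator built once, outside the batch loop
    for x, y in it:
        pass                   # training step would go here

print(tuple(x.shape))  # (9, 4, 10, 10)
```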
I have a lot of big rasters/images as my training data, and each one is a feature. Before, I was writing (200, 200) patches from these rasters into another folder as my input images, so each input was (94, 200, 200). The issue with this was that to get more data I needed a stride smaller than 200, but that would blow my data up to ~700GB, which is not efficient.
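The storage blow-up from shrinking the stride is easy to estimate. A small sketch, assuming for illustration a 10000x10000 raster, 94 bands, 200x200 patches, and float32 values (the real raster sizes aren't given above, so these numbers are made up):

```python
def patch_storage_gb(raster_hw=10000, patch=200, stride=200,
                     bands=94, bytes_per_value=4):
    """Estimate total size of all extracted patches, in GB."""
    # number of patch positions along one axis
    n_per_axis = (raster_hw - patch) // stride + 1
    n_patches = n_per_axis ** 2
    return n_patches * bands * patch * patch * bytes_per_value / 1e9

print(patch_storage_gb(stride=200))  # 37.6 GB
print(patch_storage_gb(stride=100))  # ~147.4 GB, roughly 4x
```

Halving the stride roughly quadruples the patch count, which is why writing overlapping patches to disk gets expensive so quickly; slicing patches out of the rasters on the fly (or out of one HDF5 file, as above) stores each pixel only once.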