ainazHjm opened 5 years ago
Instead of writing the images to disk, I'm currently using the HDF5 data format, which lets me partially load the data and read it as numpy arrays. I'm also using workers with the PyTorch DataLoader to speed things up. However, it's still pretty slow.
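A minimal sketch of the partial-loading idea with h5py (the file path, dataset name, and shapes here are made up for illustration; the real data would be the (94, 200, 200) samples described below). Chunking along the first axis is one common way to make per-sample reads cheap:

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical layout: one HDF5 dataset holding N samples, chunked so that
# slicing out a single sample only touches that sample's chunk on disk.
path = os.path.join(tempfile.mkdtemp(), "train.h5")
with h5py.File(path, "w") as f:
    f.create_dataset(
        "patches",
        shape=(8, 4, 10, 10),        # tiny stand-in shapes for the sketch
        dtype="float32",
        chunks=(1, 4, 10, 10),       # one chunk per sample
    )
    f["patches"][...] = np.random.rand(8, 4, 10, 10).astype("float32")

with h5py.File(path, "r") as f:
    # Partial load: only sample 3 is read from disk, returned as numpy.
    sample = f["patches"][3]

print(sample.shape)  # (4, 10, 10)
```

A `__getitem__` in a PyTorch `Dataset` can do the same slice per index, so each worker only reads the samples it serves.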
Alright, to solve the loading part I created the DataLoader iterator outside my iteration loop and moved my data into the /tmp folder! This improved the loading speed a lot, but it's still not great: each epoch takes 1.5 hours with a batch size of 9 and 4 workers. I guess that's not really bad, though, since each sample I load is (94, 200, 200).
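A rough sketch of that pattern (the dataset here is a toy `TensorDataset` standing in for the HDF5-backed one, and the shapes mirror the batch size mentioned above). Note that newer PyTorch versions (>= 1.7) also offer `persistent_workers=True` on `DataLoader`, which keeps worker processes alive across epochs and gets at the same startup cost:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for the real dataset: 36 samples of shape (4, 10, 10).
data = TensorDataset(torch.randn(36, 4, 10, 10), torch.zeros(36))

# num_workers=0 here just to keep the sketch lightweight; with workers,
# persistent_workers=True avoids re-spawning them every epoch.
loader = DataLoader(data, batch_size=9, num_workers=0)

for epoch in range(2):
    it = iter(loader)          # iterator built once, outside the batch loop
    for x, y in it:
        pass                   # training step would go here

print(tuple(x.shape))  # (9, 4, 10, 10)
```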
I have a lot of big rasters/images as my training data, and each one is a feature. Before, I was writing (200, 200) patches from these rasters into another folder as my input images, so each input was (94, 200, 200). The issue with this was that to get more data I needed a stride smaller than 200, but that would blow my data up to ~700GB, which is not efficient.
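The storage blow-up from shrinking the stride is easy to estimate. A small sketch, assuming for illustration a 10000x10000 raster, 94 bands, 200x200 patches, and float32 values (the real raster sizes aren't given above, so these numbers are made up):

```python
def patch_storage_gb(raster_hw=10000, patch=200, stride=200,
                     bands=94, bytes_per_value=4):
    """Estimate total size of all extracted patches, in GB."""
    # number of patch positions along one axis
    n_per_axis = (raster_hw - patch) // stride + 1
    n_patches = n_per_axis ** 2
    return n_patches * bands * patch * patch * bytes_per_value / 1e9

print(patch_storage_gb(stride=200))  # 37.6 GB
print(patch_storage_gb(stride=100))  # ~147.4 GB, roughly 4x
```

Halving the stride roughly quadruples the patch count, which is why writing overlapping patches to disk gets expensive so quickly; slicing patches out of the rasters on the fly (or out of one HDF5 file, as above) stores each pixel only once.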