Closed: scfrank closed this issue 9 years ago
Yup! I didn't bump into this error because our server has 256 GB of RAM.
I think Theano (and, by association, Keras) uses 32-bit floats by default, which means ~12.9 GB of RAM (30000 × 39 × 2763 elements × 4 bytes per 32-bit float ≈ 1.29 × 10^10 bytes) to allocate the vectorised_sentences structure.
I don't know how we can reduce the memory footprint; perhaps a scipy sparse matrix might help (rough sketch below)? It will become an even bigger problem in the future with larger datasets.
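To make the sparse idea concrete, here is a minimal sketch, assuming vectorised_sentences is a one-hot encoding over the 2763-word vocabulary (the variable names and the one-hot assumption are mine, not from the repository). scipy.sparse only handles 2-D matrices, so the sentence and timestep axes are flattened into the rows; with at most one non-zero per row, the structure stays tiny:

```python
import numpy as np
from scipy.sparse import lil_matrix

# Hypothetical dimensions, taken from the numbers in this thread.
n_sentences, max_len, vocab_size = 30000, 39, 2763

# Dense equivalent (~12.9 GB as float32):
#   np.zeros((n_sentences, max_len, vocab_size), dtype=np.float32)
# Sparse alternative: rows index (sentence, timestep) pairs.
sparse = lil_matrix((n_sentences * max_len, vocab_size), dtype=np.float32)

# e.g. token id 42 at timestep 0 of sentence 0:
sparse[0 * max_len + 0, 42] = 1.0
```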
I will look at the code with a view to using iterables / generating batches on the fly instead of using up all the memory; a sketch of that approach follows. Doing this properly will pretty soon amount to (badly) reinventing fuel (https://github.com/mila-udem/fuel), so maybe it would be worth moving to that instead.
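A minimal sketch of the on-the-fly approach (the helper name and the token_ids input format are hypothetical, not code from this repository): a generator yields small one-hot batches, so peak memory is one batch, roughly 128 × 39 × 2763 × 4 bytes ≈ 55 MB, instead of the full ~12.9 GB tensor.

```python
import numpy as np

def vectorised_batches(token_ids, vocab_size, max_len, batch_size=128):
    """Yield one-hot batches lazily instead of materialising the whole
    (n_sentences, max_len, vocab_size) tensor up front.

    token_ids: a list of sentences, each a list of integer token ids.
    """
    for start in range(0, len(token_ids), batch_size):
        chunk = token_ids[start:start + batch_size]
        batch = np.zeros((len(chunk), max_len, vocab_size), dtype=np.float32)
        for i, sentence in enumerate(chunk):
            for t, token in enumerate(sentence[:max_len]):
                batch[i, t, token] = 1.0
        yield batch
```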
Fixed by 17c9eff
Running python train.py with default arguments (flickr8k dataset etc.) throws a MemoryError. This is on a CPU machine (4 cores, 8 GB RAM, 16 GB swap). The values for the np.zeros() parameters are 30000 × 39 × 2763, which gives a ~3.2 × 10^9 element array/tensor; at 8 bytes per 64-bit zero, that is a ~26 GB table, well beyond this machine's 8 GB RAM plus 16 GB swap. A back-of-the-envelope check is below.
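For reference, the arithmetic (pure Python, no project code):

```python
n_sentences, max_len, vocab_size = 30000, 39, 2763
n_items = n_sentences * max_len * vocab_size   # 3,232,710,000 elements

print(n_items * 8 / 1e9)  # 64-bit zeros (np.zeros default): ~25.9 GB
print(n_items * 4 / 1e9)  # 32-bit floats (Theano default):  ~12.9 GB
```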