The activations of early convnet layers in particular end up being quite large; we often have to run models with 200-400 GB of memory if we want to investigate these early layers. You could ignore these layers, or increase the available memory.
What you said makes sense, but I'm not sure I see the relevance to the error?
In this particular error, the activations have shape `(1216, 256, 32, 32)` and thus require only 1.19 GiB of memory. That is much less than the total available memory, unless something else is already hogging all the memory. So is something hogging all the memory (and if so, what is it and why is it present), or is something else going awry?
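For reference, a quick sanity check of that figure (assuming the activations are stored as float32, i.e. 4 bytes per element):

```python
import numpy as np

# Footprint of one batch of activations with the reported shape,
# assuming float32 storage (4 bytes per element):
shape = (1216, 256, 32, 32)
n_bytes = np.prod(shape) * 4
print(n_bytes / 1024**3)  # ≈ 1.19 GiB
```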
I believe it's attempting to allocate 1.19 GiB of additional memory in the concatenation of the arrays (`np.concatenate((layer_activations[layer_name], layer_output))`). You could profile how big `layer_activations` already is up to that point; I'm guessing this does not occur in the first batch of images, but rather when accumulating multiple batches. All the activations are needed to compare against the neural recordings later.
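A minimal sketch of that kind of profiling, assuming `layer_activations` is a dict mapping layer names to the numpy arrays accumulated so far (the helper name is hypothetical):

```python
import numpy as np

def log_accumulated_sizes(layer_activations):
    """Print how much memory each layer's accumulated activations occupy so far."""
    total_bytes = 0
    for layer_name, acts in layer_activations.items():
        total_bytes += acts.nbytes
        print(f"{layer_name}: shape={acts.shape}, {acts.nbytes / 1024**3:.2f} GiB")
    print(f"total accumulated: {total_bytes / 1024**3:.2f} GiB")

# Example with dummy data standing in for real activations:
log_accumulated_sizes({"conv1": np.zeros((64, 256, 32, 32), dtype=np.float32)})
```

Note also that `np.concatenate` allocates a fresh output array, so during the copy the peak usage is roughly the already-accumulated size plus the new batch; collecting batches in a Python list and concatenating once at the end would avoid the repeated copies.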
I'm getting an OOM error that claims not enough memory can be found for 1.19 GiB, yet I'm running SLURM jobs with ~80 GB.
How can I investigate the cause? Is it possible that previous layers' activations are consuming memory? If so, is there some flag or some mechanism to free that memory?
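One way to investigate (a sketch, assuming `psutil` is installed; none of these names come from the codebase) is to log the process's resident memory after each layer or batch and see what accumulates:

```python
import gc
import os

import psutil

def rss_gib():
    """Resident set size of the current process, in GiB."""
    return psutil.Process(os.getpid()).memory_info().rss / 1024**3

# Hypothetical usage inside the extraction loop:
#   print(f"after {layer_name}: {rss_gib():.2f} GiB")

# NumPy arrays are freed as soon as the last reference is dropped, so if a
# finished layer's activations were no longer needed, deleting the reference
# (and optionally forcing a collection) would release that memory:
#   del layer_activations[layer_name]
#   gc.collect()
```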