Closed hanscol closed 4 years ago
Are you running out of memory (getting an OOM error) or is TF just allocating all your memory? I think TF still grabs the entire GPU memory whenever it starts up.
On creating the monitored session, about 10.6 GB of VRAM is used up. For the other methods, this is not an issue. When using VAT, more memory is consumed beyond the initial 10.6 GB until all 11 GB is allocated, and then I see this error: "Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR".
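Not suggested in the thread itself, but a common workaround for `CUDNN_STATUS_INTERNAL_ERROR` on RTX cards under TF 1.x is to stop TensorFlow from pre-allocating the entire GPU at startup via `allow_growth`. This is a configuration sketch only; where exactly the repo constructs its monitored session is an assumption, so adapt it to wherever the session config is built:

```python
# Sketch: make TF 1.x allocate GPU memory on demand instead of
# grabbing all 11 GB up front, leaving cuDNN room to initialize.
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # grow allocations as needed
# Alternatively, cap the fraction of GPU memory TF may claim:
# config.gpu_options.per_process_gpu_memory_fraction = 0.9

# Hypothetical usage -- pass the config wherever the session is created:
with tf.train.MonitoredTrainingSession(config=config) as sess:
    ...  # run training as usual
```

With `allow_growth` enabled, cuDNN can usually create its handle even when the model itself eventually fills most of the card.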
I have never seen that, not sure what the issue is.
Apparently the magic version was TensorFlow 1.13.
I am trying to run the experiment associated with runs/figure-2-cifar10-4000-vat-ol0.yml, but the GPU (a 2080 Ti with 11 GB) appears to run out of memory. This doesn't appear to be an issue of batch size, as the CIFAR-10 images are fairly small, and I don't think the model would take up that much space either. I've tried TensorFlow versions 1.14 and 1.15 with no luck. Any suggestions?