brain-research / realistic-ssl-evaluation

Open-source release of the evaluation benchmark suite described in "Realistic Evaluation of Deep Semi-Supervised Learning Algorithms".
Apache License 2.0

VAT GPU vRAM usage #30

Closed by hanscol 4 years ago

hanscol commented 5 years ago

I am trying to run the experiment associated with runs/figure-2-cifar10-4000-vat-ol0.yml, but my GPU (an RTX 2080 Ti with 11 GB) appears to run out of memory. This doesn't seem to be a batch-size issue, since CIFAR-10 images are fairly small, and I don't think the model should take up that much space either. I've tried TensorFlow 1.14 and 1.15 with no luck. Any suggestions?

craffel commented 5 years ago

Are you actually running out of memory (getting an OOM error), or is TF just allocating all of your GPU memory? I think TF still grabs all available GPU memory when it starts up.
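A minimal sketch of how to make TensorFlow allocate GPU memory on demand instead of reserving nearly all of it at startup, which helps tell a real out-of-memory condition apart from up-front reservation. It assumes TF 1.x-style graph code (it also works through `tf.compat.v1` in TF 2.x). `make_session_config` is a hypothetical helper name. `ConfigProto`, `gpu_options.allow_growth`, and `gpu_options.per_process_gpu_memory_fraction` are standard TensorFlow session options. Where this repo actually creates its session is not shown in the thread, so the hook point is an assumption.

```python
import tensorflow as tf

# Use the TF1-style API: tf.compat.v1 on 1.14/1.15 and 2.x,
# falling back to the top-level module on older 1.x releases.
try:
    tf1 = tf.compat.v1
except AttributeError:
    tf1 = tf


def make_session_config(allow_growth=True, memory_fraction=None):
    """Hypothetical helper: build a session config that does not
    reserve (almost) all GPU memory when the session starts.

    allow_growth: start with a small allocation and grow it as ops need memory.
    memory_fraction: optional upper bound, as a fraction of total GPU memory.
    """
    config = tf1.ConfigProto()
    config.gpu_options.allow_growth = allow_growth
    if memory_fraction is not None:
        config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    return config
```

The returned config would be passed as the `config=` argument wherever the training script builds its session, for example `tf.train.MonitoredTrainingSession(config=make_session_config(), ...)`. This only changes how memory is reserved; it does not reduce what the VAT graph itself needs.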

hanscol commented 5 years ago

When the monitored session is created, about 10.6 GB of VRAM is used. With the other methods, that is not a problem. With VAT, however, more memory is allocated beyond the initial 10.6 GB until all 11 GB is in use, and then I see this error: "Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR".

craffel commented 5 years ago

I have never seen that; I'm not sure what the issue is.

hanscol commented 4 years ago

Apparently the magic version was TensorFlow 1.13.