Closed by danijar 5 years ago
Even with the batch size set to 20, this error still occurs.
Did you try a smaller batch size than 20?
Yes, I tried to reduce the batch size to 10 or 5. It didn't work.
Does TF find your GPU and does your GPU have enough memory available (no other TF running)?
Thank you so much for your answer. After setting --params {batch_shape: [1, 50]}, it started training. I ran it for 2 days on a single 2080 Ti and it only reached epoch 9. There is another TF process running on this computer, but I checked the GPU memory and it only uses about 3 percent. Is there a way to make this program run faster?
This is not really specific to this code. Make sure the other program has the growing memory option enabled so that it does not reserve all GPU memory. It may also be that TF is not using the GPU because another program is already occupying it. Either way, I recommend running only one training job at a time. Good luck!
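For reference, here is a minimal sketch of that growing memory option, assuming the TF1-style ConfigProto API used elsewhere in this thread (the session setup below is illustrative, not the repository's actual code):
import tensorflow as tf
# With allow_growth enabled, the process reserves GPU memory on demand
# instead of grabbing almost all of it up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)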
I found that this error can often be avoided without reducing the batch size, by instead disabling TensorFlow's memory optimizations in _create_session() in trainer.py:
from tensorflow.core.protobuf import rewriter_config_pb2
# 'config' is the tf.ConfigProto that _create_session() passes to tf.Session.
off = rewriter_config_pb2.RewriterConfig.OFF
config.graph_options.rewrite_options.memory_optimization = off
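For context, a minimal self-contained sketch of how those lines fit into session creation (the function name and the allow_growth line are illustrative additions, not the repository's exact code):
import tensorflow as tf
from tensorflow.core.protobuf import rewriter_config_pb2

def create_session():
    # Turn off the graph rewriter's memory optimization pass, which is what
    # triggers the error discussed above, then create the session with it.
    config = tf.ConfigProto()
    config.graph_options.rewrite_options.memory_optimization = (
        rewriter_config_pb2.RewriterConfig.OFF)
    config.gpu_options.allow_growth = True  # optional: reserve GPU memory lazily
    return tf.Session(config=config)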
This error shows up when there is not enough GPU memory available. Setting --params {batch_shape: [20, 50]} reduces the batch size from 50 to 20.
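As a rough illustration of what the two numbers mean (hedged; the repository's actual input pipeline may differ), batch_shape [20, 50] corresponds to batches of 20 sequences of 50 steps each:
import tensorflow as tf

batch_shape = [20, 50]  # 20 sequences per batch, 50 time steps per sequence
# Hypothetical per-step features; the real dataset in this repository differs.
steps = tf.data.Dataset.from_tensor_slices(tf.zeros([10000, 8]))
sequences = steps.batch(batch_shape[1], drop_remainder=True)    # -> [50, 8]
batches = sequences.batch(batch_shape[0], drop_remainder=True)  # -> [20, 50, 8]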