alexlee-gk / video_prediction

Stochastic Adversarial Video Prediction
https://alexlee-gk.github.io/video_prediction/
MIT License

topological sort failed #9

Closed. YeTianJHU closed this issue 6 years ago.

YeTianJHU commented 6 years ago

When I tried to generate GIFs from a pre-trained model on the BAIR dataset, this error occurred:

evaluation samples from 0 to 8
2018-11-05 13:18:39.651172: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:675] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2018-11-05 13:18:39.658586: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:675] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
2018-11-05 13:18:41.185743: E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Segmentation fault (core dumped)

I'm using TensorFlow 1.10.1, CUDA 9.0, and cuDNN 7.1 on Ubuntu 16.04. CUDA and cuDNN are installed properly. My GPUs are dual NVIDIA Titan Vs. I also tried TensorFlow 1.6.0 and got the same result. Do you have any idea what causes this error? Thanks in advance!

Update: I also tried cuDNN 7.0.5, but the problem persists. Thanks!

Glooow1024 commented 6 years ago

I ran into the same problem when trying to train a SAVP model. Have you solved this issue?

YeTianJHU commented 6 years ago

No. I think it may be related to the cuDNN version. Still working on it.

alexlee-gk commented 6 years ago

The first two errors about the topological order don't seem to be the issue, and you can probably ignore them.
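To illustrate the distinction drawn above, here is a minimal Python sketch (not part of the video_prediction repo) that parses TensorFlow's C++ log lines and filters out the dependency optimizer's "topological sort failed" messages, leaving only the errors worth investigating. It assumes the `timestamp: LEVEL file:line] message` layout seen in the logs in this thread; the function names are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed layout of TensorFlow C++ log lines, as seen in this thread:
# "2018-11-05 13:18:41.185743: E tensorflow/.../cuda_dnn.cc:352] message"
# Severity letters: I = info, W = warning, E = error, F = fatal.
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"(?P<level>[IWEF]) (?P<source>\S+):(?P<line>\d+)\] (?P<message>.*)$"
)


class LogEntry(NamedTuple):
    level: str
    source: str
    message: str


def parse_line(text: str) -> Optional[LogEntry]:
    """Return a LogEntry for a TF C++ log line, or None for other output."""
    m = LOG_LINE.match(text.strip())
    if m is None:
        return None
    return LogEntry(m["level"], m["source"], m["message"])


def is_benign_grappler_message(entry: LogEntry) -> bool:
    # Assumption based on the maintainer's comment: the dependency optimizer's
    # "topological sort failed" messages can be ignored.
    return ("grappler/optimizers/dependency_optimizer" in entry.source
            and "topological sort failed" in entry.message)


def actionable_errors(lines: Iterable[str]) -> List[LogEntry]:
    """Keep E/F entries that are not the known-benign grappler messages."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries
            if e is not None
            and e.level in ("E", "F")
            and not is_benign_grappler_message(e)]
```

Applied to the first log in this thread, only the `cuda_dnn.cc` "Could not create cudnn handle" line survives the filter, which points at cuDNN rather than at the graph optimizer.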

As you said, the problem is likely cuDNN. According to the linked requirements, you should use cuDNN SDK >= 7.2. Can you upgrade cuDNN?
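One way to confirm which cuDNN release is installed before and after upgrading is to read the version macros from its header. The sketch below is a hypothetical helper, not part of the repo: it parses the `#define CUDNN_MAJOR/MINOR/PATCHLEVEL` lines and compares the result against the >= 7.2 requirement mentioned above. The header location depends on the install (for example `/usr/local/cuda/include/cudnn.h` for cuDNN 7.x; later releases moved these macros to `cudnn_version.h`). It only checks the header on disk, not the library TensorFlow was built against, which may be why reinstalling TensorFlow after the upgrade was still needed.

```python
import re
from typing import Sequence, Tuple


def parse_cudnn_version(header_text: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from the #define lines of a cuDNN header."""
    fields = {}
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        m = re.search(rf"^\s*#define\s+{name}\s+(\d+)", header_text, re.MULTILINE)
        if m is None:
            raise ValueError(f"{name} not found in header")
        fields[name] = int(m.group(1))
    return fields["CUDNN_MAJOR"], fields["CUDNN_MINOR"], fields["CUDNN_PATCHLEVEL"]


def meets_minimum(version: Sequence[int], minimum: Sequence[int] = (7, 2)) -> bool:
    # Tuples compare element by element, so (7, 1, 4) < (7, 2) <= (7, 3, 1).
    return tuple(version) >= tuple(minimum)


if __name__ == "__main__":
    # Hypothetical path; adjust to where cuDNN is installed on your machine.
    with open("/usr/local/cuda/include/cudnn.h") as f:
        version = parse_cudnn_version(f.read())
    print("cuDNN", ".".join(map(str, version)), "OK" if meets_minimum(version) else "too old")
```

With this check, the cuDNN 7.1 and 7.0.5 installs reported above would both fail, while 7.3.1 passes.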

YeTianJHU commented 6 years ago

Problem solved. I needed to reinstall TensorFlow after upgrading cuDNN. Many thanks!!

Glooow1024 commented 6 years ago

Hi Alex,

I'm using TensorFlow 1.11.0, CUDA 9.0.176, and cuDNN 7.3.1 on Ubuntu 16.04. My GPUs are NVIDIA Titan Xps. When I tried to train a SAVP model with the command

CUDA_VISIBLE_DEVICES=0,1 python scripts/train.py --input_dir data/bair --dataset bair \
  --model savp --model_hparams_dict hparams/bair_action_free/ours_savp/model_hparams.json \
  --output_dir logs/bair_action_free/ours_savp \
  --gpu_mem_frac 0.7

the program seems to hang and never makes progress, as if stuck in an endless loop, after outputting

2018-11-08 08:14:24.668709: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2018-11-08 08:14:25.028003: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.

What could be causing this? More output is below:

2018-11-08 08:13:09.286148: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2018-11-08 08:13:09.515356: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
session.run took 22.4s
recording summary
done
recording image summary
done
progress  global step 0  epoch 0  step 2560
discrim_video_sn_gan_loss (1.0238764, 1.0)
discrim_video_sn_vae_gan_loss (0.895689, 1.0)
gen_l1_loss (0.0804427, 100.0)
gen_video_sn_gan_loss (1.0158763, 1.0)
gen_video_sn_vae_gan_loss (0.8958128, 1.0)
gen_kl_loss (0.045274347, 0.0)
learning_rate 0.0002
saving model to logs/bair_action_free/ours_savp
done
2018-11-08 08:14:24.668709: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2018-11-08 08:14:25.028003: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.

datianshi21 commented 5 years ago

Same problem here, please help:

2019-05-14 15:09:28.745521: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-05-14 15:09:30.421876: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-05-14 15:13:00.040767: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.9.2 locally
INFO:tensorflow:loss = 0.7136042, step = 0
2019-05-14 15:13:45.954118: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-05-14 15:13:47.336729: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.