danecor / VaST

An implementation of Variational State Tabulation, from the paper here: https://arxiv.org/abs/1802.04325.
MIT License
running issues #4

Open z4z5 opened 5 years ago

z4z5 commented 5 years ago

Hi @danecor, I have installed everything, and all the unit tests pass under pytest. But when I try to run the example with the command:

CUDA_VISIBLE_DEVICES=0 python run.py doom tmaze --num_steps=500000 --burnin=10000 --epsilon_period=40000

an error appears after a while. The output is below:

Namespace(AG_SEED=None, ENV_SEED=None, NP_SEED=None, TF_SEED=None, act_func=None, beta_prior=None, burnin=10000, concurrent_batches=None, debug=False, delete_old_episodes=False, discount=None, epsilon_period=40000, exp_eps_decay=False, experiment=['tmaze'], freeze_weights=False, gpu_frac=None, grad_norm_clip=None, hist_len=None, init_capacity=None, learning_rate=None, living_reward=None, map_path=None, max_replay_size=None, min_epsilon=None, minibatch_size=None, module=['doom'], n_z=None, net_arch=None, num_reset_steps=None, num_steps=500000, path_ext=None, pri_cutoff=None, prior_tau=None, record=False, restore_path=None, restore_weights_path=None, seed=0, show_screen=False, straight_through=False, summary_step=None, tau_max=None, tau_min=None, tau_period=None, test=False, test_epoch_length=None, test_epsilon=None, track_repeats=False, train_epoch_length=None, train_reward=False, train_step=None, trigger=None, trigger_step=None)
2019-09-11 15:51:48,095 - MainThread - INFO: Starting Experiment
2019-09-11 15:51:50.338234: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2019-09-11 15:51:50.338271: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2019-09-11 15:51:50.338277: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2019-09-11 15:51:50.338282: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2019-09-11 15:51:50.338287: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2019-09-11 15:51:52.447393: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: Tesla P40
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:04:00.0
Total memory: 22.38GiB
Free memory: 22.21GiB
2019-09-11 15:51:52.447439: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2019-09-11 15:51:52.447446: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y
2019-09-11 15:51:52.447456: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P40, pci bus id: 0000:04:00.0)
2019-09-11 15:51:53,336 - MainThread - INFO: Starting sweeper table
2019-09-11 15:51:53.833710: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P40, pci bus id: 0000:04:00.0)
2019-09-11 15:51:57,364 - MainThread - INFO: Step: 01000 Reward: 8 Epsilon: 1.000
2019-09-11 15:52:00,362 - MainThread - INFO: Step: 02000 Reward: 1 Epsilon: 1.000
2019-09-11 15:52:03,139 - MainThread - INFO: Step: 03000 Reward: -8 Epsilon: 1.000
2019-09-11 15:52:06,004 - MainThread - INFO: Step: 04000 Reward: -1 Epsilon: 1.000
2019-09-11 15:52:08,949 - MainThread - INFO: Step: 05000 Reward: 7 Epsilon: 1.000
2019-09-11 15:52:11,796 - MainThread - INFO: Step: 06000 Reward: 5 Epsilon: 1.000
2019-09-11 15:52:14,769 - MainThread - INFO: Step: 07000 Reward: 3 Epsilon: 1.000
2019-09-11 15:52:17,592 - MainThread - INFO: Step: 08000 Reward: 1 Epsilon: 1.000
2019-09-11 15:52:20,517 - MainThread - INFO: Step: 09000 Reward: -4 Epsilon: 1.000
2019-09-11 15:52:23,533 - MainThread - INFO: Step: 10000 Reward: 4 Epsilon: 1.000
2019-09-11 15:52:25.069607: W tensorflow/core/framework/op_kernel.cc:1158] Unimplemented: CopySliceToElement Unhandled data type: 17
Exception in thread TrainingThread:
Traceback (most recent call last):
  File "/home/anaconda2/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/home/anaconda2/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/z4z5/VaST/models/vae.py", line 354, in trainthread
    zs, = super(ConcurrentVAE, self).train(step, summary_writer)
  File "/home/z4z5/VaST/models/vae.py", line 211, in train
    results = self.sess.run(fetches, kwargs)
  File "/home/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/home/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/home/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/home/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
UnimplementedError: CopySliceToElement Unhandled data type: 17
  [[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[-1,60,80,6], [-1], [-1]], output_types=[DT_UINT8, DT_UINT16, DT_BOOL], _device="/job:localhost/replica:0/task:0/cpu:0"]]
  [[Node: strided_slice_8/_21 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_136_strided_slice_8", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Caused by op u'IteratorGetNext', defined at:
  File "run.py", line 93, in <module>
    restore_weights_path)
  File "/home/z4z5/VaST/io_utils.py", line 90, in init_experiment
    restore_step=rstep, restore_path=restore_weights_path)
  File "/home/z4z5/VaST/models/base.py", line 27, in __init__
    self._create_graph(params, restore_step)
  File "/home/z4z5/VaST/models/base.py", line 35, in _create_graph
    self._create_network()
  File "/home/z4z5/VaST/models/vae.py", line 290, in _create_network
    super(ConcurrentVAE, self)._create_network()
  File "/home/z4z5/VaST/models/vae.py", line 37, in _create_network
    self._create_input()
  File "/home/z4z5/VaST/models/vae.py", line 312, in _create_input
    obs, self.acts, self.starts = self.iterator.get_next()
  File "/home/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/data/python/ops/dataset_ops.py", line 247, in get_next
    name=name))
  File "/home/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 254, in iterator_get_next
    output_shapes=output_shapes, name=name)
  File "/home/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/home/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

UnimplementedError (see above for traceback): CopySliceToElement Unhandled data type: 17
  [[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[-1,60,80,6], [-1], [-1]], output_types=[DT_UINT8, DT_UINT16, DT_BOOL], _device="/job:localhost/replica:0/task:0/cpu:0"]]
  [[Node: strided_slice_8/_21 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_136_strided_slice_8", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
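
A note on the error code: the 17 in "Unhandled data type: 17" appears to be a value of TensorFlow's DataType enum, where 17 is DT_UINT16, and DT_UINT16 is one of the iterator's output_types in the log above. The sketch below is only an illustrative lookup and is not part of VaST or TensorFlow: the table is a hand-copied subset of the enum, and the names TF_DATATYPE_ENUM and decode_dtype are made up for this example.

```python
# Hand-copied subset of TensorFlow's DataType enum
# (tensorflow/core/framework/types.proto). Not imported from TensorFlow.
TF_DATATYPE_ENUM = {
    1: "DT_FLOAT",
    2: "DT_DOUBLE",
    3: "DT_INT32",
    4: "DT_UINT8",
    5: "DT_INT16",
    6: "DT_INT8",
    7: "DT_STRING",
    9: "DT_INT64",
    10: "DT_BOOL",
    17: "DT_UINT16",
}


def decode_dtype(code):
    """Hypothetical helper: map a numeric dtype code from an error message to its name."""
    return TF_DATATYPE_ENUM.get(code, "unknown (%d)" % code)


# output_types of the failing IteratorGetNext node, copied from the log.
iterator_output_types = ["DT_UINT8", "DT_UINT16", "DT_BOOL"]

culprit = decode_dtype(17)
print("%s in iterator output_types: %s" % (culprit, culprit in iterator_output_types))
```

If this reading is right, the tensor that the iterator fails to batch is the uint16 one, which is the second element returned by self.iterator.get_next() in the traceback (self.acts).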

I couldn't find a solution after googling it. Can you help me? Thanks, z4z5