Closed 0i0 closed 6 years ago
I tried converting to 16000 Hz and cutting it into 0.25 sec pieces, but I keep getting:
2018-03-29 15:15:26.407883: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ******************************************************xx*******************************************_
2018-03-29 15:15:26.407910: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[32,401,4096]
Traceback (most recent call last):
  File "/home/lior/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1323, in _do_call
    return fn(*args)
  File "/home/lior/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
    status, run_metadata)
  File "/home/lior/.local/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[32,401,4096]
   [[Node: net/net2/cbhg2/conv1d_banks/concat = ConcatV2[N=16, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](net/net2/cbhg2/conv1d_banks/num_1/Relu, net/net2/cbhg2/conv1d_banks/num_2/Relu, net/net2/cbhg2/conv1d_banks/num_3/Relu, net/net2/cbhg2/conv1d_banks/num_4/Relu, net/net2/cbhg2/conv1d_banks/num_5/Relu, net/net2/cbhg2/conv1d_banks/num_6/Relu, net/net2/cbhg2/conv1d_banks/num_7/Relu, net/net2/cbhg2/conv1d_banks/num_8/Relu, net/net2/cbhg2/conv1d_banks/num_9/Relu, net/net2/cbhg2/conv1d_banks/num_10/Relu, net/net2/cbhg2/conv1d_banks/num_11/Relu, net/net2/cbhg2/conv1d_banks/num_12/Relu, net/net2/cbhg2/conv1d_banks/num_13/Relu, net/net2/cbhg2/conv1d_banks/num_14/Relu, net/net2/cbhg2/conv1d_banks/num_15/Relu, net/net2/cbhg2/conv1d_banks/num_16/Relu, net/net2/cbhg2/conv1d_banks/concat/axis)]]
   [[Node: gradients/net/net2/cbhg2/highwaynet_2/mul_1_grad/Shape/_837 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4748_gradients/net/net2/cbhg2/highwaynet_2/mul_1_grad/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train2.py", line 99, in <module>
    train(logdir1=logdir1, logdir2=logdir2)
  File "train2.py", line 57, in train
    sess.run(train_op, feed_dict={model.x_mfcc: mfcc, model.y_spec: spec, model.y_mel: mel})
  File "/home/lior/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/home/lior/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/lior/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/home/lior/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[32,401,4096]
   [[Node: net/net2/cbhg2/conv1d_banks/concat = ConcatV2[N=16, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](net/net2/cbhg2/conv1d_banks/num_1/Relu, net/net2/cbhg2/conv1d_banks/num_2/Relu, net/net2/cbhg2/conv1d_banks/num_3/Relu, net/net2/cbhg2/conv1d_banks/num_4/Relu, net/net2/cbhg2/conv1d_banks/num_5/Relu, net/net2/cbhg2/conv1d_banks/num_6/Relu, net/net2/cbhg2/conv1d_banks/num_7/Relu, net/net2/cbhg2/conv1d_banks/num_8/Relu, net/net2/cbhg2/conv1d_banks/num_9/Relu, net/net2/cbhg2/conv1d_banks/num_10/Relu, net/net2/cbhg2/conv1d_banks/num_11/Relu, net/net2/cbhg2/conv1d_banks/num_12/Relu, net/net2/cbhg2/conv1d_banks/num_13/Relu, net/net2/cbhg2/conv1d_banks/num_14/Relu, net/net2/cbhg2/conv1d_banks/num_15/Relu, net/net2/cbhg2/conv1d_banks/num_16/Relu, net/net2/cbhg2/conv1d_banks/concat/axis)]]
   [[Node: gradients/net/net2/cbhg2/highwaynet_2/mul_1_grad/Shape/_837 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4748_gradients/net/net2/cbhg2/highwaynet_2/mul_1_grad/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'net/net2/cbhg2/conv1d_banks/concat', defined at:
  File "train2.py", line 99, in <module>
    train(logdir1=logdir1, logdir2=logdir2)
  File "train2.py", line 18, in train
    model = Model(mode="train2", batch_size=hp.Train2.batch_size, queue=queue)
  File "/home/lior/src/aws/deep-voice-conversion/models.py", line 26, in __init__
    self.ppgs, self.pred_ppg, self.logits_ppg, self.pred_spec, self.pred_mel = self.net_template()
  File "/home/lior/.local/lib/python3.5/site-packages/tensorflow/python/ops/template.py", line 278, in __call__
    result = self._call_func(args, kwargs, check_for_new_variables=False)
  File "/home/lior/.local/lib/python3.5/site-packages/tensorflow/python/ops/template.py", line 217, in _call_func
    result = self._func(*args, **kwargs)
  File "/home/lior/src/aws/deep-voice-conversion/models.py", line 115, in _net2
    pred_spec = cbhg(pred_spec, hp.Train2.num_banks, hp.Train2.hidden_units // 2, hp.Train2.num_highway_blocks, hp.Train2.norm_type, self.is_training, scope="cbhg2")
  File "/home/lior/src/aws/deep-voice-conversion/modules.py", line 307, in cbhg
    is_training=is_training) # (N, T, K * E / 2)
  File "/home/lior/src/aws/deep-voice-conversion/modules.py", line 191, in conv1d_banks
    outputs = tf.concat(outputs, -1)
  File "/home/lior/.local/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 1099, in concat
    return gen_array_ops._concat_v2(values=values, axis=axis, name=name)
  File "/home/lior/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 706, in _concat_v2
    "ConcatV2", values=values, axis=axis, name=name)
  File "/home/lior/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/lior/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/home/lior/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[32,401,4096]
   [[Node: net/net2/cbhg2/conv1d_banks/concat = ConcatV2[N=16, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](net/net2/cbhg2/conv1d_banks/num_1/Relu, net/net2/cbhg2/conv1d_banks/num_2/Relu, net/net2/cbhg2/conv1d_banks/num_3/Relu, net/net2/cbhg2/conv1d_banks/num_4/Relu, net/net2/cbhg2/conv1d_banks/num_5/Relu, net/net2/cbhg2/conv1d_banks/num_6/Relu, net/net2/cbhg2/conv1d_banks/num_7/Relu, net/net2/cbhg2/conv1d_banks/num_8/Relu, net/net2/cbhg2/conv1d_banks/num_9/Relu, net/net2/cbhg2/conv1d_banks/num_10/Relu, net/net2/cbhg2/conv1d_banks/num_11/Relu, net/net2/cbhg2/conv1d_banks/num_12/Relu, net/net2/cbhg2/conv1d_banks/num_13/Relu, net/net2/cbhg2/conv1d_banks/num_14/Relu, net/net2/cbhg2/conv1d_banks/num_15/Relu, net/net2/cbhg2/conv1d_banks/num_16/Relu, net/net2/cbhg2/conv1d_banks/concat/axis)]]
   [[Node: gradients/net/net2/cbhg2/highwaynet_2/mul_1_grad/Shape/_837 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4748_gradients/net/net2/cbhg2/highwaynet_2/mul_1_grad/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
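For scale, here is a rough sketch of how much memory the failing allocation in the log above asks for. It assumes `DT_FLOAT` means 4-byte float32 and reads the shape as (batch, time steps, channels), following the `(N, T, K * E / 2)` comment in `modules.py`. The helper name `tensor_bytes` is made up for illustration and is not part of the repo. It only counts this one concat output, not the 16 bank outputs feeding it or the gradients kept for backprop.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Bytes needed for one dense tensor (assumes float32 by default)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


# Shape reported in the OOM message: [32, 401, 4096].
oom_shape = (32, 401, 4096)
oom_mib = tensor_bytes(oom_shape) / 2**20
print("concat output: %.1f MiB" % oom_mib)

# The same tensor with the batch size halved (16 instead of 32).
half_mib = tensor_bytes((16, 401, 4096)) / 2**20
print("with batch 16: %.2f MiB" % half_mib)
```

The first dimension is the batch size, so this tensor shrinks linearly with `hp.Train2.batch_size`. Lowering that value is one thing to try on a smaller GPU, though whether it is enough depends on the card and on everything else in the graph.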
Moved to running it on AWS, works like a charm.
@0i0 yo, did you get train2 to work? What chopping window did you find optimal? I'm using the same 2/3 second window as in the arctic slt set.
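For anyone comparing chopping windows, a minimal sketch of cutting audio into fixed-length pieces. It assumes the samples are already loaded as a flat sequence at 16000 Hz. The function name `chop` and the choice to drop the leftover tail are illustrative assumptions, not what the repo does.

```python
def chop(samples, sample_rate=16000, window_sec=0.25):
    """Split a 1-D sequence of samples into equal, non-overlapping pieces.

    Any tail shorter than one window is dropped (an assumption made here).
    """
    window = int(round(sample_rate * window_sec))
    if window <= 0:
        raise ValueError("window_sec is too small for this sample rate")
    n_pieces = len(samples) // window
    return [samples[i * window:(i + 1) * window] for i in range(n_pieces)]


# Two seconds of silence plus a short tail that does not fill a window.
audio = [0.0] * (16000 * 2 + 123)
pieces = chop(audio)
print(len(pieces), len(pieces[0]))
```

At 16000 Hz a 0.25 sec window is 4000 samples per piece. Slicing also works on NumPy arrays, so the same function can be used after loading with whatever audio library you prefer.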