Closed m986883511 closed 3 years ago
Hi @m986883511, I cannot reproduce your bug on a similar setup (Debian 10, NVIDIA driver version 460.84, CUDA version 11.2), where the provided commands run smoothly. I can reproduce a similar (but not identical) error when TensorFlow runs out of GPU memory; with more than 8 GB of GPU memory available, I don't have any issue. So your GPU may not have enough free memory, either because its total memory is too small or because other jobs are using part of it concurrently.
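To check which of those two cases applies, one option is to ask `nvidia-smi` for total, used, and free memory per GPU before starting the container. The snippet below is only a rough sketch: the helper names (`query_gpu_memory`, `parse_gpu_memory`, `gpus_below`) are made up for illustration, and the 8 GB threshold is just the rough figure mentioned above, not a documented Spleeter requirement.

```python
import subprocess

# Real nvidia-smi query flags; values are reported in MiB when "nounits" is set.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.total,memory.used,memory.free",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse rows such as '0, 8192, 1024, 7168' into dicts of MiB values."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, total, used, free = (int(field.strip()) for field in line.split(","))
        gpus.append(
            {"index": index, "total_mib": total, "used_mib": used, "free_mib": free}
        )
    return gpus


def gpus_below(gpus, required_mib):
    """Return the indices of GPUs whose free memory is under required_mib."""
    return [gpu["index"] for gpu in gpus if gpu["free_mib"] < required_mib]


def query_gpu_memory():
    """Run nvidia-smi on the host (hypothetical helper; needs the NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)
```

On a machine with the driver installed, `gpus_below(query_gpu_memory(), 8 * 1024)` would list the GPUs with less than 8 GiB free. A large `used_mib` next to a large `total_mib` points to other jobs holding the memory, while a small `total_mib` points to the card itself.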
I fixed it. You can use my container, which you can find on my Docker Hub page: https://hub.docker.com/u/m986883511
Using the tensorflow/tensorflow:2.3.0-gpu base image solved this problem. Thank you very much.
This command works well:
docker run --rm -v $(pwd):/output deezer/spleeter-gpu:3.8-2stems separate -o /output /output/3t.mp3
But these commands fail:
docker run --rm -v $(pwd):/output --gpus all deezer/spleeter-gpu:3.8-2stems separate -o /output /output/3t.mp3
docker run --rm -v $(pwd):/output --runtime=nvidia deezer/spleeter-gpu:3.8-2stems separate -o /output /output/3t.mp3
The error is:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/usr/local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1349, in _run_fn
return self._call_tf_sessionrun(options, feed_dict, fetch_list,
File "/usr/local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1441, in _call_tf_sessionrun
return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[51,16,256,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node conv2d_transpose_4/conv2d_transpose}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
(1) Resource exhausted: OOM when allocating tensor with shape[51,16,256,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node conv2d_transpose_4/conv2d_transpose}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
0 successful operations. 0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/spleeter", line 8, in <module>
sys.exit(entrypoint())
File "/usr/local/lib/python3.8/site-packages/spleeter/__main__.py", line 256, in entrypoint
spleeter()
File "/usr/local/lib/python3.8/site-packages/typer/main.py", line 214, in __call__
return get_command(self)(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, ctx.params)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/typer/main.py", line 497, in wrapper
return callback(**use_params) # type: ignore
File "/usr/local/lib/python3.8/site-packages/spleeter/__main__.py", line 128, in separate
separator.separate_to_file(
File "/usr/local/lib/python3.8/site-packages/spleeter/separator.py", line 382, in separate_to_file
sources = self.separate(waveform, audio_descriptor)
File "/usr/local/lib/python3.8/site-packages/spleeter/separator.py", line 323, in separate
return self._separate_tensorflow(waveform, audio_descriptor)
File "/usr/local/lib/python3.8/site-packages/spleeter/separator.py", line 305, in _separate_tensorflow
prediction = next(prediction_generator)
File "/usr/local/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 631, in predict
preds_evaluated = mon_sess.run(predictions)
File "/usr/local/lib/python3.8/site-packages/tensorflow/python/training/monitored_session.py", line 774, in run
return self._sess.run(
File "/usr/local/lib/python3.8/site-packages/tensorflow/python/training/monitored_session.py", line 1279, in run
return self._sess.run(
File "/usr/local/lib/python3.8/site-packages/tensorflow/python/training/monitored_session.py", line 1384, in run
raise six.reraise(*original_exc_info)
File "/usr/local/lib/python3.8/site-packages/six.py", line 703, in reraise
raise value
File "/usr/local/lib/python3.8/site-packages/tensorflow/python/training/monitored_session.py", line 1369, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/tensorflow/python/training/monitored_session.py", line 1437, in run
outputs = _WrappedSession.run(
File "/usr/local/lib/python3.8/site-packages/tensorflow/python/training/monitored_session.py", line 1200, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 957, in run
result = self._run(None, fetches, feed_dict, options_ptr,
File "/usr/local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1180, in _run
results = self._do_run(handle, final_targets, final_fetches,
File "/usr/local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1358, in _do_run
return self._do_call(_run_fn, feeds, fetches, targets, options,
File "/usr/local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[51,16,256,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node conv2d_transpose_4/conv2d_transpose (defined at /lib/python3.8/site-packages/spleeter/model/functions/unet.py:164) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
(1) Resource exhausted: OOM when allocating tensor with shape[51,16,256,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node conv2d_transpose_4/conv2d_transpose (defined at /lib/python3.8/site-packages/spleeter/model/functions/unet.py:164) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
0 successful operations. 0 derived errors ignored.
Errors may have originated from an input operation.
Input Source operations connected to node conv2d_transpose_4/conv2d_transpose:
concatenate_3/concat (defined at /lib/python3.8/site-packages/spleeter/model/functions/unet.py:162)
Original stack trace for 'conv2d_transpose_4/conv2d_transpose':
File "/bin/spleeter", line 8, in <module>
sys.exit(entrypoint())
File "/lib/python3.8/site-packages/spleeter/__main__.py", line 256, in entrypoint
spleeter()
File "/lib/python3.8/site-packages/typer/main.py", line 214, in __call__
return get_command(self)(*args, **kwargs)
File "/lib/python3.8/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/lib/python3.8/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, ctx.params)
File "/lib/python3.8/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/lib/python3.8/site-packages/typer/main.py", line 497, in wrapper
return callback(**use_params) # type: ignore
File "/lib/python3.8/site-packages/spleeter/__main__.py", line 128, in separate
separator.separate_to_file(
File "/lib/python3.8/site-packages/spleeter/separator.py", line 382, in separate_to_file
sources = self.separate(waveform, audio_descriptor)
File "/lib/python3.8/site-packages/spleeter/separator.py", line 323, in separate
return self._separate_tensorflow(waveform, audio_descriptor)
File "/lib/python3.8/site-packages/spleeter/separator.py", line 305, in _separate_tensorflow
prediction = next(prediction_generator)
File "/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 612, in predict
estimator_spec = self._call_model_fn(features, None, ModeKeys.PREDICT,
File "/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1163, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/lib/python3.8/site-packages/spleeter/model/__init__.py", line 568, in model_fn
return builder.build_predict_model()
File "/lib/python3.8/site-packages/spleeter/model/__init__.py", line 516, in build_predict_model
tf.estimator.ModeKeys.PREDICT, predictions=self.outputs
File "/lib/python3.8/site-packages/spleeter/model/__init__.py", line 318, in outputs
self._build_outputs()
File "/lib/python3.8/site-packages/spleeter/model/__init__.py", line 499, in _build_outputs
self._outputs = self._build_output_waveform(self.masked_stfts)
File "/lib/python3.8/site-packages/spleeter/model/__init__.py", line 342, in masked_stfts
self._build_masked_stfts()
File "/lib/python3.8/site-packages/spleeter/model/__init__.py", line 465, in _build_masked_stfts
for instrument, mask in self.masks.items():
File "/lib/python3.8/site-packages/spleeter/model/__init__.py", line 336, in masks
self._build_masks()
File "/lib/python3.8/site-packages/spleeter/model/__init__.py", line 432, in _build_masks
output_dict = self.model_outputs
File "/lib/python3.8/site-packages/spleeter/model/__init__.py", line 312, in model_outputs
self._build_model_outputs()
File "/lib/python3.8/site-packages/spleeter/model/__init__.py", line 211, in _build_model_outputs
self._model_outputs = apply_model(
File "/lib/python3.8/site-packages/spleeter/model/functions/unet.py", line 197, in unet
return apply(apply_unet, input_tensor, instruments, params)
File "/lib/python3.8/site-packages/spleeter/model/functions/__init__.py", line 44, in apply
output_dict[out_name] = function(
File "/lib/python3.8/site-packages/spleeter/model/functions/unet.py", line 164, in apply_unet
up5 = conv2d_transpose_factory(conv_n_filters[0], (5, 5))((merge4))
File "/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer_v1.py", line 776, in __call__
outputs = call_fn(cast_inputs, *args, **kwargs)
File "/lib/python3.8/site-packages/tensorflow/python/keras/layers/convolutional.py", line 1291, in call
outputs = backend.conv2d_transpose(
File "/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
return target(*args, **kwargs)
File "/lib/python3.8/site-packages/tensorflow/python/keras/backend.py", line 5177, in conv2d_transpose
x = nn.conv2d_transpose(x, kernel, output_shape, strides,
File "/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
return target(*args, **kwargs)
File "/lib/python3.8/site-packages/tensorflow/python/ops/nn_ops.py", line 2482, in conv2d_transpose
return conv2d_transpose_v2(
File "/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
return target(*args, **kwargs)
File "/lib/python3.8/site-packages/tensorflow/python/ops/nn_ops.py", line 2560, in conv2d_transpose_v2
return gen_nn_ops.conv2d_backprop_input(
File "/lib/python3.8/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1293, in conv2d_backprop_input
_, _, _op, _outputs = _op_def_library._apply_op_helper(
File "/lib/python3.8/site-packages/tensorflow/python/framework/op_def_library.py", line 742, in _apply_op_helper
op = g._create_op_internal(op_type_name, inputs, dtypes=None,
File "/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 3477, in _create_op_internal
ret = Operation(
File "/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 1949, in __init__
self._traceback = tf_stack.extract_stack()
My system: Ubuntu 18.04.5 LTS, NVIDIA-SMI 460.80, Driver Version: 460.80, CUDA Version: 11.2
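For reference, the size of the single allocation that failed can be worked out from the OOM message in the traceback. This is only a small sketch: `oom_allocation_bytes` is a hypothetical helper, and the byte-size table assumes that TensorFlow's `float` in this message means 32-bit floats.

```python
import re
from math import prod

# Bytes per element. Assumption: "float" in the OOM message is float32.
DTYPE_BYTES = {"half": 2, "float": 4, "double": 8}

OOM_PATTERN = re.compile(
    r"OOM when allocating tensor with shape\[([\d,]+)\] and type (\w+)"
)


def oom_allocation_bytes(log_line):
    """Return the size in bytes of the tensor named in an OOM line, or None."""
    match = OOM_PATTERN.search(log_line)
    if match is None:
        return None
    shape = [int(dim) for dim in match.group(1).split(",")]
    return prod(shape) * DTYPE_BYTES[match.group(2)]


line = (
    "Resource exhausted: OOM when allocating tensor with shape[51,16,256,512] "
    "and type float on /job:localhost/replica:0/task:0/device:GPU:0 "
    "by allocator GPU_0_bfc"
)
size = oom_allocation_bytes(line)
print(f"{size} bytes = {size / 2**20:.1f} MiB")  # 427819008 bytes = 408.0 MiB
```

So this one tensor needs about 408 MiB on top of whatever the model has already allocated on the GPU at that point.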