deezer / spleeter

Deezer source separation library including pretrained models.
https://research.deezer.com/projects/spleeter.html
MIT License

[Bug] Graph is finalized after using spleeter in python #524

Open BreezeWhite opened 3 years ago

BreezeWhite commented 3 years ago

Description

I'm using spleeter as a Python package in my code. When tensorflow is used as the stft_backend, the default global TensorFlow graph is finalized after spleeter finishes the separation, so TensorFlow models can no longer be initialized later in the same process.

Steps to reproduce

  1. Install with pip install spleeter
  2. Use spleeter as a Python package
  3. Get RuntimeError: Graph is finalized and cannot be modified.

Code to reproduce:

from spleeter.separator import Separator
import tensorflow as tf

sep = Separator("spleeter:2stems")
sep.separate_to_file("example.wav", "/tmp/test")

# Below will fail
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(10, 10)),
    tf.keras.layers.Dense(10)
])

Output

2020-11-30 23:16:38.229885: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-11-30 23:16:39.902822: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-11-30 23:16:39.906834: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-30 23:16:39.907200: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.683GHz coreCount: 28 deviceMemorySize: 10.91GiB deviceMemoryBandwidth: 451.17GiB/s
2020-11-30 23:16:39.907235: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-11-30 23:16:39.909192: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-11-30 23:16:39.910422: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-11-30 23:16:39.910637: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-11-30 23:16:39.911894: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-11-30 23:16:39.912559: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-11-30 23:16:39.915218: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-11-30 23:16:39.915319: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-30 23:16:39.915741: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-30 23:16:39.916050: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-30 23:16:41.953061: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-11-30 23:16:41.973544: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 3600000000 Hz
2020-11-30 23:16:41.973886: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x68f98d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-11-30 23:16:41.973915: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-11-30 23:16:42.031372: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-30 23:16:42.031785: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4b76b40 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-11-30 23:16:42.031803: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2020-11-30 23:16:42.031954: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-30 23:16:42.032263: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.683GHz coreCount: 28 deviceMemorySize: 10.91GiB deviceMemoryBandwidth: 451.17GiB/s
2020-11-30 23:16:42.032291: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-11-30 23:16:42.032313: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-11-30 23:16:42.032327: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-11-30 23:16:42.032341: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-11-30 23:16:42.032354: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-11-30 23:16:42.032366: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-11-30 23:16:42.032379: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-11-30 23:16:42.032428: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-30 23:16:42.032766: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-30 23:16:42.033074: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-30 23:16:42.033095: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-11-30 23:16:42.333509: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-30 23:16:42.333558: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2020-11-30 23:16:42.333566: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
2020-11-30 23:16:42.333741: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-30 23:16:42.334136: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-30 23:16:42.334456: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7822 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-11-30 23:16:43.200089: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-11-30 23:16:43.302117: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-11-30 23:16:43.893600: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
INFO:spleeter:File /tmp/test_spleeter/Ava/vocals.wav written succesfully
INFO:spleeter:File /tmp/test_spleeter/Ava/accompaniment.wav written succesfully
Traceback (most recent call last):
  File "test.py", line 12, in <module>
    tf.keras.layers.Dense(10)
  File "/data/omnizart/.venv/lib/python3.6/site-packages/tensorflow/python/training/tracking/base.py", line 457, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/data/omnizart/.venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/sequential.py", line 142, in __init__
    self.add(layer)
  File "/data/omnizart/.venv/lib/python3.6/site-packages/tensorflow/python/training/tracking/base.py", line 457, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/data/omnizart/.venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/sequential.py", line 202, in add
    batch_shape=batch_shape, dtype=dtype, name=layer.name + '_input')
  File "/data/omnizart/.venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/input_layer.py", line 311, in Input
    input_layer = InputLayer(**input_layer_config)
  File "/data/omnizart/.venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/input_layer.py", line 160, in __init__
    ragged=ragged)
  File "/data/omnizart/.venv/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 1223, in placeholder
    x = array_ops.placeholder(dtype, shape=shape, name=name)
  File "/data/omnizart/.venv/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 3100, in placeholder
    return gen_array_ops.placeholder(dtype=dtype, shape=shape, name=name)
  File "/data/omnizart/.venv/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 6809, in placeholder
    "Placeholder", dtype=dtype, shape=shape, name=name)
  File "/data/omnizart/.venv/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 744, in _apply_op_helper
    attrs=attr_protos, op_def=op_def)
  File "/data/omnizart/.venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3460, in _create_op_internal
    self._check_not_finalized()
  File "/data/omnizart/.venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3050, in _check_not_finalized
    raise RuntimeError("Graph is finalized and cannot be modified.")
RuntimeError: Graph is finalized and cannot be modified.

Environment

OS: Ubuntu 18.04
Installation type: pip
RAM available: 128 GB
Hardware spec: NVIDIA GTX 1080 Ti 12 GB / Intel Core i7-7700 / CUDA V11.0 / CUDA driver: 450.80.0

Additional context

My current workaround is to set sep._params["stft_backend"] = "librosa" to avoid the tensorflow backend. I've checked the code: with the librosa backend a new local graph is initialized and all later separation runs inside that graph, so the default global graph is never finalized and initializing new TensorFlow models later in the process works fine. A sketch of this workaround is shown below.
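A minimal sketch, assuming the same spleeter version as in this report (where the STFT backend can still be switched through the private _params dict):

from spleeter.separator import Separator
import tensorflow as tf

sep = Separator("spleeter:2stems")
# Switch to the librosa STFT backend so separation runs in its own local
# graph and the default global graph is never finalized.
sep._params["stft_backend"] = "librosa"
sep.separate_to_file("example.wav", "/tmp/test")

# With this workaround the model below now builds without the RuntimeError.
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(10, 10)),
    tf.keras.layers.Dense(10)
])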

I've also tried initializing the TensorFlow model after separation inside a new graph, like this:

with tf.Graph().as_default():
    model = tf.keras.models.Sequential(...)

It succeeded, but when I tried to load some pre-trained models inside the with block, it was extremely slow compared to loading the same models outside graph mode: loading usually takes only a few seconds, but in graph mode it takes up to 8~10 minutes. An expanded version of this attempt is sketched below.
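For reference, an expanded version of that attempt using the same toy model as in the reproduction above; it builds fine, but loading real pre-trained weights inside the block was the slow part:

import tensorflow as tf

# Build the model inside a brand-new graph so the finalized default graph
# is never touched. This avoids the RuntimeError, but loading pre-trained
# models inside this block turned out to be extremely slow.
with tf.Graph().as_default():
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(10, 10)),
        tf.keras.layers.Dense(10)
    ])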

Spleeter is a great tool and easier to get started with than most open-source projects I've come across. I hope this issue helps others who encounter the same problem; it took me a few days to work around the bug, and I hope a proper solution will be available in the future.

Thanks.

romi1502 commented 3 years ago

Hi @BreezeWhite Thank you for reporting this issue. The separation mechanism is indeed a bit different between the librosa and tensorflow backends. With the tensorflow backend, an infinite tensorflow dataset is created to feed a tensorflow estimator that performs the predictions, and the dataset is fed new data each time the separate (or separate_to_file) method is called. There is therefore an underlying graph for this dataset that is never closed, and that may be responsible for the issue you encountered.
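As an aside, a minimal standalone illustration of the failure mode itself (not spleeter's actual code): once the default graph has been finalized, adding any new op to it raises exactly the RuntimeError from the traceback above.

import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # graph mode, so new ops go to the default graph

# Finalizing the default graph makes it read-only; any later op creation
# (such as the placeholder behind a Keras Input layer) then fails.
tf.compat.v1.get_default_graph().finalize()
try:
    tf.compat.v1.placeholder(tf.float32, shape=(None, 10))
except RuntimeError as err:
    print(err)  # Graph is finalized and cannot be modified.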

Note that on GPU, the tensorflow backend is much faster for performing separation, so it would be a pity to use the librosa backend instead.

I can't see a proper workaround at the moment besides what you suggested, which seems to cause speed issues. Rewriting the way spleeter uses tensorflow for performing separation may actually be necessary, but I can't see a better way so far, so any help on this topic would be appreciated.

Donavin97 commented 3 years ago

Hi Taufeeque,

Thanks for the edit, this looks much better.

May we use this as my final abstract?

I don't want to just use your edits without consulting you first.

Regards: Donavin.
