chrisdonahue / wavegan

WaveGAN: Learn to synthesize raw audio with generative adversarial networks
MIT License

Multi-channel audio doesn't work with --data_num_channels 2 in Jupyter Lab #105

Open Andy671 opened 3 years ago

Andy671 commented 3 years ago

Hello @chrisdonahue, thanks for the great paper and for sharing the WaveGAN code! You slay! I've managed to run it on Google Colab without any problems. BUT... I can't make it work in Jupyter Lab on paid Google AI Platform Notebooks. After spending a few days on it, I found that the problem is --data_num_channels 2. I've tried different setups, including CUDA 10, CUDA 11, tensorflow-gpu==1.15.2, tensorflow-gpu==1.14.0 and a few more, but in every case 2-channel audio just doesn't work and gives me this log (the last line is the telling one; it's how I figured out that the problem was the 2-channel audio):

Traceback (most recent call last):
  File "train_wavegan.py", line 654, in <module>
    train(fps, args)
  File "train_wavegan.py", line 93, in train
    D_G_z = WaveGANDiscriminator(G_z, **args.wavegan_d_kwargs)
  File "/home/jupyter/DavidBlaine-Project/wavegan/wavegan.py", line 194, in WaveGANDiscriminator
    output = tf.layers.conv1d(output, dim, kernel_len, 4, padding='SAME')
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/layers/convolutional.py", line 218, in conv1d
    return layer.apply(inputs)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1479, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/layers/base.py", line 537, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 591, in __call__
    self._maybe_build(inputs)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1881, in _maybe_build
    self.build(input_shapes)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/layers/convolutional.py", line 165, in build
    dtype=self.dtype)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/layers/base.py", line 450, in add_weight
    **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 384, in add_weight
    aggregation=aggregation)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/training/tracking/base.py", line 663, in _add_variable_with_custom_getter
    **kwargs_for_getter)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1496, in get_variable
    aggregation=aggregation)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1239, in get_variable
    aggregation=aggregation)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 562, in get_variable
    aggregation=aggregation)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 514, in _true_getter
    aggregation=aggregation)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 869, in _get_single_variable
    (name, shape, found_var.get_shape()))
ValueError: Trying to share variable D/downconv_0/conv1d/kernel, but specified shape (25, 1, 64) and found shape (25, 2, 64).

--data_num_channels 1 works okay, though... Here is the full command:

# %env persists for later commands; !export would only affect its own throwaway subshell
%env CUDA_VISIBLE_DEVICES=0
!python train_wavegan.py train "../models/model_test" \
--data_dir "../datasets/dataset_test" \
--data_num_channels 2 \
--data_sample_rate 44100 \
--data_first_slice \
--data_slice_len 32768 \
--data_pad_end \
--data_fast_wav \
--wavegan_genr_pp
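A quick sanity check for this setup is that every file in --data_dir really has the channel count and sample rate passed on the command line. Below is a minimal sketch using Python's standard-library wave module, assuming uncompressed integer PCM .wav files (the wave module cannot read other encodings); the helper names summarize_wavs and mismatches are hypothetical and not part of WaveGAN.

```python
import os
import wave


def summarize_wavs(data_dir):
    """Return {filename: (num_channels, sample_rate)} for each .wav file in data_dir.

    Hypothetical helper; assumes uncompressed PCM .wav files readable by `wave`.
    """
    summary = {}
    for name in sorted(os.listdir(data_dir)):
        if not name.lower().endswith(".wav"):
            continue  # skip non-audio files in the dataset folder
        with wave.open(os.path.join(data_dir, name), "rb") as f:
            summary[name] = (f.getnchannels(), f.getframerate())
    return summary


def mismatches(summary, expected_channels=2, expected_rate=44100):
    """List files whose channel count or sample rate differ from the expected values."""
    return [
        name
        for name, (nch, rate) in summary.items()
        if nch != expected_channels or rate != expected_rate
    ]
```

For the command above, `mismatches(summarize_wavs("../datasets/dataset_test"))` should return an empty list if every file is stereo at 44.1 kHz.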

I've also tried tensorflow==1.12.0, but it is so outdated that it requires CUDA 9...

Operating system: Debian 10
Current tensorflow-gpu: 1.14.0
Requirements.txt of my current pip list: https://drive.google.com/file/d/1irXiAZyYHeUkNH-PYDCHjwbeaDfenCTv/view?usp=sharing

Please help me out! How can I fix this small issue?

I will be extremely thankful for any hint!

chrisdonahue commented 3 years ago

Hi Andy. Appreciate the kind words and sorry for the delay.

So are you saying that the exact same configuration (2ch) works on Google Colab, but not on another environment? That is indeed strange.

It looks like this is happening when the discriminator is built for the generated audio, while the placeholder for the real audio appears to be stereo: the existing kernel D/downconv_0/conv1d/kernel has shape (25, 2, 64), i.e. 2 input channels, but the call on G_z requests (25, 1, 64), i.e. 1 input channel. Can you check the shape of the G_z tensor? Is it mono? If so, maybe there's an issue with this line of code due to changes to the TensorFlow API since I wrote it: https://github.com/chrisdonahue/wavegan/blob/master/wavegan.py#L132
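To make the failure mode concrete, here is a toy, pure-Python model of TF1-style variable sharing. It is not TensorFlow itself, and ToyVariableStore, downconv_kernel_shape and build_discriminator are hypothetical names. It assumes the conv1d kernel layout (kernel_len, in_channels, filters) with kernel_len=25 and 64 filters, as in the traceback; the batch size of 64 is purely illustrative. Building the discriminator first on a stereo input and then reusing it on a mono input reproduces the same error message.

```python
class ToyVariableStore:
    """Toy stand-in for TF1 variable sharing: variables are keyed by name and
    reuse only succeeds if the requested shape matches the stored one."""

    def __init__(self):
        self._vars = {}

    def get_variable(self, name, shape, reuse):
        shape = tuple(shape)
        if name in self._vars:
            if not reuse:
                raise ValueError(f"Variable {name} already exists")
            found = self._vars[name]
            if found != shape:
                raise ValueError(
                    f"Trying to share variable {name}, but specified shape "
                    f"{shape} and found shape {found}."
                )
            return found
        self._vars[name] = shape  # first call creates the variable
        return shape


def downconv_kernel_shape(input_shape, kernel_len=25, filters=64):
    # conv1d kernel layout: (kernel_len, in_channels, filters);
    # input_shape is (batch, time, channels), so in_channels is the last dim.
    return (kernel_len, input_shape[-1], filters)


def build_discriminator(store, input_shape, reuse):
    # Only the first downconv kernel is modeled here.
    name = "D/downconv_0/conv1d/kernel"
    return store.get_variable(name, downconv_kernel_shape(input_shape), reuse)


store = ToyVariableStore()
build_discriminator(store, (64, 32768, 2), reuse=False)  # D(x): stereo real audio creates the kernel
try:
    build_discriminator(store, (64, 32768, 1), reuse=True)  # D(G_z) with a mono G_z
except ValueError as err:
    print(err)
# prints: Trying to share variable D/downconv_0/conv1d/kernel, but specified shape (25, 1, 64) and found shape (25, 2, 64).
```

In the real graph, the equivalent check is to print the static shape of G_z (e.g. `G_z.get_shape()`) right before the discriminator is built and confirm that its last dimension is 2.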