auspicious3000 / SpeechSplit

Unsupervised Speech Decomposition Via Triple Information Bottleneck
http://arxiv.org/abs/2004.11284
MIT License
636 stars 92 forks source link

How to train your own model and apply it? I have come so far but having problem at solver.py #28

Open FurkanGozukara opened 3 years ago

FurkanGozukara commented 3 years ago

OK, I have downloaded Visual Studio Code to debug and understand the code.

I see that make_spect_f0.py is used to generate the raptf0 and spmel folders with values.

So make_spect_f0.py reads a folder and decides from the spk2gen.pkl file whether it contains a male or a female voice.
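If it helps to verify that mapping yourself: as far as I can tell from how make_spect_f0.py reads it, spk2gen.pkl is a pickled dict from speaker ID (e.g. `p285`) to a gender label. A small hypothetical helper (`speaker_gender` is not part of the repo) to check what a given folder is assigned to:

```python
import pickle

def speaker_gender(spk2gen_path, speaker_id):
    """Return the gender label stored for speaker_id, or None if unknown."""
    with open(spk2gen_path, 'rb') as f:
        spk2gen = pickle.load(f)   # dict: speaker ID -> gender label
    return spk2gen.get(speaker_id)
```

For example, `speaker_gender('assets/spk2gen.pkl', 'p285')` should tell you which label your folder name is mapped to before you run the preprocessing.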

As a first step, I deleted the raptf0, spmel, and wavs folders.

Then I created a wavs folder and, inside it, another folder named p285, which is assigned as a male speaker.

Inside p285 I put my wav file myfile.wav, which is more than 2 hours long.

Question 1: Does it have to be 16 kHz and mono, or can we use maximum quality?
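(For what it's worth, the constants in make_spect_f0.py appear to assume 16 kHz mono input, so converting first is the safe choice. A rough pure-NumPy sketch of such a conversion, using linear interpolation rather than a proper anti-aliased resampler like librosa's, which would give better quality:)

```python
import numpy as np

def to_16k_mono(samples, orig_sr, target_sr=16000):
    """Downmix to mono and resample via linear interpolation.

    A rough sketch only; a real resampler (librosa, sox) applies an
    anti-aliasing filter before decimating.
    """
    samples = np.asarray(samples, dtype=np.float32)
    if samples.ndim == 2:                      # (num_samples, channels) -> mono
        samples = samples.mean(axis=1)
    n_out = int(round(len(samples) * target_sr / orig_sr))
    x_old = np.linspace(0.0, 1.0, num=len(samples), endpoint=False)
    x_new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
    return np.interp(x_new, x_old, samples)
```

In practice `librosa.load(path, sr=16000, mono=True)` does the same job in one call.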

After I ran make_spect_f0.py, it created a myfile.npy in each of the raptf0 and spmel folders.

Then I ran make_metadata.py and it created train.pkl inside spmel.
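It can be worth inspecting what actually landed in train.pkl before training. Assuming the list layout make_metadata.py appears to write (one list per speaker: speaker ID, then the speaker embedding, then the utterance paths), a hypothetical inspection helper:

```python
import pickle

def summarize_metadata(pkl_path):
    """Print one line per speaker so you can check what made it into train.pkl."""
    with open(pkl_path, 'rb') as f:
        metadata = pickle.load(f)
    for entry in metadata:
        speaker = entry[0]                 # entry[1] is the speaker embedding
        print(speaker, 'utterances:', len(entry) - 2)
    return metadata
```

If this prints only one utterance for your speaker, the training loop has very little to iterate over, which matters for the error discussed below.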

Then when I run main.py I get the error below in solver.py.

I want to train a model; I don't want to test.

Then I want to use this model to convert the style of a speech recording to the trained voice.

So I need help, thank you.

@auspicious3000

image

FurkanGozukara commented 3 years ago

Here is the console output from running main.py:

PS C:\SpeechSplit>  c:; cd 'c:\SpeechSplit'; & 'C:/Python37/python.exe' 'c:\Users\King\.vscode\extensions\ms-python.python-2020.12.424452561\pythonFiles\lib\python\debugpy\launcher' '55577' '--' 'c:\SpeechSplit\main.py'
Namespace(beta1=0.9, beta2=0.999, device_id=0, g_lr=0.0001, log_dir='run/logs', log_step=10, model_save_dir='run/models', model_save_step=1000, num_iters=1000000, resume_iters=None, sample_dir='run/samples', sample_step=1000, use_tensorboard=False)
Hyperparameters:
  freq: 8       
  dim_neck: 8   
  freq_2: 8     
  dim_neck_2: 1 
  freq_3: 8     
  dim_neck_3: 32
  dim_enc: 512  
  dim_enc_2: 128
  dim_enc_3: 256
  dim_freq: 80
  dim_spk_emb: 82
  dim_f0: 257
  dim_dec: 512
  len_raw: 128
  chs_grp: 16
  min_len_seg: 19
  max_len_seg: 32
  min_len_seq: 64
  max_len_seq: 128
  max_len_pad: 192
  root_dir: assets/spmel
  feat_dir: assets/raptf0
  batch_size: 16
  mode: train
  shuffle: True
  num_workers: 0
  samplier: 8
Finished loading train dataset...
Generator_3(
  (encoder_1): Encoder_7(
    (convolutions_1): ModuleList(
      (0): Sequential(
        (0): ConvNorm(
          (conv): Conv1d(80, 512, kernel_size=(5,), stride=(1,), padding=(2,))
        )
        (1): GroupNorm(32, 512, eps=1e-05, affine=True)
      )
      (1): Sequential(
        (0): ConvNorm(
          (conv): Conv1d(512, 512, kernel_size=(5,), stride=(1,), padding=(2,))
        )
        (1): GroupNorm(32, 512, eps=1e-05, affine=True)
      )
      (2): Sequential(
        (0): ConvNorm(
          (conv): Conv1d(512, 512, kernel_size=(5,), stride=(1,), padding=(2,))
        )
        (1): GroupNorm(32, 512, eps=1e-05, affine=True)
      )
    )
    (lstm_1): LSTM(512, 8, num_layers=2, batch_first=True, bidirectional=True)
    (convolutions_2): ModuleList(
      (0): Sequential(
        (0): ConvNorm(
          (conv): Conv1d(257, 256, kernel_size=(5,), stride=(1,), padding=(2,))
        )
        (1): GroupNorm(16, 256, eps=1e-05, affine=True)
      )
      (1): Sequential(
        (0): ConvNorm(
          (conv): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,))
        )
        (1): GroupNorm(16, 256, eps=1e-05, affine=True)
      )
      (2): Sequential(
        (0): ConvNorm(
          (conv): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,))
        )
        (1): GroupNorm(16, 256, eps=1e-05, affine=True)
      )
    )
    (lstm_2): LSTM(256, 32, batch_first=True, bidirectional=True)
    (interp): InterpLnr()
  )
  (encoder_2): Encoder_t(
    (convolutions): ModuleList(
      (0): Sequential(
        (0): ConvNorm(
          (conv): Conv1d(80, 128, kernel_size=(5,), stride=(1,), padding=(2,))
        )
        (1): GroupNorm(8, 128, eps=1e-05, affine=True)
      )
    )
    (lstm): LSTM(128, 1, batch_first=True, bidirectional=True)
  )
  (decoder): Decoder_3(
    (lstm): LSTM(164, 512, num_layers=3, batch_first=True, bidirectional=True)
    (linear_projection): LinearNorm(
      (linear_layer): Linear(in_features=1024, out_features=80, bias=True)
    )
  )
)
G
The number of parameters: 19437800
Current learning rates, g_lr: 0.0001.
Start training...
We've got an error while stopping in unhandled exception: <class 'StopIteration'>.
Traceback (most recent call last):
  File "c:\Users\King\.vscode\extensions\ms-python.python-2020.12.424452561\pythonFiles\lib\python\debugpy\_vendored\pydevd\pydevd.py", line 1994, in do_stop_on_unhandled_exception
    self.do_wait_suspend(thread, frame, 'exception', arg, EXCEPTION_TYPE_UNHANDLED)
  File "c:\Users\King\.vscode\extensions\ms-python.python-2020.12.424452561\pythonFiles\lib\python\debugpy\_vendored\pydevd\pydevd.py", line 1855, in do_wait_suspend
    keep_suspended = self._do_wait_suspend(thread, frame, event, arg, suspend_type, from_this_thread, frames_tracker)
  File "c:\Users\King\.vscode\extensions\ms-python.python-2020.12.424452561\pythonFiles\lib\python\debugpy\_vendored\pydevd\pydevd.py", line 1890, in _do_wait_suspend
    time.sleep(0.01)
FurkanGozukara commented 3 years ago

data_loader object

image

FurkanGozukara commented 3 years ago

Error from PowerShell

Man, I will break my teeth if even once such an open source project works when the instructions are followed.

image

vlad-i commented 3 years ago

> Man, I will break my teeth if even once such an open source project works when the instructions are followed.

There are some that work; some even provide environment files and require very little effort. It's free cutting-edge technology, so I can't complain :sweat_smile:

It would be really cool to be able to make this one work.

tejuafonja commented 3 years ago

Hi,

I don't know if this will help, but I thought I'd mention that you should also check the make_metadata.py file (if you haven't already), because as it stands it is hardcoded; maybe that will help debug the error.

I'm able to train with the test folder provided and I haven't tried training with custom data yet. I'll be sure to come back and update you if I run into the same error when I do.

Screenshot 2021-01-23 at 23 13 15

FurkanGozukara commented 3 years ago

@tejuafonja yes, I have seen it. It defines whether a sound file is male or female. I used the same name as a male speaker, but I am still getting the error.

I have uploaded my test here so you can check : https://github.com/FurkanGozukara/SpeechSplitTest

I will delete the repository once I can run it.

Thank you very much.

yenebeb commented 3 years ago

Hi @FurkanGozukara,

Probably a bit late, but for anyone out there stumbling on this problem, here's a fix.

The main problem is your single 2-hour-long wav file. make_spect_f0.py reads the file and computes the spectrogram and the F0, but this generates only one training file.

The error you're getting (StopIteration) is exactly because of that. When you run the code, it fetches your data (only one sample in your case) on line 113 of solver.py, `data_iter = iter(data_loader)`, and sets the iterator to the first value.

Further down, on lines 141-145, you see this:

try:
    x_real_org, emb_org, f0_org, len_org = next(data_iter)
except:
    data_iter = iter(data_loader)
    x_real_org, emb_org, f0_org, len_org = next(data_iter)

Here we try to get the next batch from the iterator, which isn't possible (since there's only one file for training). We catch the exception, reload the data, and try again.

Normally (with more than one training file) this would solve the problem, since we restart at the beginning of our training data. But with only one file for training, it will throw a StopIteration again.
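To see why the retry doesn't help, here is the failure reduced to plain Python. This is a stand-in for the PyTorch DataLoader, assuming the batch_size of 16 shown in the hyperparameters above and a drop-last batching policy, which is my reading of why a single sample produces zero batches:

```python
def batches(dataset, batch_size=16, drop_last=True):
    """Plain-Python stand-in for a DataLoader that drops incomplete batches."""
    for i in range(0, len(dataset), batch_size):
        batch = dataset[i:i + batch_size]
        if drop_last and len(batch) < batch_size:
            continue                           # a partial batch is thrown away
        yield batch

dataset = ['myfile.npy']                       # one training sample
print(list(batches(dataset)))                  # [] -- zero batches

# solver.py's retry cannot help: a fresh iterator is just as empty.
data_iter = iter(batches(dataset))
try:
    next(data_iter)
except StopIteration:
    data_iter = iter(batches(dataset))         # restart, as solver.py does
    # next(data_iter) raises StopIteration again -> the crash in the traceback
```

With one sample and a batch size of 16, there is never a complete batch to yield, so every fresh iterator is exhausted immediately.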

So to solve this, just use more than one file. For example, you can cut your 2-hour-long wav file into pieces and put them all in the p285 folder (it's important that the same voice goes in the same folder).
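The cutting itself can be done with the standard library. `split_wav` below is a hypothetical helper (not part of the repo), assuming an uncompressed PCM wav; it writes chunks named like `myfile_000.wav` and returns how many it wrote:

```python
import os
import wave

def split_wav(path, out_dir, chunk_seconds=10):
    """Cut a wav file into fixed-length chunks so make_spect_f0.py
    produces many training files instead of one."""
    os.makedirs(out_dir, exist_ok=True)
    with wave.open(path, 'rb') as src:
        params = src.getparams()
        frames_per_chunk = params.framerate * chunk_seconds
        base = os.path.splitext(os.path.basename(path))[0]
        idx = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = os.path.join(out_dir, f'{base}_{idx:03d}.wav')
            with wave.open(out_path, 'wb') as dst:
                dst.setparams(params)      # same rate/width/channels
                dst.writeframes(frames)
            idx += 1
    return idx
```

Cutting at fixed intervals can split a chunk mid-word; for cleaner pieces, cut at silences (e.g. with a VAD or an audio editor) instead.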

FurkanGozukara commented 3 years ago

@yenebeb so basically, if I duplicate my training file, it should work.

I will test it, thank you.