athena-team / athena

an open-source implementation of sequence-to-sequence based speech processing engine
https://athena-team.readthedocs.io
Apache License 2.0

Errors when using voice conversion example #298

Open xinkez opened 4 years ago

xinkez commented 4 years ago

Hi,

When I ran the script examples/vc/vcc2018/run.sh, I got the errors below. Thank you in advance.

Traceback (most recent call last):
  File "athena/stargan_main.py", line 179, in <module>
    train(json_file, GanSolver, 1, 0)
  File "athena/stargan_main.py", line 125, in train
    p, model, checkpointer = build_model_from_jsonfile_stargan(jsonfile)
  File "athena/stargan_main.py", line 105, in build_model_from_jsonfile_stargan
    model_name="gan"
  File "/backup/Algorithm/xkzhang/codes/athena/athena/utils/checkpoint.py", line 45, in __init__
    super().__init__(**kwargs, model=model)
  File "/8T_raid/xkzhang/venv_athena/lib/python3.5/site-packages/tensorflow_core/python/training/tracking/util.py", line 1779, in __init__
    % (v,))
ValueError: Checkpoint was expecting a trackable object (an object derived from TrackableBase), got gan. If you believe this object should be trackable (i.e. it is part of the TensorFlow Python API and manages state), please open an issue.

xiaochunxin commented 4 years ago

The bug was caused by line 105 of athena/stargan_main.py, in build_model_from_jsonfile_stargan, where model_name="gan" is passed. Just delete model_name="gan" and you can keep running examples/vc/vcc2018/run.sh.
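For reference, the failure comes down to tf.train.Checkpoint rejecting keyword arguments that are not trackable objects. A minimal sketch of the behavior (the model and paths here are simplified stand-ins, not athena's actual Checkpoint subclass):

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(4)])

# Passing a plain string as a keyword argument raises the same ValueError
# seen in the traceback above, because tf.train.Checkpoint only accepts
# trackable objects as constructor kwargs.
try:
    ckpt = tf.train.Checkpoint(model=model, model_name="gan")
except ValueError as err:
    print(err)

# Dropping the non-trackable kwarg, as suggested here, works fine.
ckpt = tf.train.Checkpoint(model=model)
```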

I'm sorry for the trouble caused by our oversight.

xinkez commented 4 years ago

Thank you. Now it works. Also, two more questions: as far as I know there are 12 speakers in the VCC 2018 training dataset, so why do you set the parameter to 9 in the file of ? And how do you split the VCC 2018 dataset into train, dev and test?

xiaochunxin commented 4 years ago

I fixed the bug. Just take a look at my updated code.

xinkez commented 4 years ago

Thank you. I ran your updated code. Now the training is fine, but when it comes to the dev stage it outputs the errors below. Should the number of speakers in the train set be the same as that of the dev set? And how do you split the VCC 2018 dataset into train, dev and test?

INFO:absl:>>>>> start evaluate in epoch 0
INFO:absl:hparams: [('cls', <class 'athena.data.datasets.voice_conversion.VoiceConversionDatasetBuilder'>), ('cmvn_file', 'examples/vc/vcc2018/data/cmvn'), ('codedsp_dim', 36), ('data_csv', 'examples/vc/vcc2018/data_numpy/dev.csv'), ('enable_load_from_disk', True), ('fft_size', 1024), ('fs', 16000), ('input_length_range', [10, 8000]), ('num_cmvn_workers', 1)]
INFO:absl:Successfully load cmvn file examples/vc/vcc2018/data/cmvn
INFO:absl:Loading data from examples/vc/vcc2018/data_numpy/dev.csv
INFO:absl:please be patient, enable tf.function, it takes time ...
Traceback (most recent call last):
  File "athena/stargan_main.py", line 179, in <module>
    train(json_file, GanSolver, 1, 0)
  File "athena/stargan_main.py", line 158, in train
    loss_g, metrics_g = solver.evaluate(devset, epoch)
  File "athena/athena/solver.py", line 436, in evaluate
    total_loss, metrics = evaluate_step(samples)
  File "venv_athena/lib/python3.5/site-packages/tensorflow_core/python/eager/def_function.py", line 457, in __call__
    result = self._call(*args, **kwds)
  File "venv_athena/lib/python3.5/site-packages/tensorflow_core/python/eager/def_function.py", line 524, in _call
    *args, **kwds)
  File "venv_athena/lib/python3.5/site-packages/tensorflow_core/python/eager/function.py", line 1650, in canonicalize_function_inputs
    self._flat_input_signature)
  File "venv_athena/lib/python3.5/site-packages/tensorflow_core/python/eager/function.py", line 1716, in _convert_inputs_to_signature
    format_error_message(inputs, input_signature))
ValueError: Python inputs incompatible with input_signature:

xiaochunxin commented 4 years ago

The number of speakers in the train set must be the same as that of the dev set.
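A quick way to sanity-check this is to compare the speaker sets in the two data CSVs. This is only a sketch: the train.csv path, the "speaker" column name, and the tab delimiter are assumptions here and may not match the CSVs that the vcc2018 example actually generates.

```python
import csv

def speakers(csv_path, column="speaker"):
    """Collect the set of speaker labels from a data CSV (assumed column name and delimiter)."""
    with open(csv_path, newline="") as f:
        return {row[column] for row in csv.DictReader(f, delimiter="\t")}

train_spk = speakers("examples/vc/vcc2018/data_numpy/train.csv")
dev_spk = speakers("examples/vc/vcc2018/data_numpy/dev.csv")

# If these differ, the speaker one-hot dimension (and hence the tf.function
# input signature) differs between train and dev, which is consistent with
# the "Python inputs incompatible with input_signature" error above.
print("train speakers:", sorted(train_spk))
print("dev speakers:  ", sorted(dev_spk))
assert train_spk == dev_spk, "train/dev speaker sets must match"
```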

xinkez commented 4 years ago

@xiaochunxin As training goes on, the "loss", "metrics_d" and "metrics_g" become nan, and these values never return to a normal state. (screenshot of the training log attached)

xiaochunxin commented 4 years ago

Are you using the VCC2018 corpus? If so, check out the patch in this PR (https://github.com/athena-team/athena/pull/302/files). The optimizer parameters used before lead to a high learning rate during training, which can cause loss=NaN in some cases because GAN training is highly unstable.
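As an illustration of the kind of change involved, a conservative optimizer setup of the sort often used to stabilize GAN training is sketched below. The concrete numbers are placeholders, not the values from the PR; see the patch itself for the settings athena actually uses.

```python
import tensorflow as tf

# Small learning rate and low beta_1 are common choices for stabilizing GAN
# training. These values are illustrative only.
generator_optimizer = tf.keras.optimizers.Adam(
    learning_rate=1e-4, beta_1=0.5, beta_2=0.999)
discriminator_optimizer = tf.keras.optimizers.Adam(
    learning_rate=1e-4, beta_1=0.5, beta_2=0.999)
```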

xinkez commented 4 years ago

Yes, I'm using the VCC2018 corpus.

First I used all 12 speakers to train the model; the loss was normal, but the generated wavs after training were not very good. So I tried selecting only 2 speakers to train the model, and the loss became nan as I mentioned before. Yesterday I checked out your patch in the PR (https://github.com/athena-team/athena/pull/302/files), but the loss still became nan during training.

The two speakers I chose are VCC2SF1 and VCC2TF1.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.