coqui-ai / STT

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
https://coqui.ai
Mozilla Public License 2.0

Bug: ValueError: Cannot feed value of shape (32,) for Tensor 'layer_6/bias/Initializer/zeros:0', which has shape '(29,)' #2254

Closed FrontierDK closed 2 years ago

FrontierDK commented 2 years ago

Hi all :)

I am using commands that work on a US English dataset, and I am now trying them on a very small dataset (only 15 lines/wave files) that contains Danish letters such as æ, ø and å. I synced with GitHub about a week ago.

When running this command: python3 ~/Python-3.7.6/STT/lm_optimizer.py --alphabet_config_path ~/Python-3.7.6/STT/data/alphabet.txt --scorer_path ~/kenlm.scorer --test_files ~/talefiler/train.csv ~/talefiler/dev.csv --checkpoint_dir ~/coqui-stt-1.4.0-checkpoint --n_trials 6 --n_hidden 256 --lm_alpha_max 5 --lm_beta_max 5

I get this error: ValueError: Cannot feed value of shape (32,) for Tensor 'layer_6/bias/Initializer/zeros:0', which has shape '(29,)'

Longer log here:

[I 2022-07-05 14:35:33,907] A new study created in memory with name: no-name-f467ce79-fc9c-48fb-bdc6-5cfcef882133
I Loading best validating checkpoint from /home/bruger/coqui-stt-1.4.0-checkpoint/best_dev-1017
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_6/bias
[W 2022-07-05 14:35:35,909] Trial 0 failed because of the following error: ValueError("Cannot feed value of shape (32,) for Tensor 'layer_6/bias/Initializer/zeros:0', which has shape '(29,)'")
Traceback (most recent call last):
  File "/home/bruger/Python-3.7.6/coqui-stt-train-venv/lib/python3.7/site-packages/optuna/study/_optimize.py", line 213, in _run_trial
    value_or_values = func(trial)
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/util/lm_optimize.py", line 39, in objective
    current_samples = evaluate([test_file], create_model)
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/evaluate.py", line 99, in evaluate
    load_graph_for_evaluation(session)
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/util/checkpoints.py", line 233, in load_graph_for_evaluation
    _load_or_init_impl(session, methods, allow_drop_layers=False, silent=silent)
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/util/checkpoints.py", line 171, in _load_or_init_impl
    silent=silent,
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/util/checkpoints.py", line 127, in _load_checkpoint
    load_cudnn=Config.load_cudnn,
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/util/checkpoints.py", line 91, in _load_checkpoint_impl
    v.load(ckpt.get_tensor(v.op.name), session=session)
  File "/home/bruger/Python-3.7.6/coqui-stt-train-venv/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/home/bruger/Python-3.7.6/coqui-stt-train-venv/lib/python3.7/site-packages/tensorflow_core/python/ops/variables.py", line 1033, in load
    session.run(self.initializer, {self.initializer.inputs[1]: value})
  File "/home/bruger/Python-3.7.6/coqui-stt-train-venv/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/bruger/Python-3.7.6/coqui-stt-train-venv/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1156, in _run
    (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (32,) for Tensor 'layer_6/bias/Initializer/zeros:0', which has shape '(29,)'
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/util/lm_optimize.py", line 97, in <module>
    main()
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/util/lm_optimize.py", line 86, in main
    results = compute_lm_optimization()
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/util/lm_optimize.py", line 59, in compute_lm_optimization
    study.optimize(objective, n_jobs=1, n_trials=Config.n_trials)
  File "/home/bruger/Python-3.7.6/coqui-stt-train-venv/lib/python3.7/site-packages/optuna/study/study.py", line 409, in optimize
    show_progress_bar=show_progress_bar,
  File "/home/bruger/Python-3.7.6/coqui-stt-train-venv/lib/python3.7/site-packages/optuna/study/_optimize.py", line 76, in _optimize
    progress_bar=progress_bar,
  File "/home/bruger/Python-3.7.6/coqui-stt-train-venv/lib/python3.7/site-packages/optuna/study/_optimize.py", line 163, in _optimize_sequential
    trial = _run_trial(study, func, catch)
  File "/home/bruger/Python-3.7.6/coqui-stt-train-venv/lib/python3.7/site-packages/optuna/study/_optimize.py", line 264, in _run_trial
    raise func_err
  File "/home/bruger/Python-3.7.6/coqui-stt-train-venv/lib/python3.7/site-packages/optuna/study/_optimize.py", line 213, in _run_trial
    value_or_values = func(trial)
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/util/lm_optimize.py", line 39, in objective
    current_samples = evaluate([test_file], create_model)
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/evaluate.py", line 99, in evaluate
    load_graph_for_evaluation(session)
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/util/checkpoints.py", line 233, in load_graph_for_evaluation
    _load_or_init_impl(session, methods, allow_drop_layers=False, silent=silent)
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/util/checkpoints.py", line 171, in _load_or_init_impl
    silent=silent,
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/util/checkpoints.py", line 127, in _load_checkpoint
    load_cudnn=Config.load_cudnn,
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/util/checkpoints.py", line 91, in _load_checkpoint_impl
    v.load(ckpt.get_tensor(v.op.name), session=session)
  File "/home/bruger/Python-3.7.6/coqui-stt-train-venv/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/home/bruger/Python-3.7.6/coqui-stt-train-venv/lib/python3.7/site-packages/tensorflow_core/python/ops/variables.py", line 1033, in load
    session.run(self.initializer, {self.initializer.inputs[1]: value})
  File "/home/bruger/Python-3.7.6/coqui-stt-train-venv/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/bruger/Python-3.7.6/coqui-stt-train-venv/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1156, in _run
    (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (32,) for Tensor 'layer_6/bias/Initializer/zeros:0', which has shape '(29,)'
HarikalarKutusu commented 2 years ago

This looks like a size mismatch between the alphabet you supplied and the one the model was trained with. Are you sure you transfer-trained on Danish and used the same (Danish) alphabet everywhere? You should either do that or "normalize" your Danish commands to only include the English alphabet (e.g. use "a" instead of "æ").
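To make the mismatch concrete, here is a minimal sketch of how the two numbers in the error could arise. It assumes the output layer has one unit per alphabet label plus one for the CTC blank, and that alphabet.txt holds one label per line with `#` starting a comment and `\#` escaping a literal hash. The helper names and both alphabets are illustrative guesses, not taken from the reporter's files.

```python
def count_labels(alphabet_lines):
    """Count labels in alphabet.txt-style lines (one character per line)."""
    labels = 0
    for line in alphabet_lines:
        line = line.rstrip("\n")
        # Assumed convention: empty lines and lines starting with '#' are
        # not labels; a line such as '\#' is a label (an escaped hash).
        if line == "" or line.startswith("#"):
            continue
        labels += 1
    return labels


def output_units(alphabet_lines):
    """Size of the final layer: one unit per label plus the CTC blank."""
    return count_labels(alphabet_lines) + 1


# Hypothetical alphabets: space, a-z and apostrophe, then the same plus æ, ø, å.
english = [" "] + list("abcdefghijklmnopqrstuvwxyz") + ["'"]
danish = english + ["æ", "ø", "å"]

print(output_units(english))  # 29
print(output_units(danish))   # 32
```

Under these assumptions, 29 would match a graph built from a 28-character English alphabet, and 32 would match a checkpoint written with three extra letters. Whether that is what actually happened here can only be confirmed by checking the alphabet used when the checkpoint was written.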

FrontierDK commented 2 years ago

Omitting the Danish letters isn't an option, as the SR will be used by random web visitors.

I have tried removing the alphabet.txt; with the command below I get a new error:

I enter python -m coqui_stt_training.util.lm_optimize --scorer_path ~/kenlm.scorer --auto_input_dataset ~/talefiler/sample.csv --checkpoint_dir ~/coqui-stt-1.4.0-checkpoint --n_trials 6 --n_hidden 256 --lm_alpha_max 1 --lm_beta_max 2

I get ValueError: Cannot feed value of shape (32,) for Tensor 'layer_6/bias/Initializer/zeros:0', which has shape '(21,)'

The long log...

(coqui-stt-train-venv) bruger@kubuntudk1:~$ python -m coqui_stt_training.util.lm_optimize --scorer_path ~/kenlm.scorer --auto_input_dataset ~/talefiler/sample.csv --checkpoint_dir ~/coqui-stt-1.4.0-checkpoint --n_trials 6 --n_hidden 256 --lm_alpha_max 1 --lm_beta_max 2 > result.txt
[I 2022-07-06 09:58:59,594] A new study created in memory with name: no-name-557bbd71-6245-4c19-bf4a-e11f0f9da1f5
[W 2022-07-06 09:59:01,692] Trial 0 failed because of the following error: ValueError("Cannot feed value of shape (32,) for Tensor 'layer_6/bias/Initializer/zeros:0', which has shape '(21,)'")
Traceback (most recent call last):
  File "/home/bruger/Python-3.7.6/coqui-stt-train-venv/lib/python3.7/site-packages/optuna/study/_optimize.py", line 213, in _run_trial
    value_or_values = func(trial)
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/util/lm_optimize.py", line 39, in objective
    current_samples = evaluate([test_file], create_model)
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/evaluate.py", line 99, in evaluate
    load_graph_for_evaluation(session)
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/util/checkpoints.py", line 233, in load_graph_for_evaluation
    _load_or_init_impl(session, methods, allow_drop_layers=False, silent=silent)
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/util/checkpoints.py", line 171, in _load_or_init_impl
    silent=silent,
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/util/checkpoints.py", line 127, in _load_checkpoint
    load_cudnn=Config.load_cudnn,
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/util/checkpoints.py", line 91, in _load_checkpoint_impl
    v.load(ckpt.get_tensor(v.op.name), session=session)
  File "/home/bruger/Python-3.7.6/coqui-stt-train-venv/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/home/bruger/Python-3.7.6/coqui-stt-train-venv/lib/python3.7/site-packages/tensorflow_core/python/ops/variables.py", line 1033, in load
    session.run(self.initializer, {self.initializer.inputs[1]: value})
  File "/home/bruger/Python-3.7.6/coqui-stt-train-venv/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/bruger/Python-3.7.6/coqui-stt-train-venv/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1156, in _run
    (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (32,) for Tensor 'layer_6/bias/Initializer/zeros:0', which has shape '(21,)'
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/util/lm_optimize.py", line 97, in <module>
    main()
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/util/lm_optimize.py", line 86, in main
    results = compute_lm_optimization()
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/util/lm_optimize.py", line 59, in compute_lm_optimization
    study.optimize(objective, n_jobs=1, n_trials=Config.n_trials)
  File "/home/bruger/Python-3.7.6/coqui-stt-train-venv/lib/python3.7/site-packages/optuna/study/study.py", line 409, in optimize
    show_progress_bar=show_progress_bar,
  File "/home/bruger/Python-3.7.6/coqui-stt-train-venv/lib/python3.7/site-packages/optuna/study/_optimize.py", line 76, in _optimize
    progress_bar=progress_bar,
  File "/home/bruger/Python-3.7.6/coqui-stt-train-venv/lib/python3.7/site-packages/optuna/study/_optimize.py", line 163, in _optimize_sequential
    trial = _run_trial(study, func, catch)
  File "/home/bruger/Python-3.7.6/coqui-stt-train-venv/lib/python3.7/site-packages/optuna/study/_optimize.py", line 264, in _run_trial
    raise func_err
  File "/home/bruger/Python-3.7.6/coqui-stt-train-venv/lib/python3.7/site-packages/optuna/study/_optimize.py", line 213, in _run_trial
    value_or_values = func(trial)
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/util/lm_optimize.py", line 39, in objective
    current_samples = evaluate([test_file], create_model)
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/evaluate.py", line 99, in evaluate
    load_graph_for_evaluation(session)
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/util/checkpoints.py", line 233, in load_graph_for_evaluation
    _load_or_init_impl(session, methods, allow_drop_layers=False, silent=silent)
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/util/checkpoints.py", line 171, in _load_or_init_impl
    silent=silent,
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/util/checkpoints.py", line 127, in _load_checkpoint
    load_cudnn=Config.load_cudnn,
  File "/home/bruger/Python-3.7.6/STT/training/coqui_stt_training/util/checkpoints.py", line 91, in _load_checkpoint_impl
    v.load(ckpt.get_tensor(v.op.name), session=session)
  File "/home/bruger/Python-3.7.6/coqui-stt-train-venv/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/home/bruger/Python-3.7.6/coqui-stt-train-venv/lib/python3.7/site-packages/tensorflow_core/python/ops/variables.py", line 1033, in load
    session.run(self.initializer, {self.initializer.inputs[1]: value})
  File "/home/bruger/Python-3.7.6/coqui-stt-train-venv/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/bruger/Python-3.7.6/coqui-stt-train-venv/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1156, in _run
    (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (32,) for Tensor 'layer_6/bias/Initializer/zeros:0', which has shape '(21,)'
HarikalarKutusu commented 2 years ago

Please read this: https://stt.readthedocs.io/en/latest/TRANSFER_LEARNING.html

FrontierDK commented 2 years ago

Since this is my first Danish model, I can't use transfer learning.

Is there a way to have the bug fixed? I get that it's hard to fix a bug where one can't test, so here is my very simple dataset: training

HarikalarKutusu commented 2 years ago

Sorry, but this is not a bug, this is how deep learning models work. As indicated in the aforementioned resource, the alphabet is the last layer, your result, and (repeating from that resource) it is "crucial". Think of a model which is trained on English "yes" and "no" and you say "सेब" ("apple" in Hindi)...

You cannot expect an English model to understand every language. For example, the whole Common Voice project is dedicated to this.

Every character (or group of characters) in the Latin alphabet is pronounced differently in other languages. Compare "ei" vs "ie" in English and German, for example. Therefore, for a new language, you either train a model or find one that has already been trained.

I urge you to generate a Danish model...

HarikalarKutusu commented 2 years ago

> Since this is my first Danish model, I can't use transfer learning.

No, you will transfer-learn from the available English model using, e.g., the Danish dataset in Common Voice. Read the documentation, then come to the Matrix channel; everybody will help you.

FrontierDK commented 2 years ago

> I urge you to generate a Danish model...

But how do I create the first model? Seems like a chicken-and-egg problem... I need a Danish model to train a Danish model. Somehow, someone created the first...

HarikalarKutusu commented 2 years ago

Transfer Learning in Voice AI (STT) is defined as "transferring the knowledge gained from another language to your language at hand". As explained in the document I shared, if you have the same alphabet, you "fine-tune"; otherwise, you drop the last 2-3 layers from the model and teach it what is different in your new language.
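As a rough illustration of what "dropping the last layers" means, the sketch below splits checkpoint variables into those that are loaded and those that are re-initialized. It is a conceptual example, not the toolkit's actual loader: the function name and layer ordering are assumptions, and the variable names are taken from the log earlier in this thread, apart from `layer_6/weights`, which is assumed.

```python
# Layer prefixes in network order (assumed), based on the variable names
# that appear in the checkpoint-loading log earlier in the thread.
LAYER_ORDER = ["layer_1", "layer_2", "layer_3", "cudnn_lstm", "layer_5", "layer_6"]


def split_variables(variable_names, drop_source_layers):
    """Return (to_load, to_reinitialize) when the last N layers are dropped."""
    if drop_source_layers > 0:
        dropped = LAYER_ORDER[len(LAYER_ORDER) - drop_source_layers:]
    else:
        dropped = []
    to_load, to_reinit = [], []
    for name in variable_names:
        # A variable belongs to a dropped layer if its name starts with that prefix.
        if any(name.startswith(prefix + "/") for prefix in dropped):
            to_reinit.append(name)
        else:
            to_load.append(name)
    return to_load, to_reinit


names = [
    "global_step",
    "layer_1/bias", "layer_1/weights",
    "layer_5/bias", "layer_5/weights",
    "layer_6/bias", "layer_6/weights",  # layer_6/weights is assumed
]
to_load, to_reinit = split_variables(names, drop_source_layers=1)
print(to_reinit)
```

With one layer dropped, only the alphabet-sized output layer is re-created, so the new alphabet can have a different size while everything below it starts from the English weights.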

Resources for you:

  1. The STT documentation
  2. This part of my video: https://youtu.be/VsUkiqS0xIg?t=1163
  3. My example for Turkish for Colab: https://github.com/HarikalarKutusu/common-voice-tr-experiments/blob/main/colab-notebooks/tr/v8.0/r1/stt-train-cv-tr-v8.0-r1.ipynb
  4. Check them then come to the gitter (or matrix) channel.
HarikalarKutusu commented 2 years ago

Also this one of course: https://github.com/coqui-ai/STT/blob/main/notebooks/easy_transfer_learning.ipynb

FrontierDK commented 2 years ago

HarikalarKutusu, you have made a very good video - thank you for sharing it :)

Specifying different loading and saving folders kind of works, but I get this warning: WARNING: You specified different values for --load_checkpoint_dir and --save_checkpoint_dir, but you are running training and testing in a single invocation. The testing phase has been disable to prevent unexpected behavior of testing on the base checkpoint rather than the trained one. You should train and evaluate in two separate commands, specifying the correct --load_checkpoint_dir in both cases.

HarikalarKutusu commented 2 years ago

> but you are running training and testing in a single invocation.

Yes, there is such an issue in transfer learning. Therefore you need to first train, then evaluate separately. Please check the notebook I shared (third point above) for .train and .evaluate calls.

Basically, if you don't pass the test set to .train, it only trains. That way the warning goes away...
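A minimal sketch of the two-step flow, written as command builders so the difference between the two invocations is explicit. The flags are the ones that appear in this thread and in the transfer-learning documentation; all paths are hypothetical placeholders, and further flags (for example --drop_source_layers or --n_hidden) would be added as your setup requires.

```python
import shlex


def train_command(load_dir, save_dir, alphabet, train_csv, dev_csv):
    # No --test_files here, so this invocation only trains.
    return [
        "python", "-m", "coqui_stt_training.train",
        "--load_checkpoint_dir", load_dir,
        "--save_checkpoint_dir", save_dir,
        "--alphabet_config_path", alphabet,
        "--train_files", train_csv,
        "--dev_files", dev_csv,
    ]


def evaluate_command(save_dir, alphabet, test_csv):
    # Evaluate the checkpoint written by training, not the base checkpoint.
    return [
        "python", "-m", "coqui_stt_training.evaluate",
        "--checkpoint_dir", save_dir,
        "--alphabet_config_path", alphabet,
        "--test_files", test_csv,
    ]


# Hypothetical paths, for illustration only.
train = train_command("english-checkpoint", "danish-checkpoint",
                      "danish/alphabet.txt", "danish/train.csv", "danish/dev.csv")
evaluate = evaluate_command("danish-checkpoint", "danish/alphabet.txt", "danish/test.csv")

for cmd in (train, evaluate):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The point of the split is that evaluation reads from the directory training wrote to, with the same alphabet file in both steps.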

FrontierDK commented 2 years ago

It seems this has more or less the same bug. I can do the training (epoch 300), but when I run the evaluate process, I get the same kind of error:

I enter python3 -m coqui_stt_training.evaluate --show_progressbar true --train_cudnn false --test_files ~/danish/test.csv --checkpoint_dir ~/coqui-stt-1.4.0-checkpoint --alphabet_config_path ~/danish/alphabet.txt

I get ValueError: Cannot feed value of shape (1024,) for Tensor 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Initializer/Const:0', which has shape '(8192,)'

HarikalarKutusu commented 2 years ago

What was n_hidden during training? Please read:

https://stt.readthedocs.io/en/latest/TRANSFER_LEARNING.html?highlight=transfer%20learning#bootstrapping-from-coqui-stt-release-checkpoints
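The two shapes in that error can be related to n_hidden. A standard LSTM cell has four gates, so its bias vector has 4 × n_hidden entries. The helper below is only an illustrative sketch (its name is made up), and it assumes the cell in question follows that standard layout.

```python
def n_hidden_from_lstm_bias(bias_length):
    """Recover n_hidden from an LSTM bias length, assuming 4 gates per cell."""
    if bias_length % 4 != 0:
        raise ValueError("bias length is not a multiple of 4")
    return bias_length // 4


print(n_hidden_from_lstm_bias(1024))  # 256
print(n_hidden_from_lstm_bias(8192))  # 2048
```

Read this way, the value coming from the checkpoint corresponds to n_hidden 256, while the graph built for evaluation expects 2048, so passing the same --n_hidden to training and evaluation would be the first thing to check. This is an inference from the shapes alone.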

FrontierDK commented 2 years ago

It was 2048, like the original minimalistic checkpoint from Coqui. If I try entering any other value, I get this warning: W WARNING: --n_hidden value (256) is different from value found in checkpoint (2048).

I can export and use the model for recognition, albeit with a high WER (more than 20%).

HarikalarKutusu commented 2 years ago

> albeit with a high WER (more than 20%).

This seems normal to me. It depends on the amount of the data in the dataset and its quality.

Glad that you got it working. Once you have solved the problems, perhaps remove the "bug" tag and close this?

FrontierDK commented 2 years ago

HarikalarKutusu, I'll do that. I have also worked around the bug by letting Coqui create an alphabet.txt on its own and then using that. A small dataset doesn't use all the letters, and using a full alphabet seems to crash Coqui. With this workaround I have been able to create working scorers and models without even using transfer learning. This, plus a low n_hidden value, got me down to a 0.5% error rate.
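For reference, a small sketch of why an alphabet derived from a tiny dataset can be shorter than the full one: it only contains the characters that actually occur in the transcripts. The code assumes the usual wav_filename, wav_filesize, transcript CSV columns. The sample rows and function name are made up, and this is not the toolkit's own generation code.

```python
import csv
import io


def alphabet_from_csv(csv_text):
    """Collect the sorted set of characters used in the transcript column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    chars = set()
    for row in reader:
        chars.update(row["transcript"])
    return sorted(chars)


# Hypothetical two-line dataset.
sample = (
    "wav_filename,wav_filesize,transcript\n"
    "a.wav,1000,rød grød\n"
    "b.wav,1200,på åen\n"
)
alphabet = alphabet_from_csv(sample)
print(len(alphabet))      # 9 labels
print(len(alphabet) + 1)  # 10 output units, counting the CTC blank
```

An alphabet built this way matches the data it came from, but a model trained with it cannot output letters that were missing from the training transcripts (here, for example, "æ").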

Thank you very much for your help, your video is very well made and I appreciate your help.