coqui-ai / STT

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
https://coqui.ai
Mozilla Public License 2.0

Bug: #2237

Closed sjpritchard closed 2 years ago

sjpritchard commented 2 years ago

Welcome to the 🐸STT project! We are excited to see your interest, and appreciate your support!

This repository is governed by the Contributor Covenant Code of Conduct. For more details, see the CODE_OF_CONDUCT.md file.

If you've found a bug, please provide the following information:

Describe the bug When training in the v1.3.0 docker container, I am getting the following error:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
        return fn(*args)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
        target_list, run_metadata)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
        run_metadata)
    tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
      (0) Not found: No algorithm worked!
             [[{{node tower_0/conv1d}}]]
             [[tower_0/gradients/tower_0/MatMul_3_grad/tuple/control_dependency_1/_79]]
      (1) Not found: No algorithm worked!
             [[{{node tower_0/conv1d}}]]
    0 successful operations.
    0 derived errors ignored.

To Reproduce Steps to reproduce the behavior:

  1. Run the following command:

         python3 -m coqui_stt_training.train \
           --train_cudnn true \
           --load_checkpoint_dir coqui-stt-1.3.0-checkpoint \
           --save_checkpoint_dir checkpoint \
           --auto_input_dataset s3/index.csv \

Expected behavior Expected training to work. v1.4.0alpha1 works OK, but has other errors/issues and I would prefer to use the stable 1.3 version.

Environment (please complete the following information):

Additional context Add any other context about the problem here.

ChamathKB commented 2 years ago

Hope this helps: https://stt.readthedocs.io/en/latest/playbook/TRAINING.html#possible-errors
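For readers who cannot open the link: a workaround often suggested for cuDNN convolution failures in TensorFlow 1.x is to let TensorFlow grow GPU memory on demand by setting the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable before training starts. The sketch below is only an illustration. It assumes that this setting is relevant to the "No algorithm worked!" error reported above, which the thread does not confirm, and the helper names `build_training_env` and `build_training_command` are hypothetical. The flags are copied from the reproduction command; the original command ends in a trailing backslash, so further flags may have followed and are not guessed here.

```python
import os
import sys


def build_training_env(base_env=None):
    """Return a copy of the environment with GPU memory growth enabled.

    TF_FORCE_GPU_ALLOW_GROWTH is a TensorFlow environment variable. Whether it
    resolves the error in this issue is an assumption, not a confirmed fix.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    return env


def build_training_command(python=sys.executable):
    """Build the training command from the issue as an argument list.

    Only the flags visible in the report are included; the original command
    was cut off after --auto_input_dataset.
    """
    return [
        python, "-m", "coqui_stt_training.train",
        "--train_cudnn", "true",
        "--load_checkpoint_dir", "coqui-stt-1.3.0-checkpoint",
        "--save_checkpoint_dir", "checkpoint",
        "--auto_input_dataset", "s3/index.csv",
    ]


# To launch training with the modified environment (not executed here):
#   import subprocess
#   subprocess.run(build_training_command(), env=build_training_env(), check=True)
```

If the error persists with this setting, the cause is likely elsewhere (for example a mismatch between the GPU and the CUDA/cuDNN versions in the container), and the linked playbook section is the better reference.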

wasertech commented 2 years ago

I’ll close this as:

  1. It has no name
  2. @ChamathKB answered the query
  3. The OP has not responded