Welcome to the 🐸STT project! We are excited to see your interest, and appreciate your support!
This repository is governed by the Contributor Covenant Code of Conduct. For more details, see the CODE_OF_CONDUCT.md file.
If you've found a bug, please provide the following information:
Describe the bug
When training in the v1.3.0 Docker container, I get the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
  (0) Not found: No algorithm worked!
     [[{{node tower_0/conv1d}}]]
     [[tower_0/gradients/tower_0/MatMul_3_grad/tuple/control_dependency_1/_79]]
  (1) Not found: No algorithm worked!
     [[{{node tower_0/conv1d}}]]
0 successful operations.
0 derived errors ignored.
To Reproduce
Steps to reproduce the behavior:
Run the following command:
python3 -m coqui_stt_training.train \
  --train_cudnn true \
  --load_checkpoint_dir coqui-stt-1.3.0-checkpoint \
  --save_checkpoint_dir checkpoint \
  --auto_input_dataset s3/index.csv
Expected behavior
I expected training to run without this error. v1.4.0alpha1 works, but it has other errors and issues, so I would prefer to use the stable 1.3 release.
Environment (please complete the following information):
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu Linux 20.04 (also tried with 22.04)
TensorFlow installed from (our builds, or upstream TensorFlow): Coqui train docker image v1.3.0
TensorFlow version (use command below):
Python version:
Bazel version (if compiling from source):
GCC/Compiler version (if compiling from source):
CUDA/cuDNN version: NVIDIA driver 510.73.05, CUDA 11.6 (on host). I also tried driver version 470, which produces a similar error with slightly different wording.
GPU model and memory: GeForce RTX 3080 10GB
Exact command to reproduce:
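The TensorFlow and Python version fields above are blank, and the command the template refers to is not included here. As a rough sketch (not the project's official command), a small script like the one below could be run inside the training container to fill them in. The helper name `collect_versions` is hypothetical, and the script assumes only that TensorFlow, if installed, exposes the usual `__version__` attribute.

```python
import importlib
import platform


def collect_versions():
    """Return the Python version and, if importable, the TensorFlow version."""
    info = {"python": platform.python_version()}
    try:
        # Import lazily so the script still runs where TensorFlow is absent.
        tf = importlib.import_module("tensorflow")
        info["tensorflow"] = getattr(tf, "__version__", "unknown")
    except ImportError:
        info["tensorflow"] = "not installed"
    return info


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```

Running it with the same `python3` interpreter used for training reports the versions that the training job actually sees, which may differ from anything installed on the host.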
Additional context
Add any other context about the problem here.