coqui-ai / STT

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
https://coqui.ai
Mozilla Public License 2.0

Bug: Terminating: fork() called from a process already using GNU OpenMP, this is unsafe #2268

Status: Closed (Anlubi closed this issue 2 years ago)

Anlubi commented 2 years ago

I'm running into the following problem using STT.

Describe the bug: Running the Docker image on my computer raises the following error:

WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2022-07-19 17:53:57.933430: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2022-07-19 17:53:57.935519: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
I Enabling automatic mixed precision training.
I STARTING Optimization
I Training epoch 0...
epoch: 0
Terminating: fork() called from a process already using GNU OpenMP, this is unsafe.
[The line above is repeated 73 more times in the original log.]

Also pressing CTRL+C does not work. I have to kill it from a separate terminal.

To Reproduce: I used the container from ghcr.io/coqui-ai/stt-train:main and ran STT training with:

python -m coqui_stt_training.train \
    --train_files $CLIPS/train.csv \
    --alphabet_config_path=$CLIPS/alphabet.txt \
    --train_batch_size 128 \
    --n_hidden 2048 \
    --learning_rate 0.0001 \
    --dropout_rate 0.40 \
    --epochs 100 \
    --cache_for_epochs 10 \
    --log_level 1 \
    --checkpoint_dir ${OUT}/checkpoints \
    --summary_dir ${OUT}/tensorboard \
    --train_cudnn true \
    --automatic_mixed_precision true \
    --show_progressbar false \
    --skip_batch_test true 

Michal-Szczepaniak commented 2 years ago

I have the same issue. When I ran the Docker container I didn't experience it, but when I ran it on the host I got it. I'm not using NVIDIA, and I'm on openSUSE Tumbleweed.

It just prints:

(coqui-stt-train-venv) foidbgen@pc:~/Programs/coqui-train> python -m coqui_stt_training.train --checkpoint_dir coqui-stt-1.3.0-checkpoint --train_files train.csv --dev_files dev.csv --test_files test.csv
I --alphabet_config_path not specified, but found an alphabet file alongside specified checkpoint (coqui-stt-1.3.0-checkpoint/alphabet.txt). Will use this alphabet file for this run.
I Performing dummy training to check for memory problems.
I If the following process crashes, you likely have batch sizes that are too big for your available system memory (or GPU memory).
I Loading best validating checkpoint from coqui-stt-1.3.0-checkpoint/best_dev-3663913
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam_1
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel/Adam
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel/Adam_1
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/bias/Adam
I Loading variable from checkpoint: layer_1/bias/Adam_1
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_1/weights/Adam
I Loading variable from checkpoint: layer_1/weights/Adam_1
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/bias/Adam
I Loading variable from checkpoint: layer_2/bias/Adam_1
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_2/weights/Adam
I Loading variable from checkpoint: layer_2/weights/Adam_1
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/bias/Adam
I Loading variable from checkpoint: layer_3/bias/Adam_1
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_3/weights/Adam
I Loading variable from checkpoint: layer_3/weights/Adam_1
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/bias/Adam
I Loading variable from checkpoint: layer_5/bias/Adam_1
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_5/weights/Adam
I Loading variable from checkpoint: layer_5/weights/Adam_1
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/bias/Adam
I Loading variable from checkpoint: layer_6/bias/Adam_1
I Loading variable from checkpoint: layer_6/weights
I Loading variable from checkpoint: layer_6/weights/Adam
I Loading variable from checkpoint: layer_6/weights/Adam_1
I Loading variable from checkpoint: learning_rate
I STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Terminating: fork() called from a process already using GNU OpenMP, this is unsafe.
Terminating: fork() called from a process already using GNU OpenMP, this is unsafe.

After that it does nothing, and Ctrl+C does not work either.

NanoNabla commented 2 years ago

I ran into the same issue. It seems to be caused by bmcfee/resampy#107.

NanoNabla commented 2 years ago

This should be fixed by bmcfee/resampy#109; use resampy 0.4.0.
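
For anyone hitting this on the host rather than in the container, here is a minimal sketch for checking whether the installed resampy is at least the 0.4.0 release mentioned above. The helper names `parse_version` and `resampy_has_fix` are made up for illustration, and treating anything newer than 0.4.0 as also fixed is my assumption, not something confirmed in this thread.

```python
from importlib import metadata

# Release named in the comment above as containing the fix.
FIXED_VERSION = (0, 4, 0)


def parse_version(text):
    """Turn a string like '0.4.0' into (0, 4, 0).

    Hypothetical helper: only the leading digits of the first three
    dot-separated fields are used, so suffixes such as 'rc1' are ignored.
    """
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def resampy_has_fix(installed=None):
    """Return True/False for a known version, or None if resampy is not installed.

    Assumption: every release at or after 0.4.0 still carries the fix.
    """
    if installed is None:
        try:
            installed = metadata.version("resampy")
        except metadata.PackageNotFoundError:
            return None
    return parse_version(installed) >= FIXED_VERSION


if __name__ == "__main__":
    status = resampy_has_fix()
    if status is None:
        print("resampy is not installed in this environment")
    elif status:
        print("resampy is 0.4.0 or newer")
    else:
        print("resampy is older than 0.4.0; consider upgrading")
```

If it reports an older version, upgrading inside the training virtualenv (for example with `python -m pip install --upgrade resampy`) and re-running the training command is probably the first thing to try.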