NVIDIA / OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
https://nvidia.github.io/OpenSeq2Seq
Apache License 2.0

External LM rescoring decode script is getting killed #447

Open shiv6146 opened 5 years ago

shiv6146 commented 5 years ago

Following up on #380 @vsl9 @borisgin @blisc. I have followed steps 1-3 here and everything went smoothly. But launching

python scripts/decode.py --logits=<path_to>/model_output.pickle --labels=<path_to>/librivox-train-clean-100.csv --lm=<path_to>/4-gram.binary --vocab=open_seq2seq/test_utils/toy_speech_data/vocab.txt --alpha=2.0 --beta=1.5 --beam_width=512 --dump_all_beams_to=<path_to>/beams.txt

with the respective alpha, beta and beam_width values just kills the script instantly for some unknown reason. My output looks something like this:

# empty preds: 0
Greedy WER = 0.0025
Killed

The result is the same after running the script multiple times in eval mode. Do you think the beam_width value used here is overkill for my memory? I am running on a single GPU instance with 12 GB of memory. Your thoughts on this would help me. Thank you :+1:

shiv6146 commented 5 years ago

Adding to the above: is it also required to run step 4? I can see that run_lm_exp.sh iterates over the checkpoints and then takes in the beam dumps obtained from step 3, which itself is failing in my case. Could you please explain what is happening under the hood while performing these steps?

blisc commented 5 years ago

Have you had any success with smaller beam widths? You can also try removing the --dump_all_beams_to=BEAMS.txt parameter.

Step 2 uses a pretrained speech model to get the predicted character logits per time step for each sound file.
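For illustration, here is a minimal sketch of what greedy decoding over those per-timestep logits amounts to. It assumes the logits for one sound file have shape [time, num_chars + 1] with the CTC blank as the last index; the actual layout of the pickle produced by step 2 may differ.

```python
# Minimal sketch of CTC greedy decoding over per-timestep character logits.
# Assumes shape [time, num_chars + 1] with the blank as the last index
# (an assumption, not necessarily the exact OpenSeq2Seq pickle layout).
import numpy as np

def ctc_greedy_decode(logits, vocab):
    blank_id = logits.shape[1] - 1          # assumed blank position
    best_path = np.argmax(logits, axis=1)   # most likely symbol per time step
    decoded = []
    prev = blank_id
    for idx in best_path:
        if idx != prev and idx != blank_id:  # collapse repeats, drop blanks
            decoded.append(vocab[idx])
        prev = idx
    return "".join(decoded)
```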

Step 3 uses the outputs of step 2: it first does greedy decoding, and then runs beam search decoding with rescoring by the given n-gram LM. If you give a range of alpha and beta, it will run a grid search over all values; otherwise, it will just run beam search with the provided alpha and beta. It will also dump the most probable candidates for each sound file if dump_all_beams_to is provided.
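Conceptually, alpha and beta enter the beam search score in the usual shallow-fusion way, and the grid search just evaluates WER over every (alpha, beta) pair. A rough sketch, with names that are illustrative rather than the actual decode.py internals:

```python
# Sketch of the alpha/beta weighting and grid search idea (illustrative only).
import itertools

def combined_score(log_p_ctc, log_p_lm, word_count, alpha, beta):
    # Acoustic score + LM score weighted by alpha + word-count bonus weighted by beta.
    return log_p_ctc + alpha * log_p_lm + beta * word_count

def grid_search(alphas, betas, evaluate_wer):
    # evaluate_wer(alpha, beta) is assumed to run beam search decoding and return WER.
    best = None
    for alpha, beta in itertools.product(alphas, betas):
        wer = evaluate_wer(alpha, beta)
        if best is None or wer < best[0]:
            best = (wer, alpha, beta)
    return best  # (best WER, best alpha, best beta)
```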

Step 4 takes the candidates from step 3, uses T-XL to rescore each candidate, and outputs the most probable one (its score is just a linear combination of the beam search score from step 3 and the T-XL score).
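In pseudocode terms, the step 4 selection looks roughly like the sketch below; the weight and the scoring function are assumptions for illustration, not the exact values used by the script.

```python
# Sketch of step 4: rescore each beam candidate with a neural LM (T-XL) and
# keep the one with the best linear combination of scores (illustrative only).
def pick_best_candidate(candidates, txl_score, lm_weight=0.5):
    """candidates: list of (transcript, beam_search_score) pairs from step 3."""
    def total(cand):
        text, beam_score = cand
        return beam_score + lm_weight * txl_score(text)
    return max(candidates, key=total)[0]
```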

shiv6146 commented 5 years ago

@blisc I have tried smaller beam widths but still no luck :disappointed: I shall also try removing the --dump_all_beams_to option and let you know. As per your explanation of the different steps: if I were to now train my model on a completely different dataset, it does not matter during training whether I have enabled the ctc_decoder_with_lm option in my model config. I should still be able to do re-scoring using the above-mentioned technique, provided I have a trained model. Right?

blisc commented 5 years ago

You should not enable use_language_model during training. All of the current decoder params are deprecated. Your trained model will be able to make use of the re-scoring and new decoder script.

The new decoder script will make use of all available CPU cores; perhaps your instance is killing the script due to high CPU or CPU memory usage? An alternative would be to run steps 2 and 4 on a cloud instance and step 3 on a local machine.

vsl9 commented 5 years ago

scripts/decode.py doesn't use GPUs; it is CPU only. Insufficient system RAM might be the issue. Have you tried decreasing beam_width to a very small value and then increasing it gradually (e.g. 2, 8, 16, 64, 128)? Or maybe there was a problem with the decoder's installation (scripts/install_decoders.sh).
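One way to automate that ramp-up is a small wrapper that reruns the decoder with increasing beam widths and stops at the first failure. The paths below are placeholders copied from the command above, and the exact set of flags is assumed to match your run:

```python
# Ramp beam_width up until scripts/decode.py fails (e.g. is OOM-killed),
# to find the largest value your system RAM tolerates. Paths are placeholders.
import subprocess

for beam_width in (2, 8, 16, 64, 128, 256, 512):
    cmd = [
        "python", "scripts/decode.py",
        "--logits=<path_to>/model_output.pickle",
        "--labels=<path_to>/librivox-train-clean-100.csv",
        "--lm=<path_to>/4-gram.binary",
        "--vocab=open_seq2seq/test_utils/toy_speech_data/vocab.txt",
        "--alpha=2.0", "--beta=1.5",
        f"--beam_width={beam_width}",
    ]
    result = subprocess.run(cmd)
    print(f"beam_width={beam_width} -> return code {result.returncode}")
    if result.returncode != 0:   # a SIGKILL from the OOM killer shows up as -9
        break
```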

aayushkubb commented 5 years ago

Hi, is there any update on this one? I am getting a similar issue when I run on 71k files (about 51 hours of data).

I am running on 2 GPUs (2090P, 11 GB memory each).

aayushkubb commented 5 years ago

A small hack I used was to run the files in smaller batches, which didn't cause any issues with the same beam width.
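If it helps anyone, one way to implement that batching is to split the labels CSV into chunks and then run steps 2 and 3 per chunk. This sketch assumes the labels file is a CSV with a single header row; paths and chunk size are placeholders:

```python
# Split the labels CSV into smaller chunks so decoding can be run per chunk.
# Assumes a single header row; chunk size and file names are illustrative.
import csv

def split_csv(labels_csv, chunk_size=5000, prefix="labels_chunk"):
    with open(labels_csv, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    paths = []
    for i in range(0, len(rows), chunk_size):
        out_path = f"{prefix}_{i // chunk_size:03d}.csv"
        with open(out_path, "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(header)
            writer.writerows(rows[i:i + chunk_size])
        paths.append(out_path)
    return paths
```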