shiv6146 opened this issue 5 years ago
Adding to the above steps: is it required to run step 4 as well? I can see that run_lm_exp.sh iterates over the checkpoints and takes in the beam dumps obtained from step 3, which itself is failing in my case. Could you please explain what is happening under the hood in these steps?
Have you had any success with smaller beam widths? You can also try removing the --dump_all_beams_to=BEAMS.txt parameter.
Step 2 uses a pretrained speech model to get the predicted character logits per time step for each sound file.
Step 3 uses the outputs of step 2, first doing greedy decoding and then running beam search decoding rescored with the given n-gram LM. If you give a range of alpha and beta, it will run a grid search over all values; otherwise it will just run beam search with the provided alpha and beta. It will also dump the most probable candidates for each sound file if dump_all_beams_to is provided.
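To make the roles of alpha and beta concrete, here is a minimal sketch of the usual shallow-fusion scoring used by this style of CTC + n-gram beam search (illustrative only, not lifted from the OpenSeq2Seq source): alpha weights the LM log-probability and beta acts as a word-insertion bonus.

```python
# Hedged sketch: how alpha and beta typically enter the per-candidate score
# in CTC beam search with an n-gram LM (shallow fusion). Not the actual
# OpenSeq2Seq code; names are illustrative.
def shallow_fusion_score(log_p_ctc, log_p_lm, word_count, alpha, beta):
    """alpha scales the n-gram LM score, beta rewards longer transcripts."""
    return log_p_ctc + alpha * log_p_lm + beta * word_count
```

The grid search just evaluates the resulting WER for every (alpha, beta) pair in the given ranges and keeps the best one.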
Step 4 takes the candidates from step 3, uses T-XL to rescore each candidate, and outputs the most probable one (its score is just a linear combination of the score from step 3 and the output of T-XL).
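Roughly, the step-4 selection looks like the sketch below (names and the lm_weight coefficient are illustrative, not the actual run_lm_exp.sh logic): each candidate keeps its step-3 score, T-XL assigns it a log-probability, and the final score is their linear combination.

```python
# Minimal sketch of the step-4 rescoring idea (illustrative, not the actual
# OpenSeq2Seq implementation). `txl_log_prob` stands in for a call to the
# Transformer-XL model; `lm_weight` is a hypothetical mixing coefficient.
def rescore_candidates(candidates, txl_log_prob, lm_weight=0.5):
    """Pick the best transcript for one sound file.

    candidates:   list of (text, beam_score) pairs dumped by step 3
    txl_log_prob: callable returning the T-XL log-probability of a text
    """
    best_text, best_score = None, float("-inf")
    for text, beam_score in candidates:
        score = beam_score + lm_weight * txl_log_prob(text)
        if score > best_score:
            best_text, best_score = text, score
    return best_text
```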
@blisc I have tried smaller beam widths but still no luck :disappointed: I will also try removing the --dump_all_beams_to option and let you know. Per your explanation of the different steps: if I were to now train my model on a completely different dataset, it would not matter at training time whether I had enabled the ctc_decoder_with_lm option in my model config. I would still be able to do re-scoring using the technique above, provided I have a trained model. Right?
You should not enable use_language_model during training. All of the current decoder params are deprecated.
Your trained model will be able to make use of the re-scoring and new decoder script.
The new decoder script will make use of all available CPU cores; perhaps your instance is killing your script due to high CPU or CPU memory usage? An alternative would be to run steps 2 and 4 on a cloud instance and step 3 on a local machine.
scripts/decode.py doesn't use GPUs; it is CPU only. Insufficient system RAM might be the issue. Have you tried decreasing beam_width to a very small value and then increasing it gradually (like 2, 8, 16, 64, 128)? Or maybe there was a problem with the decoder's installation (scripts/install_decoders.sh).
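For example, a sweep like the one below (paths are placeholders for your own files; just a sketch, not part of the repo) can show at which beam_width the process starts getting killed:

```python
# Probe increasing beam widths with scripts/decode.py to find where the
# process runs out of memory. Paths and file names are placeholders.
import subprocess

for bw in (2, 8, 16, 64, 128):
    print(f"--- beam_width={bw} ---")
    subprocess.run(
        [
            "python", "scripts/decode.py",
            "--logits=model_output.pickle",
            "--labels=librivox-train-clean-100.csv",
            "--lm=4-gram.binary",
            "--vocab=open_seq2seq/test_utils/toy_speech_data/vocab.txt",
            "--alpha=2.0", "--beta=1.5",
            f"--beam_width={bw}",
        ],
        check=False,  # keep sweeping even if one run is killed
    )
```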
Hi, is there any update on this one? I am getting a similar issue when I run on 71k files, about 51 hours of data.
I am running on 2 GPUs (2090P, with 11 GB memory each).
A small hack I used was to run the files in smaller batches, which did not cause any issue with the same beam width.
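For anyone wanting to reproduce that workaround, a sketch of the splitting step is below (file names and chunk size are placeholders; you would also need a matching per-chunk logits pickle from step 2):

```python
# Split the labels CSV into smaller chunks so each decode run stays within
# memory. File names and CHUNK_SIZE are placeholders.
import csv

CHUNK_SIZE = 5000  # utterances per batch

with open("librivox-train-clean-100.csv") as f:
    reader = csv.reader(f)
    header = next(reader)
    rows = list(reader)

for i in range(0, len(rows), CHUNK_SIZE):
    out_name = f"labels_chunk_{i // CHUNK_SIZE:03d}.csv"
    with open(out_name, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(header)
        writer.writerows(rows[i:i + CHUNK_SIZE])
```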
Following up on #380 @vsl9 @borisgin @blisc: I have followed steps 1-3 here and everything went smoothly. But launching
python scripts/decode.py --logits=<path_to>/model_output.pickle --labels=<path_to>/librivox-train-clean-100.csv --lm=<path_to>/4-gram.binary --vocab=open_seq2seq/test_utils/toy_speech_data/vocab.txt --alpha=2.0 --beta=1.5 --beam_width=512 --dump_all_beams_to=<path_to>/beams.txt
with the respective alpha, beta, and beam_width values just kills the script instantly for some unknown reason. My output looks something like this:
# empty preds: 0
Greedy WER = 0.0025
Killed
after running the script multiple times in eval mode. Do you think the beam_width value used here is overkill for my memory? I am running on a single-GPU instance with 12 GB of memory. Your thoughts on this would help me. Thank you :+1: