Closed ghost closed 4 years ago
If you are trying to get text for many audio files, the best option is infer mode, which generates a text file containing all of the transcriptions. If you need to use interactive infer, look through the `sparse_tensor_to_chars` and `infer` functions in `speech2text.py` and the `get_interactive_infer` function in `utils.py`.
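For reference, what that index-to-character step boils down to is mapping each label index in the decoded output back through the vocab file. Here is a minimal standalone sketch, not the repo's actual implementation; the helper names and the vocab layout (one symbol per line, line number = label index, as in OpenSeq2Seq's `vocab.txt` files) are assumptions:

```python
# Hypothetical helpers illustrating the index -> character mapping that
# sparse_tensor_to_chars performs; not the OpenSeq2Seq source.

def load_vocab(path):
    """Read one symbol per line; the line number is the label index."""
    with open(path) as f:
        return [line.rstrip("\n") for line in f]

def indices_to_text(indices, vocab):
    """Map a decoded label-index sequence to a string."""
    return "".join(vocab[i] for i in indices)
```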
I tried both infer and interactive_infer mode, with one and with several wave files, but it only gives a probability distribution, not an actual transcription. I think my decoder settings are wrong. These are the `infer_params` I set:
```python
infer_params = {
    "data_layer": Speech2TextDataLayer,
    "data_layer_params": {
        "backend": "librosa",
        "num_audio_features": 96,
        "input_type": "spectrogram",
        "vocab_file": "open_seq2seq/test_utils/toy_speech_data/vocab.txt",
        "dataset_files": [
            "data/test.csv",
        ],
        "shuffle": False,
    },
}
```
What did I get wrong?
Have you trained your own model, or are you using the released Nvidia one? Which model are you using: Jasper, DeepSpeech2, or Wave2Letter+? What are your `decoder_params`?
I trained my own model and used DeepSpeech2.
For a similar model, my parameters for infer and interactive infer are as follows:

```python
infer_params = {
    "data_layer": Speech2TextDataLayer,
    "data_layer_params": {
        "dataset_files": [
            "/ATC_DATA/ldc_test_clean.csv",
        ],
        "shuffle": False,
    },
}

interactive_infer_params = {
    "data_layer": Speech2TextDataLayer,
    "data_layer_params": {
        "num_audio_features": 64,
        "input_type": "spectrogram",
        "vocab_file": "./Resources/DeepSpeech2/vocab.txt",
        "dataset_files": [],
        "shuffle": False,
    },
}
```
And my decoder params were like this:

```python
"decoder": FullyConnectedCTCDecoder,
"decoder_params": {
    "use_language_model": True,
    # params for decoding the sequence with a language model
    "beam_width": 512,
    "alpha": 2.0,
    "beta": 1.0,
    "decoder_library_path": "./resources/DeepSpeech2/Packages/libctc_decoder_with_kenlm.so",
    "lm_path": "./resources/DeepSpeech2/lm/ds2-lm.binary",
    "trie_path": "./resources/DeepSpeech2/lm/ds2-lm.trie",
    "alphabet_config_path": "./resources/DeepSpeech2/vocab.txt",
},
"loss": CTCLoss,
"loss_params": {},
```
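If you are only getting a per-timestep probability array back, the missing step is CTC decoding of that array into a label sequence. As a sanity check, independent of the library, here is a minimal greedy sketch (argmax per frame, collapse consecutive repeats, drop the blank label); it assumes the CTC blank is the last index, appended after the vocab, which is the common convention but worth verifying against your vocab file:

```python
def ctc_greedy_decode(probs, vocab, blank=None):
    """Greedy CTC decode: take the most likely label at each time step,
    collapse consecutive repeats, then drop the blank label.

    probs: time x num_labels matrix (list of per-frame probability lists).
    blank: index of the CTC blank; defaults to len(vocab), i.e. the
           label appended after the vocabulary (an assumption here).
    """
    if blank is None:
        blank = len(vocab)
    # Most likely label index per frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]
    # Collapse runs of the same label.
    collapsed = [i for j, i in enumerate(best) if j == 0 or i != best[j - 1]]
    # Remove blanks and map the rest through the vocab.
    return "".join(vocab[i] for i in collapsed if i != blank)
```

This is only a rough check; the beam-search decoder with the language model (configured above) will generally give better transcriptions than greedy decoding.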
If your parameters differ, try changing them. If the output still doesn't work, then without more information I'm not sure how to help you.
Thanks for your advice. I'll try your settings as a reference.
When I run the interactive_infer script for speech2text, it gives a float array, not text. How can I get text instead? Can anyone help me with this urgently?