alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

Segmentation fault during fine-tuning VOSK #1043

Open RogerPonsBasetis opened 2 years ago

RogerPonsBasetis commented 2 years ago

Hello,

Following the Vosk adaptation tutorial, I have re-compiled Vosk ("vosk-model-en-us-0.22-compile") to add some specific words to the model's vocabulary by adding sentences to the "extra.txt" file, as sketched below. After that, I changed the paths accordingly and ran "compile-graph.sh".
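For reference, that step looks roughly like this (a minimal sketch of what I ran; the location of "extra.txt" inside the compile package and the example sentence are just illustrations of my setup):

import subprocess
from pathlib import Path

compile_dir = Path('vosk-model-en-us-0.22-compile')   # local checkout of the compile package
extra_txt = compile_dir / 'db' / 'extra.txt'          # assumed location of extra.txt; adjust to the package layout

# Append domain sentences that contain the new words
new_sentences = ['coke diet regular']
with extra_txt.open('a') as f:
    for sentence in new_sentences:
        f.write(sentence + '\n')

# Re-run the graph compilation script shipped with the package
subprocess.check_call(['./compile-graph.sh'], cwd=compile_dir)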

Once the re-compilation is finished (with no errors), I take the graph/ and rnnlm_out/ folders and swap them into the models "vosk-model-en-us-0.20" and "vosk-model-en-us-0.21" separately (just to run the same process with different models), roughly as in the sketch below.
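Concretely, the swap is done along these lines (a sketch; the source locations of graph/ and rnnlm_out/ are placeholders for wherever compile-graph.sh wrote its output):

import shutil
from pathlib import Path

compile_out = Path('vosk-model-en-us-0.22-compile')   # placeholder for the compile output root
for model_name in ('vosk-model-en-us-0.20', 'vosk-model-en-us-0.21'):
    model_dir = Path(model_name)
    for folder in ('graph', 'rnnlm_out'):
        src = compile_out / folder                    # adjust to where the recompiled folder actually lives
        dst = model_dir / folder
        if dst.exists():
            shutil.rmtree(dst)                        # remove the original folder from the released model
        shutil.copytree(src, dst)                     # drop in the recompiled one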

The issue appears when I try to use the model in a notebook to check whether it detects the new words (screenshot attached: Captura de 2022-06-27 13-02-42).

Functions used in the code:

import os
import subprocess
import time
import wave

from vosk import Model, KaldiRecognizer

# Model loading (body of the method that builds the model):
actual_path = os.getcwd()
voice_module_index = actual_path.find('voice-module') + len('voice-module')
actual_path = actual_path[:voice_module_index]
model_path = os.path.join(actual_path, self.opt['path'])
if not os.path.exists(model_path):
    print("Please download the model from https://alphacephei.com/vosk/models "
          "and unpack as 'model' in the current folder.")
    raise Exception(f"No model was found in path: {model_path}")

return Model(model_path)

# Transcription (body of the method that processes an audio file):
actual_path = os.getcwd()
voice_module_index = actual_path.find('voice-module') + len('voice-module')
actual_path = actual_path[:voice_module_index]
tmp_file_path = path2file
if '.wav' not in path2file:
    # Convert the input file to 16 kHz mono WAV with ffmpeg
    tmp_file_path = os.path.join(actual_path, 'tmp_audio.wav')
    start_subprocess = time.perf_counter()
    subprocess.call(['ffmpeg', '-i', path2file, '-ar', '16000', '-ac', '1',
                     tmp_file_path, '-y'])
    print("SUBPROCESS TIME: ", time.perf_counter() - start_subprocess)
print('PATH: ', tmp_file_path)
wf = wave.open(tmp_file_path, "rb")

print("--Audio processing started--")
if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE":
    print("Audio file must be WAV format mono PCM.")
    return None

start_model = time.perf_counter()
rec = KaldiRecognizer(self.model, wf.getframerate())  # , '[" coke diet regular", "[unk]"]'
rec.SetWords(True)

while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        pass  # print(rec.Result())
    else:
        pass  # print(rec.PartialResult())

print("--Audio processing successfully finished!--")
# os.remove(tmp_file_path)
res = self.postprocessing(rec.FinalResult())
print("MODEL TIME: ", time.perf_counter() - start_model)
return res

After running vosk.get_transcriptions(), the kernel dies due to a segmentation fault.
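To get more information out of the crash, I also ran a stripped-down version with verbose Vosk logging and Python's faulthandler enabled (a minimal sketch; the model path and audio file are placeholders for my patched model and the converted WAV):

import faulthandler
import wave

from vosk import Model, KaldiRecognizer, SetLogLevel

faulthandler.enable()    # print a Python traceback if the process receives SIGSEGV
SetLogLevel(0)           # let Vosk/Kaldi print its model-loading messages

model = Model('vosk-model-en-us-0.20')   # placeholder: model with the swapped graph/ and rnnlm_out/
wf = wave.open('tmp_audio.wav', 'rb')    # 16 kHz mono PCM, as produced by the ffmpeg call above

rec = KaldiRecognizer(model, wf.getframerate())
rec.SetWords(True)

while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    rec.AcceptWaveform(data)

print(rec.FinalResult())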

nshmyrev commented 2 years ago

Hello. I wrote to you on kaldi-help. To recompile model 0.20 you need the compilation package for 0.20; we didn't release that package.