alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

Assertion Failed in C++ Code #1323

msj121 closed this issue 1 year ago

msj121 commented 1 year ago

Hi, I am using Vosk in a C++ application and I am getting the following error: ASSERTION_FAILED (VoskAPI: TraceBackBestPath():lattice-incremental-online-decoder.cc:127) Assertion Failed: (static_cast<size_t>(cur_t) < this->cost_offsets_.size())

Is there a way to track down this issue or determine its cause? I am new to both the library and C++. Thanks.

nshmyrev commented 1 year ago

Sure, you need to provide an example (code and audio file) that reproduces the problem.

msj121 commented 1 year ago

Ah yes, I thought I had posted it, sorry.

Header:


#ifndef WORDS_H
#define WORDS_H

#include <memory>
#include <thread>
#include <atomic>
#include <string>
#include <mutex>
#include <portaudio.h>

class Words {
public:
    Words();
    ~Words();

    void startListening(int deviceIndex = -1,const std::string& modelPath = "");
    // private:

    void stopListening();

private:
    void listeningThread(int deviceIndex = -1, const std::string& modelPath = "");
    static int audioCallback(const void *input, void *output,
                             unsigned long frameCount,
                             const PaStreamCallbackTimeInfo *timeInfo,
                             PaStreamCallbackFlags statusFlags,
                             void *userData);

    std::unique_ptr<std::thread> listeningThread_;
    std::atomic<bool> listening_{false};

};

#endif 

CPP file:

#include "Words.h"
#include <vosk_api.h>
#include <iostream>
#include <chrono>
#include <json.hpp>
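#include <cstdint>   // int16_t
#include <deque>
#include <iterator>  // std::istream_iterator
#include <sstream>   // std::istringstream
#include <utility>
#include <vector>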

#include <exception>

using json = nlohmann::json;

Words::Words() {
    PaError err = Pa_Initialize();
    if (err != paNoError) {
        std::cerr << "Error initializing PortAudio: " << Pa_GetErrorText(err) << std::endl;
        std::terminate();
    }
}

Words::~Words() {
    stopListening();
    Pa_Terminate();
}

void Words::startListening(int deviceIndex, const std::string& modelPath) {
    listening_ = true;
    listeningThread_ = std::make_unique<std::thread>(&Words::listeningThread, this, deviceIndex, modelPath);
}

void Words::stopListening() {
    if (listeningThread_) {
        listening_ = false;
        listeningThread_->join();
        listeningThread_.reset();
    }
}

int Words::audioCallback(const void *input, void *output,
                                  unsigned long frameCount,
                                  const PaStreamCallbackTimeInfo *timeInfo,
                                  PaStreamCallbackFlags statusFlags,
                                  void *userData) {
    try{
        VoskRecognizer *recognizer = reinterpret_cast<VoskRecognizer *>(userData);
        if (vosk_recognizer_accept_waveform(recognizer, reinterpret_cast<const char *>(input), frameCount * sizeof(int16_t))) {
            const char *result = vosk_recognizer_result(recognizer);
            // std::cout << "Result: " << result << std::endl;
        } else {
            const char *partialResult = vosk_recognizer_partial_result(recognizer);
            // std::cout << "Partial result: " << partialResult << std::endl;
        }
    } catch (const std::exception& e) {
        std::cerr << "Error AudioCallBack: " << e.what() << std::endl;
    }
    return paContinue;
}

void Words::listeningThread(int deviceIndex, const std::string& modelPath) {
    // Initialize Vosk model and recognizer
    VoskModel *model = vosk_model_new(modelPath.c_str());
    VoskRecognizer *recognizer = vosk_recognizer_new(model, 16000.0);

    std::cout << "listeningThread" << std::endl;
    // Initialize PortAudio stream
    PaStream *stream;
    PaStreamParameters inputParameters;
    inputParameters.device = deviceIndex == -1 ? Pa_GetDefaultInputDevice() : deviceIndex;
    inputParameters.channelCount = 1; // mono input
    inputParameters.sampleFormat = paInt16;
    inputParameters.suggestedLatency = Pa_GetDeviceInfo(inputParameters.device)->defaultLowInputLatency;
    inputParameters.hostApiSpecificStreamInfo = nullptr;

    PaError err = Pa_OpenStream(&stream,
                                &inputParameters,
                                nullptr, // no output
                                16000,   // sample rate
                                320,     // frames per buffer
                                paClipOff,
                                audioCallback,
                                recognizer);
    if (err != paNoError) {
        std::cerr << "Error opening PortAudio stream: " << Pa_GetErrorText(err) << std::endl;
        return;
    }

    err = Pa_StartStream(stream);
    if (err != paNoError) {
        std::cerr << "Error starting PortAudio stream: " << Pa_GetErrorText(err) << std::endl;
        return;
    }

    while (listening_) {

        try {
            std::this_thread::sleep_for(std::chrono::milliseconds(100));

            const char *result = vosk_recognizer_partial_result(recognizer);

            // std::cout << "listening:\t" << result << std::endl;
            json j = json::parse(result);
            if (j.contains("partial")) {
                std::string partial_text = j["partial"];
                std::istringstream iss(partial_text);
                std::vector<std::string> words(std::istream_iterator<std::string>{iss}, std::istream_iterator<std::string>());
            }

        } catch (const std::exception& e) {
            std::cerr << "Error: " << e.what() << std::endl;
        }

    }

    err = Pa_StopStream(stream);
    if (err != paNoError) {
        std::cerr << "Error stopping PortAudio stream: " << Pa_GetErrorText(err) << std::endl;
    }

    err = Pa_CloseStream(stream);
    if (err != paNoError) {
        std::cerr << "Error closing PortAudio stream: " << Pa_GetErrorText(err) << std::endl;
    }

    vosk_recognizer_free(recognizer);
    vosk_model_free(model);
}

I also get a warning:

"WARNING (VoskAPI:BestPathEnd():lattice-incremental-online-decoder.cc:99) No final token found."

nshmyrev commented 1 year ago

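You call partial_result on the same recognizer from different threads (the audio callback and the listening thread), which is not a good idea. You'd better pass the partial result from the audio callback to the other thread through a variable or a message queue.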
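A minimal sketch of the "through a variable" option, for illustration only. `LatestResult` is a hypothetical helper, not part of Vosk or PortAudio, and it assumes the audio callback is the only thread that ever touches the recognizer. The callback copies the result string into the helper, and any other thread reads it from there under a mutex:

```cpp
#include <mutex>
#include <string>
#include <utility>

// Hypothetical helper (not a Vosk API): holds the most recent result string
// so that only one thread ever needs to call into the recognizer.
class LatestResult {
public:
    // Called from the thread that owns the recognizer (here, the audio callback).
    // Takes a copy, so the pointer returned by Vosk is not shared across threads.
    void publish(std::string text) {
        std::lock_guard<std::mutex> lock(mutex_);
        text_ = std::move(text);
        fresh_ = true;
    }

    // Called from any other thread. Returns true and fills `out` only when a
    // value has been published since the previous successful take().
    bool take(std::string& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!fresh_) {
            return false;
        }
        out = text_;
        fresh_ = false;
        return true;
    }

private:
    std::mutex mutex_;
    std::string text_;
    bool fresh_ = false;
};
```

With something like this, `audioCallback` would call `publish(vosk_recognizer_partial_result(recognizer))`, and the loop in `listeningThread` would call `take()` instead of calling `vosk_recognizer_partial_result` itself. The `userData` pointer would then need to carry both the recognizer and the helper, for example in a small struct. Intermediate values are overwritten, which suits partial results; if every result matters, a queue is the better fit.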

msj121 commented 1 year ago

@nshmyrev Thank you so much! I knew I was missing something. It took a while to rewrite the code so that it does what I need with threads, since my C++ is not that great, but that direction was all I needed. Thanks!