alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

Memory Leak - Rust Bindings to api #1407

Closed. chriskyndrid closed this issue 1 year ago

chriskyndrid commented 1 year ago

@nshmyrev, per your request, I'm opening another issue regarding the (presumed) memory leak referenced here. The Rust bindings crate in use can be found here. Here is a sample Rust program that should reliably reproduce the issue:

1) Your main.rs:


use std::path::Path;
use dasp::ring_buffer::Fixed;
use dasp::{Sample, Signal};
use dasp::signal::from_iter;
use dasp_interpolate::sinc::Sinc;
use hound::WavReader;
use once_cell::sync::Lazy;
use rayon::prelude::*;
use vosk::{Model, Recognizer};

pub static VOSK_MODEL: Lazy<Model> = Lazy::new(|| {
    let model_path = "./include/vosk/model/gs";
    Model::new(model_path).expect("Failed to load vosk model.")
});

#[derive(Debug, Clone)]
pub struct RecognizedWord {
    pub start_time: f32,
    pub end_time: f32,
    pub word: String,
    pub confidence: f32,
}

fn main() {
    let sample = Path::new("./include/sample/sample.wav");
    let audio = read_wav_file(sample);

    // Repeat the same work forever so that any per-pass leak shows up as steadily growing memory.
    loop {
        let _recognized_words = audio
            .par_chunks(32_000)
            .map(|chunk| {
                let mut recognizer = Recognizer::new(&*VOSK_MODEL, 16000.0).unwrap();
                recognizer.set_max_alternatives(0);
                recognizer.set_words(true);

                for sample in chunk.chunks(16_000) {
                    recognizer.accept_waveform(sample);
                }

                let result = recognizer.final_result().single();
                if let Some(crm) = result {
                    let words = crm.result;
                    let recognized_words: Vec<RecognizedWord> = words.into_iter().map(move |word| {
                        RecognizedWord {
                            start_time: word.start,
                            end_time: word.end,
                            word: word.word.to_owned(),
                            confidence: word.conf,
                        }
                    }).collect();
                    println!("words={:?}", recognized_words);
                    recognized_words
                } else {
                    Vec::new()
                }
            })
            .collect::<Vec<_>>();
    }

}

fn read_wav_file(path: &Path) -> Vec<i16> {
    let reader = WavReader::open(path).expect("unable to read file");
    let sample = reader
        .into_samples::<i16>()
        .map(|x| x.expect("reading"))
        .collect::<Vec<_>>();

    resample_sample(sample)
}

fn resample_sample(audio: Vec<i16>) -> Vec<i16> {
    let mut resampled_audio = Vec::new();
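    // The input rate is hard-coded for a 44.1 kHz mono sample file; the recognizer is created at 16 kHz.
    let source_hz = 44100.0;
    let target_hz = 16000.0;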
    let input_signal = from_iter(audio).map(|x| x.to_sample::<f64>());
    let sinc = Sinc::new(Fixed::from([0.0; 64]));
    let resampled_signal = input_signal.from_hz_to_hz(sinc, source_hz, target_hz);

    for sample in resampled_signal.until_exhausted() {
        resampled_audio.push(sample.to_sample::<i16>());
    }
    resampled_audio
}

Note that the sample I used was 44100 Hz mono, hence the resampling to 16000 Hz.

2) In your Cargo.toml:

[package]
name = "vosk_memory_leak"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]

[dependencies.vosk]
version = "0.2"

[dependencies.rayon]
version = "1.7"

[dependencies.hound]
version = "3.5"

[dependencies.once_cell]
version = "1.17"

[dependencies.dasp]
version = "0.11"
features = ["all"]

[dependencies.dasp_interpolate]
version = "0.11"
features = ["all"]

3) You will need a directory containing the model, libvosk.so, and vosk_api.h. In my case, these are include/vosk/libs and include/vosk/model.

4) Build the program via:


RUSTFLAGS=-L./include/vosk/libs LD_LIBRARY_PATH=./include/vosk/libs \
cargo build --release

5) Run the program via:

RUSTFLAGS=-L./include/vosk/libs LD_LIBRARY_PATH=./include/vosk/libs heaptrack target/release/vosk_memory_leak

This sample program runs recognition in parallel using the Rayon crate, so the model is shared across many threads. Each parallel chunk creates its own Recognizer.
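To make the workload concrete, here is a rough sketch of the arithmetic behind the chunk sizes in the sample program. It assumes the audio has already been resampled to 16 kHz mono. The constant and function names are hypothetical helpers for illustration and are not part of the vosk crate.

```rust
/// Sample rate passed to `Recognizer::new` in the sample program.
const SAMPLE_RATE_HZ: usize = 16_000;
/// Chunk size passed to `par_chunks` in the sample program.
const PAR_CHUNK_SAMPLES: usize = 32_000;

/// Number of `Recognizer` instances created per pass of the outer `loop`:
/// one per parallel chunk, where the last chunk may be shorter.
/// Hypothetical helper, for illustration only.
pub fn recognizers_per_pass(total_samples: usize) -> usize {
    // Ceiling division; an empty input yields zero chunks.
    (total_samples + PAR_CHUNK_SAMPLES - 1) / PAR_CHUNK_SAMPLES
}

/// Seconds of audio handed to each recognizer for a full-size chunk.
/// Hypothetical helper, for illustration only.
pub fn seconds_per_chunk() -> f64 {
    PAR_CHUNK_SAMPLES as f64 / SAMPLE_RATE_HZ as f64
}
```

For example, a 60-second clip (960,000 samples at 16 kHz) would create 30 recognizers per pass, each fed about 2 seconds of audio. Since each recognizer is dropped at the end of its closure, the number alive at once should be bounded by roughly the number of Rayon worker threads.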

[Screenshot: vosk_memory_leak]

nshmyrev commented 1 year ago

And where do you see the leak here? I ran this code, and memory usage remains stable at about 5 GB with the en-us-0.22 model.

In the screenshot, for example, the second row (const arpa) appears in the leaked column. It is not leaked; it is simply model memory. That is how it should be.

The first row is for RNNLM.

chriskyndrid commented 1 year ago

I accidentally posted an unsorted screenshot of the Bottom-Up view (it was sorted on Peak). The accept_waveform call was more interesting to me, but it seems to stabilize at around 40 MB reported as a potential leak, regardless of how long I run the program. I'm using the vosk-model-en-us-0.42-gigaspeech model.

I do see a rise on my machine to about 10.3 GB over a period of 10 minutes with the sample program. After that, usage seems to stabilize, and I don't see significant changes with heaptrack or otherwise.
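One way to tell a plateau from a real leak without a profiler is to sample the process's resident set size once per pass and check whether it flattens out. The following is a minimal sketch, assuming Linux (it reads /proc/self/status). All function names are hypothetical, and the window and tolerance values are arbitrary choices rather than anything prescribed by vosk or heaptrack.

```rust
use std::fs;

/// Extracts the `VmRSS` value (in KiB) from the text of `/proc/self/status`.
/// Hypothetical helper, for illustration only.
pub fn parse_vm_rss_kib(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("VmRSS:"))
        .and_then(|rest| rest.split_whitespace().next())
        .and_then(|value| value.parse().ok())
}

/// Reads the resident set size of the current process (Linux only).
pub fn current_rss_kib() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    parse_vm_rss_kib(&status)
}

/// Returns true once the last `window` samples differ by at most
/// `tolerance_kib`, i.e. memory use looks flat rather than still growing.
pub fn has_plateaued(samples: &[u64], window: usize, tolerance_kib: u64) -> bool {
    if window == 0 || samples.len() < window {
        return false;
    }
    let recent = &samples[samples.len() - window..];
    let min = recent.iter().min().copied().unwrap_or(0);
    let max = recent.iter().max().copied().unwrap_or(0);
    max - min <= tolerance_kib
}
```

In the sample program, this could be used by pushing `current_rss_kib()` into a `Vec<u64>` at the end of each pass of the outer `loop` and logging whether `has_plateaued` holds. A value that keeps climbing over many passes would point to a leak, while one that levels off matches the behavior described here.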

So I think I was wrong, and my initial observations don't indicate the issue I believed they did. It's likely that another library in my main program, such as GStreamer, is the culprit. Thank you for your time and feedback, and I apologize for any time wasted on this issue.

I'll go ahead and close it.

nshmyrev commented 1 year ago

@chriskyndrid OK, thank you for your report anyway. Let us know how it goes.