GiviMAD / rustpotter

An open source wakeword spotter forged in rust
Apache License 2.0

Convert `&[i16]` to `&[u8]` #7

Status: Closed (zxcqirara closed this 9 months ago)

zxcqirara commented 9 months ago

I have audio data in &[i16], but process_bytes only accepts &[u8] (this appeared after upgrading from 2.0.1 to 3.0.1).

zxcqirara commented 9 months ago

Now I have replaced it with rustpotter.process_samples(data.to_vec()), but now it doesn't recognize the wake word, although rustpotter-cli does.

GiviMAD commented 9 months ago

Sorry for the late reply, yes the process_samples method is what you need to use when you have the audio data already decoded as numbers.

If you are using int 16 samples you also need to set the config to:

  config.fmt.sample_rate = ...;
  config.fmt.sample_format = SampleFormat::I16;
  config.fmt.channels = ...;

And the size of the chunks provided to the process_samples method should match the return value of rustpotter.get_samples_per_frame(); if not, they are ignored.

I think that is all you need to take into account.
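
For anyone who would rather keep the byte-oriented call from the issue title, the conversion itself needs nothing beyond the standard library. This is only a minimal sketch: it assumes the configured format expects little-endian 16-bit samples (the thread does not say), and `i16_to_le_bytes` is a hypothetical helper name, not part of rustpotter.

```rust
/// Hypothetical helper (not part of rustpotter): encodes 16-bit samples
/// as little-endian bytes, two bytes per sample, preserving sample order.
pub fn i16_to_le_bytes(samples: &[i16]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(samples.len() * 2);
    for sample in samples {
        // `to_le_bytes` yields [low byte, high byte] on every platform.
        bytes.extend_from_slice(&sample.to_le_bytes());
    }
    bytes
}
```

The returned Vec<u8> can be borrowed as &[u8]. Copying is used here instead of reinterpreting the slice in place, which avoids unsafe code and keeps the byte order independent of the host CPU.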

zxcqirara commented 9 months ago

Yeah, I'll do it. Now I have the problem that rustpotter doesn't recognize my words (but rustpotter-cli does). I tried the default settings and my own, but it made no difference.

zxcqirara commented 9 months ago

> Sorry for the late reply, yes the process_samples method is what you need to use when you have the audio data already decoded as numbers.
>
> If you are using int 16 samples you also need to set the config to:
>
>   config.fmt.sample_rate = ...;
>   config.fmt.sample_format = SampleFormat::I16;
>   config.fmt.channels = ...;
>
> And the size of the chunks provided to the process_samples method should match the return value of rustpotter.get_samples_per_frame(); if not, they are ignored.
>
> I think that is all you need to take into account.

I have done as you wrote, but it still doesn't recognize anything

zxcqirara commented 9 months ago

Here is my config: [image]

Here is my buffer: [image]

GiviMAD commented 9 months ago

I have no clue what could be wrong. You can verify whether you are feeding it correctly by using the "record" feature and setting a low threshold, like 0.01, or by trying to create a wav file using hound, as rustpotter does.

I also suggest disabling the audio filter until you get it to work.

> listener::rustpotter::get_samples();

That is not a library method.
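
To illustrate the wav-file check without pulling in a dependency, here is a rough std-only sketch that dumps the samples you are feeding into a file you can listen to. It is a stand-in for hound, not what rustpotter does internally; it assumes mono 16-bit PCM, and `write_wav_i16` is a hypothetical name.

```rust
use std::io::{self, Write};

/// Hypothetical helper: writes mono 16-bit PCM samples as a minimal WAV
/// stream (44-byte canonical header followed by little-endian sample data).
pub fn write_wav_i16<W: Write>(mut out: W, sample_rate: u32, samples: &[i16]) -> io::Result<()> {
    let channels: u16 = 1; // assumption: single-channel audio
    let bits_per_sample: u16 = 16;
    let block_align: u16 = channels * bits_per_sample / 8;
    let byte_rate: u32 = sample_rate * u32::from(block_align);
    let data_len: u32 = (samples.len() * 2) as u32;

    out.write_all(b"RIFF")?;
    out.write_all(&(36 + data_len).to_le_bytes())?; // bytes remaining after this field
    out.write_all(b"WAVE")?;
    out.write_all(b"fmt ")?;
    out.write_all(&16u32.to_le_bytes())?; // size of the fmt chunk body
    out.write_all(&1u16.to_le_bytes())?; // format tag 1 = integer PCM
    out.write_all(&channels.to_le_bytes())?;
    out.write_all(&sample_rate.to_le_bytes())?;
    out.write_all(&byte_rate.to_le_bytes())?;
    out.write_all(&block_align.to_le_bytes())?;
    out.write_all(&bits_per_sample.to_le_bytes())?;
    out.write_all(b"data")?;
    out.write_all(&data_len.to_le_bytes())?;
    for sample in samples {
        out.write_all(&sample.to_le_bytes())?;
    }
    Ok(())
}
```

Passing std::fs::File::create("feed-check.wav")? as the writer, with the sample rate from your config and a few seconds of the samples you feed to the detector, gives a file to play back. If it sounds distorted or sped up, the problem is in the audio data or the declared format rather than in the detection settings.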

zxcqirara commented 9 months ago

> I have no clue what could be wrong. You can verify whether you are feeding it correctly by using the "record" feature and setting a low threshold, like 0.01, or by trying to create a wav file using hound, as rustpotter does.
>
> I also suggest disabling the audio filter until you get it to work.
>
> listener::rustpotter::get_samples();
>
> That is not a library method.

It is my own function: [image]

zxcqirara commented 9 months ago

And I can't see any files, or even the folder that I specified in record_path.

GiviMAD commented 9 months ago

The record folder needs to exist and be writable.

But if you don't get any detections after lowering the threshold, it will not record anything; it only records on partial detections.

zxcqirara commented 9 months ago

I got many short WAVs with a strange flicking sound.

zxcqirara commented 9 months ago

My detections in rustpotter-cli: [image]. It works correctly (idk why I decided to blur it).

GiviMAD commented 9 months ago

> I got many short WAVs with a strange flicking sound.

Then you must be doing something wrong with the audio data, or the format is configured incorrectly. If you use the record option with the CLI, you will see that the recordings are audible.

zxcqirara commented 9 months ago

I took the source from another project and just upgraded rustpotter from 2.0.1 to 3.0.1; in the previous version everything was working correctly.

GiviMAD commented 9 months ago

> I took the source from another project and just upgraded rustpotter from 2.0.1 to 3.0.1; in the previous version everything was working correctly.

If you want to send me the code diffs, maybe there is something I haven't understood correctly. So far I haven't found any problems migrating the things I was using to v3, but maybe I'm missing something.

zxcqirara commented 9 months ago

> I took the source from another project and just upgraded rustpotter from 2.0.1 to 3.0.1; in the previous version everything was working correctly.
>
> If you want to send me the code diffs, maybe there is something I haven't understood correctly. So far I haven't found any problems migrating the things I was using to v3, but maybe I'm missing something.

Ok, Cargo.toml:

rustpotter = "2.0.0"

to

rustpotter = { git = "https://github.com/GiviMAD/rustpotter", features = ["record"] }

Rustpotter config:

pub const RUSTPOTTER_DEFAULT_CONFIG: Lazy<RustpotterConfig> = Lazy::new(|| {
    RustpotterConfig {
        fmt: WavFmt::default(),
        detector: DetectorConfig {
            avg_threshold: 0.,
            threshold: 0.5,
            min_scores: 15,
            score_mode: ScoreMode::Average,
            comparator_band_size: 5,
            comparator_ref: 0.22
        },
        filters: FiltersConfig {
            gain_normalizer: GainNormalizationConfig {
                enabled: true,
                gain_ref: None,
                min_gain: 0.7,
                max_gain: 1.0,
            },
            band_pass: BandPassConfig {
                enabled: true,
                low_cutoff: 80.,
                high_cutoff: 400.,
            }
        }
    }
});

to

pub const RUSTPOTTER_DEFAULT_CONFIG: Lazy<RustpotterConfig> = Lazy::new(|| {
    RustpotterConfig {
        fmt: AudioFmt::default(),
        detector: DetectorConfig {
            avg_threshold: 0.,
            threshold: 0.5,
            min_scores: 15,
            score_mode: ScoreMode::Average,
            eager: false,
            band_size: 5,
            score_ref: 0.22,
            vad_mode: None,
            record_path: Some(String::from("./recs")),
        },
        filters: FiltersConfig {
            gain_normalizer: GainNormalizationConfig {
                enabled: true,
                gain_ref: None,
                min_gain: 0.7,
                max_gain: 1.0,
            },
            band_pass: BandPassConfig {
                enabled: true,
                low_cutoff: 80.,
                high_cutoff: 400.,
            }
        }
    }
});

Rustpotter init:

pub fn init() -> Result<(), ()> {
    let rustpotter_config = config::RUSTPOTTER_DEFAULT_CONFIG;

    // create rustpotter instance
    match Rustpotter::new(&rustpotter_config) {
        Ok(mut rinstance) => {
            // success
            // wake word files list
            // @TODO. Make it configurable via GUI for custom user voice.
            let rustpotter_wake_word_files: [&str; 5] = [
                "rustpotter/jarvis-default.rpw",
                "rustpotter/jarvis-community-1.rpw",
                "rustpotter/jarvis-community-2.rpw",
                "rustpotter/jarvis-community-3.rpw",
                "rustpotter/jarvis-community-4.rpw",
                // "rustpotter/jarvis-community-5.rpw",
            ];

            // load wake word files
            for rpw in rustpotter_wake_word_files {
                rinstance.add_wakeword_from_file(rpw).unwrap();
            }

            // store
            RUSTPOTTER.set(Mutex::new(rinstance));
        },
        Err(msg) => {
            error!("Rustpotter failed to initialize.\nError details: {}", msg);

            return Err(());
        }
    }

    Ok(())
}

to

pub fn init() -> Result<(), ()> {
    let rustpotter_config = config::RUSTPOTTER_DEFAULT_CONFIG;

    // create rustpotter instance
    match Rustpotter::new(&rustpotter_config) {
        Ok(mut rinstance) => {
            // success
            // wake word files list
            rinstance.add_wakeword_from_file("first", "rustpotter/first.rpw").unwrap();
            rinstance.add_wakeword_from_file("second", "rustpotter/second.rpw").unwrap();
            rinstance.add_wakeword_from_file("third", "rustpotter/third.rpw").unwrap();

            // store
            RUSTPOTTER.set(Mutex::new(rinstance));
        },
        Err(msg) => {
            error!("Rustpotter failed to initialize.\nError details: {}", msg);

            return Err(());
        }
    }

    Ok(())
}

All the code was taken from this repo.

GiviMAD commented 9 months ago

Well, I see several problems there; the audio format was not correctly configured.

At the least it should be the following (this assumes you are using 16000 Hz, single-channel audio). Note that the struct update syntax has to come last and takes no trailing comma:

        fmt: AudioFmt {
            sample_format: rustpotter::SampleFormat::I16,
            ..Default::default()
        },

Also, the frame size should be correctly initialized (but I assume you already fixed that, as you got it to record), or you can use some buffering solution like the one implemented in rustpotter-cli (because I encountered problems setting the cpal buffer size to the required value on some platforms; maybe that's already fixed).

GiviMAD commented 9 months ago

I meant this:

...
    buffer.extend_from_slice(data);
    while buffer.len() >= rustpotter_samples_per_frame {
        let detection = rustpotter.process_samples(
            buffer
                .drain(0..rustpotter_samples_per_frame)
                .as_slice()
                .into(),
        );
        print_detection(
            &*rustpotter,
            detection,
            partial_detection_counter,
            debug,
            debug_gain,
            get_time_string,
        );
    }
 ...

It's probably not the most efficient solution. There I'm pushing the audio data onto the end of a vector and then draining it until the data left in it is less than the required chunk size. The buffer should be declared in a parent scope so it's reused between function calls, as you do with the rustpotter instance. I think it can fit there, so you don't need to change the general frame size.
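
The same push-then-drain idea can be packaged so that the buffer lives alongside the detector. This is only a sketch using the standard library: `FrameBuffer` and its methods are hypothetical names, not rustpotter or rustpotter-cli API, and the closure stands in for the call to process_samples.

```rust
/// Hypothetical helper (not rustpotter API): regroups input chunks of any
/// size into frames of exactly `frame_len` samples.
pub struct FrameBuffer {
    frame_len: usize,
    pending: Vec<i16>,
}

impl FrameBuffer {
    pub fn new(frame_len: usize) -> Self {
        assert!(frame_len > 0, "frame length must be positive");
        Self {
            frame_len,
            pending: Vec::with_capacity(frame_len * 2),
        }
    }

    /// Appends `data`, then calls `on_frame` once per complete frame.
    /// Samples that do not fill a frame stay buffered for the next call.
    pub fn push<F: FnMut(Vec<i16>)>(&mut self, data: &[i16], mut on_frame: F) {
        self.pending.extend_from_slice(data);
        while self.pending.len() >= self.frame_len {
            // Drain from the front so samples keep their original order.
            let frame: Vec<i16> = self.pending.drain(..self.frame_len).collect();
            on_frame(frame);
        }
    }

    /// Number of samples currently waiting for a full frame.
    pub fn pending_len(&self) -> usize {
        self.pending.len()
    }
}
```

In the audio callback this would presumably be created once with the value returned by rustpotter.get_samples_per_frame() and then called as buffer.push(data, |frame| { rustpotter.process_samples(frame); }), handling the returned detection inside the closure.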

zxcqirara commented 9 months ago

Omg, I have recreated the project and just edited what I wrote above and it worked... I think I broke sth while editing code last time

GiviMAD commented 9 months ago

> Omg, I have recreated the project and just edited what I wrote above and it worked... I think I broke sth while editing code last time

Great to know you managed to make it work.

I encourage you to try creating a trained wakeword model to replace the v2 files you are currently using; it should work better than what are now called "wakeword references". I have one created with around 2000 samples (200 recordings of the wakeword + 1800 noise and silence recordings), and in my experience it works far better, above all in the presence of small noises.

Edit: one tip, the record functionality is a great help for augmenting the dataset, as the produced recordings match the duration of the longest wakeword.

Best regards!