rnnoise before deepspeech

RustAudio / rnnoise-c

Rust bindings to Xiph's rnnoise denoising library

rnnoise before deepspeech #3

Closed jeremyandrews closed 4 years ago

jeremyandrews commented 4 years ago

I'm trying to run rnnoise prior to converting audio to text with deepspeech. So far the only way I've made it happen successfully is to write out the denoised wav file, then reload that with deepspeech -- see the included example.

Can you recommend a cleaner way to do this? It seems I should be able to load the raw denoised audio into deepspeech, but I've not been able to make this work.

est31 commented 4 years ago

> Can you recommend a cleaner way to do this

A cleaner way would be to write to a Vec or a std::io::Cursor instead of a temporary file. Even better, of course, would be to forgo hound encoding completely.
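To make the in-memory idea concrete, here is a minimal sketch that uses only the standard library. It assumes 16-bit samples and plain little-endian bytes with no WAV header; the helper names `samples_to_cursor` and `cursor_to_samples` are made up for illustration. The Cursor simply takes the place of the temporary file.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom, Write};

/// Writes 16-bit samples as little-endian bytes into an in-memory buffer
/// (standing in for the temporary file), then rewinds it for reading.
/// Hypothetical helper, not part of rnnoise-c, hound, or deepspeech.
fn samples_to_cursor(samples: &[i16]) -> io::Result<Cursor<Vec<u8>>> {
    let mut cursor = Cursor::new(Vec::with_capacity(samples.len() * 2));
    for s in samples {
        cursor.write_all(&s.to_le_bytes())?;
    }
    // Rewind so that a reader starts at the first byte.
    cursor.seek(SeekFrom::Start(0))?;
    Ok(cursor)
}

/// Reads little-endian bytes back into 16-bit samples.
/// A trailing odd byte, if any, is ignored.
fn cursor_to_samples<R: Read>(mut reader: R) -> io::Result<Vec<i16>> {
    let mut bytes = Vec::new();
    reader.read_to_end(&mut bytes)?;
    Ok(bytes
        .chunks_exact(2)
        .map(|b| i16::from_le_bytes([b[0], b[1]]))
        .collect())
}
```

As far as I know, hound's writer accepts anything that implements Write + Seek and its reader anything that implements Read, so a Cursor over a Vec of bytes should slot in wherever the file handle was used.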

jeremyandrews commented 4 years ago

Yeah, at a high level my goal was to somehow load the raw denoised audio into deepspeech without using hound, but I was unable to get that working. I keep running into apparent assumptions that the audio will be loaded from a file, which is why ultimately I wrote out to a temporary file to get it working.

I did not experiment with writing to a vector or cursor; I'll look into that.

est31 commented 4 years ago

So the input format for deepspeech is i16 samples at a sampling rate that depends on the model. Our output format is 32-bit floats at a sampling rate of 48 kHz. IIRC the deepspeech library does some resampling internally.
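The format conversion itself, skipping hound entirely, could look roughly like the sketch below. The function name `f32_to_i16` and its `scale` parameter are hypothetical. Whether the floats are already in the 16-bit PCM range (scale 1.0) or normalized to [-1.0, 1.0] (scale 32767.0) is an assumption to check against the data. Resampling is not covered here.

```rust
/// Converts f32 samples to i16 by scaling, rounding, and clamping.
/// `scale` is an assumption about the input range: use 1.0 if the floats
/// are already in the 16-bit PCM range, or 32767.0 if they are normalized
/// to [-1.0, 1.0]. Hypothetical helper for illustration only.
fn f32_to_i16(samples: &[f32], scale: f32) -> Vec<i16> {
    samples
        .iter()
        .map(|&s| {
            let scaled = (s * scale).round();
            // Clamp so that out-of-range values saturate instead of wrapping.
            scaled.clamp(i16::MIN as f32, i16::MAX as f32) as i16
        })
        .collect()
}
```

The resulting Vec of i16 can then be passed on as a slice without touching the file system.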

I'm closing this PR because I think it's better to file it against the deepspeech repo. That repo already contains instructions on how to obtain deepspeech models and binaries. The rnnoise model and binary, on the other hand, are built into the Rust crate.