Lots of noise in downsampling from 48KHz to 16KHz

overhacked commented 4 years ago

I'm downsampling a 48KHz sample buffer, generated from a hardware input using the cpal crate, to 16KHz for speech-to-text via deepspeech-rs, and I'm getting a lot of noise in the output audio (written out to .WAV via hound). Noisy recording attached: recorded.wav.gz.

I've tested to confirm that the noise is introduced by Converter. If I save the source audio at 48KHz/f32, or non-resampled at 48KHz/i16, there are no noise artifacts, but 16KHz/f32 or 16KHz/i16 sounds as if there are a lot of rounding errors (?). I don't have a lot of experience with DSP, so please be patient with my imprecise explanations.

I have resampled the source audio using Audacity and sox from 48KHz to 16KHz, and neither introduces noise, just the expected decrease in fidelity with the reduced sample rate.

Here is a short section of the overall code. It's the callback function given to cpal::device.build_input_stream() that does all the work:

// ... callback passed to cpal like this (much setup code omitted)
let input_sample_rate = config.sample_rate();
device.build_input_stream(
    &config.into(),
    move |data, _| {
        write_resampled_input_data(data, input_sample_rate, &audio_buf_for_input_stream, &writer_2);
    }
);

const SAMPLE_RATE: u32 = 16_000;
type BufHandle = Arc<Mutex<Option<Vec<i16>>>>;
type WavWriterHandle = Arc<Mutex<Option<hound::WavWriter<BufWriter<File>>>>>;

fn write_resampled_input_data(input: &[f32], rate: cpal::SampleRate, audio_buf: &BufHandle, writer: &WavWriterHandle)
{
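    // Convert the incoming f32 samples to f64 for the interpolator.
    let samples = input.iter().copied().map(|i| f64::from_sample(i));
    // Note: the interpolator and Converter are rebuilt on every callback,
    // so no interpolation state carries over from one chunk to the next.
    let interpolator = Linear::new([0.0f64], [1.0]);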
    let conv = Converter::from_hz_to_hz(
        from_interleaved_samples_iter::<_, [f64; 1]>(samples),
        interpolator,
        rate.0.into(),
        SAMPLE_RATE.into()
    );
    if let Ok(mut guard) = audio_buf.try_lock() {
        if let Some(audio_buf) = guard.as_mut() {
            for sample in conv.until_exhausted().map(|f| f[0]) {
                let sample = i16::from_sample(sample);
                audio_buf.push(sample);
                // WRITE after resampling
                if let Ok(mut guard) = writer.try_lock() {
                    if let Some(writer) = guard.as_mut() {
                        writer.write_sample(sample).ok();
                    }
                }
            }
        }
    }
}

overhacked commented 4 years ago

I discovered that I was abusing Converter by trying to resample each chunk streamed from cpal separately, when I needed to resample the entire recording (or use a streaming resampler that keeps state).
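To make the problem concrete, here is a simplified model of what restarting on every chunk does to the time axis. It is not dasp's actual internals: it only tracks which input indices get picked when downsampling by an exact factor of 3 (as in 48KHz to 16KHz), and the function name is made up for illustration.

```rust
/// Hypothetical helper (not part of dasp or cpal): returns the input indices
/// that get picked when the resampling position restarts at the beginning of
/// every chunk. `step` is the integer downsampling factor (3 for 48 kHz -> 16 kHz).
fn picked_indices_restarting(total: usize, chunk_len: usize, step: usize) -> Vec<usize> {
    assert!(chunk_len > 0 && step > 0, "chunk_len and step must be non-zero");
    let mut picked = Vec::new();
    let mut start = 0;
    while start < total {
        let end = (start + chunk_len).min(total);
        // The position is reset to the chunk start on every call, so the
        // spacing to the previous chunk's last pick is whatever is left over.
        picked.extend((start..end).step_by(step));
        start = end;
    }
    picked
}
```

With 21 input samples in chunks of 7, this picks 0, 3, 6, 7, 10, 13, 14, 17, 20, so the spacing drops from 3 to 1 at every chunk boundary. A resampler that ran over the whole recording would pick 0, 3, 6, 9, 12, 15, 18. Unless the chunk length happens to be a multiple of the step, each boundary introduces a small timing glitch, which would plausibly be heard as noise.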

tripulse commented 4 years ago

State of what?

overhacked commented 4 years ago

The state of the interpolation. The output of the algorithm is affected by the previous samples, so if I feed it just a short array of samples on each invocation, I’m restarting the resampling algorithm every time. Streaming resample algorithms are basically a reduce function that carries a running value (not really a mean, but the output of the chosen resample algorithm, e.g. linear or bicubic) from one call to the next and uses it to calculate the resampled output of the next set of samples.
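As a rough sketch of what "keeping state" could look like for the linear case, the following plain-Rust resampler remembers two things between calls: the last input sample it saw and the fractional position of the next output sample. The type and function names are hypothetical and do not come from dasp. It also does no anti-aliasing low-pass filtering, which a real downsampler would normally want.

```rust
/// Hypothetical streaming linear resampler (mono); not part of dasp.
struct StreamingLinearResampler {
    /// Input samples advanced per output sample (3.0 for 48 kHz -> 16 kHz).
    step: f64,
    /// Position of the next output sample, measured from the start of the
    /// chunk being processed. It can be as low as -1.0, which means "at the
    /// previous chunk's last sample".
    pos: f64,
    /// Last input sample of the previous chunk (logical index -1).
    prev: f64,
}

impl StreamingLinearResampler {
    fn new(from_hz: f64, to_hz: f64) -> Self {
        Self { step: from_hz / to_hz, pos: 0.0, prev: 0.0 }
    }

    /// Resamples one chunk and appends the result to `out`.
    /// `pos` and `prev` survive the call, so consecutive chunks join up.
    fn process(&mut self, chunk: &[f32], out: &mut Vec<f32>) {
        let len = chunk.len() as isize;
        let prev = self.prev;
        // Index -1 refers to the sample carried over from the previous chunk.
        let sample = |i: isize| if i < 0 { prev } else { f64::from(chunk[i as usize]) };
        loop {
            let left = self.pos.floor() as isize;
            if left + 1 >= len {
                break; // the right-hand neighbour has not arrived yet
            }
            let frac = self.pos - left as f64;
            out.push((sample(left) * (1.0 - frac) + sample(left + 1) * frac) as f32);
            self.pos += self.step;
        }
        if let Some(&last) = chunk.last() {
            self.prev = f64::from(last);
        }
        // Re-express the position relative to the start of the next chunk.
        self.pos -= len as f64;
    }
}

/// Feeds `input` through a single resampler in pieces of `chunk_len`
/// (which must be non-zero), the way an audio callback would deliver it.
fn resample_in_chunks(input: &[f32], chunk_len: usize, from_hz: f64, to_hz: f64) -> Vec<f32> {
    let mut resampler = StreamingLinearResampler::new(from_hz, to_hz);
    let mut out = Vec::new();
    for chunk in input.chunks(chunk_len) {
        resampler.process(chunk, &mut out);
    }
    out
}
```

Because the position and the previous sample are carried across calls, feeding a ramp of 21 samples in chunks of 7 gives the same output as feeding it in one piece. In a cpal callback, the equivalent would presumably be to create one resampler outside the closure, move it in, and call `process` on each incoming buffer, rather than constructing a new one per callback.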

tripulse commented 4 years ago

So basically, they work like a sliding window?

overhacked commented 4 years ago

A streaming resampler does, I think, but I’m no authority.

RustAudio / dasp

Lots of noise in downsampling from 48KHz to 16KHz #135