with_streams when converting is slow

m4b commented 2 years ago

First off, awesome library :)

Is your feature request related to a problem? Please describe. So basically if I want to convert an i16 stereo to an f32 mono I might do:

let audio = fon::Audio::<fon::stereo::Stereo16>::with_i16_buffer(
    wave.header.sample_rate,
    samples.into_boxed_slice(),
);
let mut mono32 =
    fon::Audio::<fon::mono::Mono32>::with_stream(wave.header.sample_rate, &audio);
let slice = mono32.as_f32_slice();

assuming my samples is a Vec<i16>, the first part is fast (no copying afaics); the second part is not fast at all, and even in release, for e.g., an 8MB wav file, it takes ~200ms.

Describe the solution you'd like I dived into the implementations, in particular https://docs.rs/fon/latest/src/fon/audio.rs.html#115 but it isn't clear at all to me where what is being copied (there are 3 into calls that is hard to statically see what is going on). So I'm wondering if I were to pass my own storage if this would speed things up (i'm wondering if the allocation is slowing things down).

Describe alternatives you've considered Maybe I could preallocate an Audio f32 stream with silence, and then convert from i16 to f32 using this stream's storage? Not sure how to do this though.

Additional context I'm converting lots of i16 -> f32 like this, so that 200ms really adds up.

AldaronLau commented 2 years ago

@m4b Thanks for opening this issue!

First off, awesome library :)

Glad it's appreciated!

the first part is fast

Yeah it should be almost free to turn a Vec<T> into a Box<[T]>

i'm wondering if the allocation is slowing things down

I don't think it's allocation, because the allocation happens first in the implementation.

I'm converting lots of i16 -> f32 like this, so that 200ms really adds up.

This is a really common thing when using this library, so I'll look into making it faster and making some benchmarks.

AldaronLau commented 2 years ago

@m4b Comparison between version on crates.io and github version:

fon5 129005µs fon6 6995µs

It's about 18.4 times faster

See https://github.com/ardaku/fon/blob/stable/benchmarking/src/main.rs for the benchmarking code. Is that fast enough for your use case?

AldaronLau commented 2 years ago

@m4b Made a couple more changes, now it's more like 20 times faster than it was. I also just released fon 0.6.0 with the faster implementation so I'm going to close this issue, feel free to re-open if you think necessary.

m4b commented 2 years ago

I’ll take an 18x speed up any day of the week! 😁on phone now I can test the git version a little bit later but I suspect now it’ll just be noise for me, until I start loading loading hundreds of wave files.

and thanks for looking into this so promptly, again, very cool library :)

AldaronLau commented 2 years ago

@m4b No problem! BTW the git version now matches the current crates.io version so no need to check it out on GitHub, you can get it from crates.io.

m4b commented 2 years ago

Yea trying it out now; I forgot I was using twang for some programmatic synthesis, which uses fon 0.5, and I did stuff like this:

        let mut audio = Audio::<Ch32, 1>::with_silence(self.sample_rate.0, nsamples);
        audio.sink(..).stream(&mut synth);

which I think won't compile because there will be two versions of the crate now?

m4b commented 2 years ago

But yes, if i comment out this block just to test it out, i'm seeing also approximately a ~20x speedup in release (it's still quite slow in debug, like 800-900ms per file, but i can set this crates profile to release always as mitigation):

[2022-01-30T02:40:44Z INFO  audio::wav] Loaded bytes in 20.761145ms
[2022-01-30T02:40:45Z INFO  audio::wav] Converted to f32 slice in 17.355549ms with len=4618474
[2022-01-30T02:40:45Z INFO  audio::audio] Loaded file in 950.866767ms
[2022-01-30T02:40:45Z INFO  audio::wav] Loaded bytes in 6.902234ms
[2022-01-30T02:40:46Z INFO  audio::wav] Converted to f32 slice in 17.336849ms with len=4618474
[2022-01-30T02:40:46Z INFO  audio::audio] Loaded file in 932.00074ms
[2022-01-30T02:40:46Z INFO  audio::wav] Loaded bytes in 2.845697ms
[2022-01-30T02:40:47Z INFO  audio::wav] Converted to f32 slice in 11.691472ms with len=4618474
[2022-01-30T02:40:47Z INFO  audio::audio] Loaded file in 914.29774ms

it's a bit peculiar the remaining time there though, it looks like it's taking about 900-1second after the load function ends to drop something? I'd be surprised if temporary allocations had that long of a dealloc, but not sure what's going on there

AldaronLau commented 2 years ago

@m4b If you check out the working branch on twang it uses the new fon. I'm hoping to release a new version before I go to bed, but you can test with that in the mean-time.

I'm not sure how to help with the time issue without an example source code that reproduces the 900-1 second drop time.

m4b commented 2 years ago

So i'm playing around with:

wavv
oddio
fon
anyhow
log

with a simple loader, e.g.:

pub struct Wavs {
    wavs: HashMap<String, Arc<oddio::Frames<[f32; 2]>>>,
}

impl Wavs {
    /// Load the given .wav file at `path`
    pub fn load(&mut self, path: &Path) -> Result<(), anyhow::Error> {
        use std::time::Instant;
        let mut t = Instant::now();
        let bytes =
            fs::read(path).map_err(|e| anyhow!("Could not read file {}: {}", path.display(), e))?;
        info!("Loaded bytes in {:?}", t.elapsed());
        match Wave::from_bytes(&bytes) {
            Ok(wave) => {
                let samples = match wave.data {
                    Samples::BitDepth16(samples) => samples,
                    _ => return Err(anyhow!("Unsupported bit depth")),
                };
                let name = path.file_stem().unwrap().to_string_lossy().to_string();
                let audio = fon::Audio::<fon::stereo::Stereo16>::with_i16_buffer(
                    wave.header.sample_rate,
                    samples.into_boxed_slice(),
                );
                t = Instant::now();
                let mut stereof32 = fon::Audio::<fon::stereo::Stereo32>::with_stream(
                    wave.header.sample_rate,
                    &audio,
                );
                let slice = stereof32.as_f32_slice();
                info!(
                    "Converted to f32 slice in {:?} with len={}",
                    t.elapsed(),
                    slice.len()
                );
                let slice32: &[[f32; 2]] = unsafe {
                    std::slice::from_raw_parts(slice.as_ptr() as *const [f32; 2], slice.len() / 2)
                };
                let frames = oddio::Frames::from_slice(wave.header.sample_rate, slice32);
                self.wavs.insert(name, frames);
                Ok(())
            }
            Err(e) => Err(anyhow!("Could not load file {}: {:?}", path.display(), e)),
        }
    }
}

you can skip the oddio part I believe, it shouldn't be causing any issues, and you can just insert your fon audio stream into the hash map. Anyway, when i call std::time::Instant() before this function with some path to a 16-bit pcm microsoft wav file (audacity export will generate the correct ones that wavv likes), i see about a 1second delay after the function returns (I assume some drop logic is going off?)

The unsafe part I assume is safe, but i should assert len % 2 == 0, but otherwise i believe reinterpreting the stereo flattened output as [f32; 2] that oddio requires is safe.

AldaronLau commented 2 years ago

@m4b interesting, I don't immediately see what's wrong so I'll have to play around with it

m4b commented 2 years ago

so if you do something like:

let mut wavs = Wavs {
  wavs: Default::default(),
};
let t = std::time::Instant::now();
wavs.load("foo.wav")?;
info!("Loaded file in {:?}", t.elapsed());

maybe it'll repro? (and you should have similar logs to what i pasted above). the only difference is i call it in a wrapper struct that just forwards to the wavs.load_file(&str), e.g., looks exactly like this:

    /// Load the given sound at `path`
    pub fn load_sound(&mut self, path: &std::path::Path) -> Result<(), anyhow::Error> {
        let t = std::time::Instant::now();
        let res = self.sounds.load(path);
        info!("Loaded file in {:?}", t.elapsed());
        res
    }

m4b commented 2 years ago

Ok i've tested with twang 0.8, but i had to comment out a lot of stuff because the api changed so much, but for this issue, the load time seems significantly faster (minus the sometimes large load times I see after function return, but I'm not sure what's going on there, as I said, I think some drop logic is going off?); regardless i see a significant improvement:

[2022-02-05T01:14:29Z INFO  audio::wav] Loaded bytes in 7.173026ms
[2022-02-05T01:14:29Z INFO  audio::wav] Converted to f32 slice in 13.082279ms with len=4618474
[2022-02-05T01:14:29Z INFO  audio::audio] Loaded file in 69.610576ms
[2022-02-05T01:14:29Z INFO  audio::wav] Loaded bytes in 4.137279ms
[2022-02-05T01:14:29Z INFO  audio::wav] Converted to f32 slice in 13.030511ms with len=4618474
[2022-02-05T01:14:29Z INFO  audio::audio] Loaded file in 48.184147ms
[2022-02-05T01:14:29Z INFO  audio::wav] Loaded bytes in 2.465261ms
[2022-02-05T01:14:29Z INFO  audio::wav] Converted to f32 slice in 9.31302ms with len=4618474
[2022-02-05T01:14:29Z INFO  audio::audio] Loaded file in 35.102136ms

AldaronLau commented 2 years ago

@m4b Sorry about all the API changes! Interesting thing here is neither fon nor twang have any impl for Drop, so I'm guessing the extra time is either slow de-allocation or one of the other crates you are using has a slow drop. I haven't had a chance to see if I can reproduce it yet, as I was trying to get the twang update ready to publish.

m4b commented 2 years ago

it's ok, it's worth the faster speed :) the api seems nicer now with const generics anyway; mostly i was returning Signal in a kind of prototype function messing around, and not sure exactly what new analog is, but i can revisit it later.

AldaronLau commented 2 years ago

@m4b Glad it's nicer! In the new fon crate, the floating point channels can now go outside of the -1 to 1 range, which lets twang use a Ch32 directly rather than having it's own Signal type, so you can replace twang::Signal with fon::chan::Ch32.

ardaku / fon

with_streams when converting is slow #8