Add tests to compare requested read time to elapsed time and/or samplebuf length

JMurph2015 commented 6 years ago

So this was inspired by a problem in an LED effects program I'm working on in Julia. Basically, the largest bug with that program is that its CPU usage spikes to ~4-6x normal during and after silent times from the audio card. Since I've already checked 80+% of my code for type stability, I decided to look into what was coming out of PortAudio during times of intermittent silence. Let's just say things got stranger.

As you can see below, PortAudio.jl is not sampling for nearly as long as it should be in a lot of circumstances. I'm not sure why this is happening, but I could see a scenario where this would cause the havoc I'm seeing further up the stack. This is off Ubuntu 17.04 running on the default PortAudio device (pretty sure that ends up being pulse).

Any ideas about why PortAudio might be doing this would be much appreciated (or other thoughts on escalating CPU usage!).

julia> @time @sync begin test = read(stream, 5.0s) end
  2.346361 seconds (8.79 k allocations: 3.764 MiB)
220500-frame, 2-channel SampleBuf{Float32, 2}
5.0s sampled at 44100.0Hz
▄▆▆▁▁▁▁▁▁▁▁▁▁▁▃▆▆▁▁▁▆▄▁▆▂▆▅▃▃▁▃▆▆▆▆▁▁▁▆▂▁▁▁▁▁▁▁▁▁▁▆▅▆▂▁▁▆▅▆▆▄▆▄▃▆▆▃▁▁▁▆▃▁▁▁▅▆▆▆▂
▄▆▆▁▁▁▁▁▁▁▁▁▁▁▃▆▆▁▁▁▆▄▁▆▂▆▅▃▃▁▃▆▆▆▆▁▁▁▆▂▁▁▁▁▁▁▁▁▁▁▆▅▆▂▁▁▆▅▆▆▄▆▄▃▆▆▃▁▁▁▆▃▁▁▁▅▆▆▆▂

julia> @time @sync begin test = read(stream, 5.0*48000) end
ERROR: MethodError: no method matching read(::PortAudio.PortAudioSource{Float32}, ::Float64)
Closest candidates are:
  read(::AbstractString, ::Any...) at io.jl:160
  read(::IO, ::Any) at io.jl:528
  read(::RingBuffers.RingBuffer{T}, ::Any) where T at /home/murphyj/.julia/v0.6/RingBuffers/src/RingBuffers.jl:257
  ...
Stacktrace:
 [1] read(::PortAudio.PortAudioStream{Float32}, ::Float64) at /home/murphyj/.julia/v0.6/PortAudio/src/PortAudio.jl:231

julia> @time @sync begin test = read(stream, 5*48000) end
  2.291248 seconds (9.65 k allocations: 4.103 MiB)
240000-frame, 2-channel SampleBuf{Float32, 2}
5.442176870748299s sampled at 44100.0Hz
▁▁▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▆▆▆▄▁▆▁▁▂▆▁▁▁▆▅▅▆▆▄▁▁▆▂▆▆▃▆▆▆▆▆▆▂▁▆▆▅▄▆▆▄▁▁▆▆▄▆▆▆▁▁▁▁▆▆▄▆▆▃▃▆▆
▁▁▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▆▆▆▄▁▆▁▁▂▆▁▁▁▆▅▅▆▆▄▁▁▆▂▆▆▃▆▆▆▆▆▆▂▁▆▆▅▄▆▆▄▁▁▆▆▄▆▆▆▁▁▁▁▆▆▄▆▆▃▃▆▆

julia> @time begin @sync begin test = read(stream, 5*48000) end end
  5.320691 seconds (9.51 k allocations: 4.094 MiB)
240000-frame, 2-channel SampleBuf{Float32, 2}
5.442176870748299s sampled at 44100.0Hz
▅▆▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
▅▆▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

julia> @time begin @sync begin test = read(stream, 5.0s) end end
  4.728189 seconds (8.79 k allocations: 3.765 MiB, 0.15% gc time)
220500-frame, 2-channel SampleBuf{Float32, 2}
5.0s sampled at 44100.0Hz
▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

julia> @time begin @sync begin test = read(stream, 44100*5) end end
  3.151310 seconds (8.74 k allocations: 3.762 MiB)
220500-frame, 2-channel SampleBuf{Float32, 2}
5.0s sampled at 44100.0Hz
▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▆▆▃▆▁▁▁▁▁▃▆▆▆▃▁▁▆▆▆▂▆▅▁▁▁▁▁▁▁▁▁▁▁▁▆▂▁▄▆▁▁▆▂▁▁▁▆▂▁▁▆▅▁
▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▆▆▃▆▁▁▁▁▁▃▆▆▆▃▁▁▆▆▆▂▆▅▁▁▁▁▁▁▁▁▁▁▁▁▆▂▁▄▆▁▁▆▂▁▁▁▆▂▁▁▆▅▁

julia> @time begin @sync begin test = read(stream, 48000*5) end end
  2.979866 seconds (9.50 k allocations: 4.094 MiB)
240000-frame, 2-channel SampleBuf{Float32, 2}
5.442176870748299s sampled at 44100.0Hz
▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▆▆▆▆▁▆▁▆▆▃▁▁▁▁▁▁▁▆▆▆▄▂▄▆▆▃▃▁▁▁▁▁▆▆▆▆▆▁▁▁▄▆▁▆▆▄▁▁▁▁▆▃▃
▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▆▆▆▆▁▆▁▆▆▃▁▁▁▁▁▁▁▆▆▆▄▂▄▆▆▃▃▁▁▁▁▁▆▆▆▆▆▁▁▁▄▆▁▆▆▄▁▁▁▁▆▃▃

julia>

ssfrr commented 6 years ago

Sorry for the delayed response to this issue, for some reason I didn't get (or missed) the email notification so I didn't know this was hanging out.

I have an idea for why you're getting reads that don't take the full 5 seconds, but it's weird that your reads are as short as they are. The underlying portaudio audio task pushes data into a ringbuffer until it's full, then it can't put any more data in. When you do a read, you're first pulling out whatever's in the ringbuffer, and then you start getting fresh data. What I need to implement is a mechanism to signal when the buffer overflows so that the next read doesn't get stale data.

This shouldn't be an issue if you're always reading from the stream, but if you open the stream and then don't do anything with it for a while, the next read will have some stale data at the beginning of it. You also get that data immediately, so the read doesn't take as long as you'd expect. Generally the buffer isn't 2seconds long though, unless you've changed the buffer size when you created the stream.

On my machine I get:

julia> using PortAudio, SampledSignals

julia> str = PortAudioStream()
PortAudio.PortAudioStream{Float32}
  Samplerate: 44100.0Hz
  Buffer Size: 4096 frames
  2 channel sink: "default"
  2 channel source: "default"

julia> @time read(str, 5s);
  4.686101 seconds (9.02 k allocations: 3.772 MiB, 0.10% gc time)

julia> @time read(str, 5s);
  4.860484 seconds (9.03 k allocations: 3.772 MiB)

But if I've recently done a read, the read takes the full 5 seconds:

julia> read(str, 1s); @time read(str, 5s);
  5.083124 seconds (9.04 k allocations: 3.772 MiB)

Can you give a little more guidance for reproducing the CPU usage issue you're seeing? In general there shouldn't be anything in PortAudio.jl that cares whether there's noise or silence coming into the stream, so the CPU usage shouldn't change. The only thing I could think about is some kind of denormal issue (sometimes doing math on very very small floating point values is slower than normal floats), but I think I need more info on what you're seeing to be able to debug it.

Also, if you're around MIT we could meet up in person if it's easier to reproduce on your machine.

ssfrr commented 6 years ago

just created issue #15 to track with I think is one of your issues. I'll keep this one open until we figure out whether this CPU thing is a PortAudio issue or not.

JMurph2015 commented 6 years ago

Hi! I'll take a look into it again with this new info. I am around MIT, but I'm pretty hosed until at least Thursday this week, so that's the timescale I'm working on. Thanks for the help!

JMurph2015 commented 6 years ago

@ssfrr I did some more digging and split my processing thread and sampling thread. So now all thread 1 does is IO (audio, CLI, UDP) and thread 2 does all of the touchy mathy processing. Same problem still shows up on thread 1, but I'll continue to try isolating the problem just in case it's one of those other IO's problem. Thanks!

bramtayl commented 3 years ago

Hi @JMurph2015 the package no longer uses RingBuffers and so #15 should no longer be an issue. I do however think it would be a great idea to add tests to compare requested read time to elapsed time similar to your tests above, so I'm changing the issue title. Let me know if this is would be sufficient.

JMurph2015 commented 2 years ago

Hi @bramtayl, sorry for the super late reply, I don't get to GitHub much these days. Yes I think that is a good idea, just as a sanity check, and shouldn't be terribly hard to implement (I think you can mine the data out of Julia's built in timing tools).

rob-luke commented 2 years ago

I still observe that the requested read and write times do not necessarily match with what actually occurs. This original strange behaviour is still present and I think will continue to confuse users. If its expected behaviour as @bramtayl implies (by suggesting to add tests rather than address), then I think that, in addition to tests, it should be clearly described in the documentation.

From my testing:

I observe that reading/writing to a sound card provides inaccurate timing when there has not been a recent read/write (1).
I observe that, even if a write has happened recently, writes less than 100 ms have inaccurate timing (2).

I am not sure how generalisable these findings are. And I suggest that the user needs to benchmark their own system for accuracy

Accurate timing of small writes (typically around 4s ms, but up to 10s of ms) is required for real time audio processing (my research field), which is not currently possible (if I am using the package correctly, hence I included my script below). Currently I am only using this software when 200 ms continuous writes are appropriate.

@bramtayl and @ssfrr know the underlying code best, I have no intuition if improving accuracy in the absence of recent read/writes has happened is possible, or if accurate timing can occur at faster frame rates.

Experiment (1): Results

Note that the `without sleeps` times are quite close o the expected duration, whereas the `with sleeps` is not. ```text ───────────────────────────────────────────────────────────────────────────────── Time Allocations ────────────────────── ─────────────────────── Tot / % measured: 399s / 100% 766MiB / 84.0% Section ncalls time %tot avg alloc %tot avg ───────────────────────────────────────────────────────────────────────────────── Writing with sleeps 5 108s 27.1% 21.6s 170MiB 26.4% 33.9MiB 5 second write 5 23.4s 5.90% 4.69s 38.8MiB 6.02% 7.75MiB 4 second write 5 18.5s 4.66% 3.71s 31.1MiB 4.84% 6.23MiB 3 second write 5 13.4s 3.38% 2.68s 23.5MiB 3.65% 4.70MiB 2 second write 5 8.51s 2.14% 1.70s 15.9MiB 2.47% 3.18MiB 1 second write 5 3.39s 0.85% 677ms 8.26MiB 1.28% 1.65MiB 0.75 second write 5 2.33s 0.59% 466ms 6.35MiB 0.99% 1.27MiB 0.5 second write 5 827ms 0.21% 165ms 4.44MiB 0.69% 910KiB 0.25 second write 5 6.71ms 0.00% 1.34ms 2.54MiB 0.39% 520KiB Writing without sleeps 5 106s 26.6% 21.2s 174MiB 27.1% 34.9MiB 5 second write 5 25.0s 6.28% 4.99s 38.8MiB 6.02% 7.75MiB 4 second write 5 20.1s 5.05% 4.01s 31.1MiB 4.84% 6.23MiB 3 second write 5 14.9s 3.76% 2.99s 23.5MiB 3.65% 4.70MiB 2 second write 5 10.0s 2.52% 2.01s 15.9MiB 2.47% 3.18MiB 1 second write 5 4.90s 1.23% 981ms 8.26MiB 1.28% 1.65MiB 0.75 second write 5 3.88s 0.98% 775ms 11.1MiB 1.72% 2.22MiB 0.5 second write 5 2.52s 0.63% 504ms 4.44MiB 0.69% 910KiB 0.25 second write 5 1.07s 0.27% 213ms 2.54MiB 0.39% 520KiB Reading without sleeps 5 96.0s 24.2% 19.2s 163MiB 25.4% 32.6MiB 5 second read 5 25.0s 6.28% 4.99s 38.9MiB 6.05% 7.78MiB 4 second read 5 20.1s 5.05% 4.01s 31.1MiB 4.84% 6.23MiB 3 second read 5 14.9s 3.76% 2.99s 23.3MiB 3.63% 4.67MiB 2 second read 5 10.0s 2.52% 2.00s 15.6MiB 2.42% 3.11MiB 1 second read 5 4.91s 1.24% 981ms 7.79MiB 1.21% 1.56MiB 0.75 second read 5 3.88s 0.98% 776ms 16.5MiB 2.56% 3.30MiB 0.5 second read 5 2.51s 0.63% 501ms 3.89MiB 0.61% 798KiB 0.25 second read 5 1.27s 0.32% 255ms 1.95MiB 0.30% 400KiB Reading with sleeps 5 87.7s 22.1% 17.5s 136MiB 21.2% 27.2MiB 5 second read 5 23.7s 5.96% 4.73s 38.9MiB 6.05% 7.78MiB 4 second read 5 18.3s 4.61% 3.67s 31.1MiB 4.84% 6.23MiB 3 second read 5 13.6s 3.43% 2.72s 23.3MiB 3.63% 4.67MiB 2 second read 5 8.51s 2.14% 1.70s 15.6MiB 2.42% 3.11MiB 1 second read 5 3.38s 0.85% 676ms 7.79MiB 1.21% 1.56MiB 0.75 second read 5 2.10s 0.53% 419ms 5.84MiB 0.91% 1.17MiB 0.5 second read 5 1.05s 0.26% 210ms 3.89MiB 0.61% 798KiB 0.25 second read 5 1.91ms 0.00% 381μs 1.95MiB 0.30% 400KiB ─────────────────────────────────────────────────────────────────────────────────% ```

Experiment (1): Code

```Julia using PortAudio, SampledSignals, TimerOutputs println(PortAudio.devices()) str = PortAudioStream(PortAudio.devices()[9]) str_samplerate = Int(str.sample_rate) read(str, str_samplerate); write(str, randn(str_samplerate).*0.3); to = TimerOutput(); function time_test() for n = 1:5 @timeit to "Reading without sleeps" begin read(str, str_samplerate * 3) @timeit to "1 second read" read(str, str_samplerate * 1) @timeit to "2 second read" read(str, str_samplerate * 2) @timeit to "3 second read" read(str, str_samplerate * 3) @timeit to "4 second read" read(str, str_samplerate * 4) @timeit to "5 second read" read(str, str_samplerate * 5) @timeit to "0.75 second read" read(str, Int(str_samplerate * 0.75)) @timeit to "0.5 second read" read(str, Int(str_samplerate * 0.5)) @timeit to "0.25 second read" read(str, Int(str_samplerate * 0.25)) end @timeit to "Reading with sleeps" begin read(str, str_samplerate * 1) sleep(0.3) @timeit to "1 second read" read(str, str_samplerate * 1) sleep(0.3) @timeit to "2 second read" read(str, str_samplerate * 2) sleep(0.3) @timeit to "3 second read" read(str, str_samplerate * 3) sleep(0.3) @timeit to "4 second read" read(str, str_samplerate * 4) sleep(0.3) @timeit to "5 second read" read(str, str_samplerate * 5) sleep(0.3) @timeit to "0.75 second read" read(str, Int(str_samplerate * 0.75)) sleep(0.3) @timeit to "0.5 second read" read(str, Int(str_samplerate * 0.5)) sleep(0.3) @timeit to "0.25 second read" read(str, Int(str_samplerate * 0.25)) end @timeit to "Writing without sleeps" begin write(str, randn(str_samplerate * 5) .* 0.1) @timeit to "1 second write" write(str, randn(str_samplerate * 1) .* 0.1) @timeit to "2 second write" write(str, randn(str_samplerate * 2) .* 0.1) @timeit to "3 second write" write(str, randn(str_samplerate * 3) .* 0.1) @timeit to "4 second write" write(str, randn(str_samplerate * 4) .* 0.1) @timeit to "5 second write" write(str, randn(str_samplerate * 5) .* 0.1) @timeit to "0.75 second write" write(str, randn(Int(str_samplerate * 0.75)) .* 0.1) @timeit to "0.5 second write" write(str, randn(Int(str_samplerate * 0.5)) .* 0.1) @timeit to "0.25 second write" write(str, randn(Int(str_samplerate * 0.25)) .* 0.1) end @timeit to "Writing with sleeps" begin write(str, randn(str_samplerate * 5) .* 0.1) sleep(0.3) @timeit to "1 second write" write(str, randn(str_samplerate * 1) .* 0.1) sleep(0.3) @timeit to "2 second write" write(str, randn(str_samplerate * 2) .* 0.1) sleep(0.3) @timeit to "3 second write" write(str, randn(str_samplerate * 3) .* 0.1) sleep(0.3) @timeit to "4 second write" write(str, randn(str_samplerate * 4) .* 0.1) sleep(0.3) @timeit to "5 second write" write(str, randn(str_samplerate * 5) .* 0.1) sleep(0.3) @timeit to "0.75 second write" write(str, randn(Int(str_samplerate * 0.75)) .* 0.1) sleep(0.3) @timeit to "0.5 second write" write(str, randn(Int(str_samplerate * 0.5)) .* 0.1) sleep(0.3) @timeit to "0.25 second write" write(str, randn(Int(str_samplerate * 0.25)) .* 0.1) end end end time_test() show(to) ```

Experiment (2): Results

```text ───────────────────────────────────────────────────────────────────────────────── Time Allocations ────────────────────── ─────────────────────── Tot / % measured: 373s / 97.6% 700MiB / 83.8% Section ncalls time %tot avg alloc %tot avg ───────────────────────────────────────────────────────────────────────────────── Writing with sleeps 650 230s 63.1% 353ms 294MiB 50.2% 464KiB 0.75 second write 50 22.3s 6.12% 445ms 63.5MiB 10.8% 1.27MiB 0.5 second write 50 9.76s 2.68% 195ms 44.5MiB 7.58% 910KiB 0.3 second write 50 84.9ms 0.02% 1.70ms 29.3MiB 4.99% 600KiB 0.25 second write 50 69.2ms 0.02% 1.38ms 25.8MiB 4.40% 529KiB 0.2 second write 50 59.4ms 0.02% 1.19ms 22.0MiB 3.75% 451KiB 0.175 second write 50 53.0ms 0.01% 1.06ms 20.1MiB 3.43% 412KiB 0.15 second write 50 48.0ms 0.01% 959μs 18.2MiB 3.11% 373KiB 0.1 second write 50 41.6ms 0.01% 833μs 14.4MiB 2.45% 295KiB 0.125 second write 50 39.7ms 0.01% 794μs 16.3MiB 2.78% 334KiB 0.075 second write 50 27.3ms 0.01% 546μs 12.5MiB 2.13% 256KiB 0.05 second write 50 27.2ms 0.01% 545μs 10.6MiB 1.81% 217KiB 0.015 second write 50 19.6ms 0.01% 391μs 7.95MiB 1.36% 163KiB 0.025 second write 50 16.1ms 0.00% 321μs 8.70MiB 1.48% 178KiB Writing without sleeps 650 134s 36.9% 207ms 292MiB 49.8% 460KiB 0.75 second write 50 37.5s 10.3% 750ms 63.5MiB 10.8% 1.27MiB 0.5 second write 50 25.0s 6.87% 500ms 44.4MiB 7.58% 910KiB 0.3 second write 50 15.0s 4.13% 300ms 29.2MiB 4.98% 598KiB 0.25 second write 50 12.5s 3.43% 250ms 25.4MiB 4.33% 520KiB 0.2 second write 50 10.0s 2.75% 200ms 21.6MiB 3.68% 442KiB 0.175 second write 50 8.72s 2.40% 174ms 19.7MiB 3.36% 403KiB 0.15 second write 50 7.45s 2.05% 149ms 17.8MiB 3.03% 364KiB 0.125 second write 50 6.12s 1.68% 122ms 15.9MiB 2.70% 325KiB 0.1 second write 50 4.84s 1.33% 96.8ms 14.0MiB 2.38% 286KiB 0.075 second write 50 3.54s 0.97% 70.7ms 12.1MiB 2.06% 247KiB 0.05 second write 50 2.24s 0.62% 44.8ms 10.2MiB 1.73% 208KiB 0.025 second write 50 966ms 0.27% 19.3ms 8.25MiB 1.41% 169KiB 0.015 second write 50 393ms 0.11% 7.86ms 7.52MiB 1.28% 154KiB ─────────────────────────────────────────────────────────────────────────────────% ```

Experiment (2): Code

```Julia using PortAudio, SampledSignals, TimerOutputs println(PortAudio.devices()) str = PortAudioStream(PortAudio.devices()[9]) str_samplerate = Int(str.sample_rate) read(str, str_samplerate); write(str, randn(str_samplerate).*0.3); to = TimerOutput(); function time_test2() for time = [0.015, 0.025, 0.05, 0.075, 0.1, 0.125, 0.15, 0.175, 0.2, 0.25, 0.3, 0.5, 0.75] write(str, randn(Int(str_samplerate * time)) .* 0.1) for n = 1:50 @timeit to "Writing without sleeps" begin @timeit to "$time second write" write(str, randn(Int(str_samplerate * time)) .* 0.1) end end write(str, randn(Int(str_samplerate * time)) .* 0.1) sleep(0.3) for n = 1:50 @timeit to "Writing with sleeps" begin sleep(0.3) @timeit to "$time second write" write(str, randn(Int(str_samplerate * time)) .* 0.1) end end end end time_test2() show(to) ```

System details

```text _ _ _ _(_)_ | Documentation: https://docs.julialang.org (_) | (_) (_) | _ _ _| |_ __ _ | Type "?" for help, "]?" for Pkg help. | | | | | | |/ _` | | | | |_| | | | (_| | | Version 1.6.1 (2021-04-23) _/ |\__'_|_|_|\__'_| | Official https://julialang.org/ release |__/ | (YYYYYYY) pkg> st Status `/YYYYYYY/Project.toml` [717857b8] DSP v0.6.10 [31c24e10] Distributions v0.23.8 [28b8d3ca] GR v0.58.1 [b98c9c47] Pipe v1.3.0 [91a5bcdd] Plots v1.20.1 [80ea8bcb] PortAudio v1.1.2 [bd7594eb] SampledSignals v2.1.2 [a759f4b9] TimerOutputs v0.5.12 [1986cc42] Unitful v1.9.0 [42071c24] UnitfulRecipes v1.5.0 [8149f6b0] WAV v1.1.1 ```

bramtayl commented 2 years ago

Wow, that's some nice looking timer output! Sorry, I didn't mean it's not a problem, just that we should check since some of the suggestions are out of date. It might be nice to see the absolute difference between the actual time and the expected time. I think exact timing is probably a better question for the C library people, but by my estimation, I'd guess the difference shouldn't be too much more than compile time + latency + a few buffers. So, it the buffers are 120 frames and the sample rate is 44100 frames per second, that's 2 ms

bramtayl commented 2 years ago

Also, I'm not sure I really understand this:

Accurate timing of small writes (typically around 4s ms, but up to 10s of ms) is required for real time audio processing

Do you mean the time that sound is coming out of the speakers for, or the time that Julia is running the command for?

rob-luke commented 2 years ago

just that we should check since some of the suggestions are out of date.

Agreed, there was no accusation from me. Appreciate the great work you are doing keeping this package up to date.

Do you mean the time that sound is coming out of the speakers for, or the time that Julia is running the command for?

I think I was mixing things up here. So I've gone and done a few more checks, please correct any further misunderstandings on my behalf. In my test above I was assuming that if the @time command in Julia was reporting 150 ms, then the sound was coming out of the speaker for 150 ms. The point below summarise my current understanding of the timing situation (is this an accurate summary as you understand too?):

A write of any duration is correctly played from the speaker with accurate duration
The time reported by Julia will be incorrect if the duration is less than ~200 ms
The time reported by Julia will be incorrect if a signal has not been written recently
For signals with duration greater than 200 ms and continuous presentation the time reported by Julia may be accurate

The implication for users (and I understood was best practice anyway) would be:

Don't rely on the timing from Julia
- Both from PortAudio signals,
- or from inbuilt functions like sleep
Check your signals on a scope or loopback

Speaker Output Continuous Duration Tests

To run this test I used an RME Fireface ucx and looped back the output channel, and used audition to capture the loopback.

4 ms

![image](https://user-images.githubusercontent.com/748691/131199663-3fcb08c7-0b64-481f-be7d-541d98ba27e8.png)

10 ms

![image](https://user-images.githubusercontent.com/748691/131199553-ad5d2e48-a4d3-457d-a9c3-2cdfc7d1f987.png)

100 ms

![image](https://user-images.githubusercontent.com/748691/131198908-35eb2ac5-d882-44d6-99e5-dc76804f6c94.png)

1000 ms

![image](https://user-images.githubusercontent.com/748691/131199391-b573c06d-f974-422e-97ae-bb320c7aad9e.png)

Speaker Output Intermittent Duration Tests

To run this test I used an RME Fireface ucx and looped back the output channel, and used audition to capture the loopback.

Correct method

![image](https://user-images.githubusercontent.com/748691/131201417-70aa15f9-662d-4b7e-b8af-4d065ae40e37.png) ```Julia @time for n = 1:20 write(str, randn(Int(str_samplerate * 0.01)) .* 0.1) write(str, randn(Int(str_samplerate * 0.01)) .* 0.0) end ```

Incorrect method

![image](https://user-images.githubusercontent.com/748691/131201474-444247dd-ce02-470d-8ca1-1a880ad9011d.png) ```Julia @time for n = 1:20 write(str, randn(Int(str_samplerate * 0.01)) .* 0.1) sleep(0.01) end ```

rob-luke commented 2 years ago

So is it possible to improve the timing reported by Julia (not saying do it now, just is it likely to be feasible)? And/or should we inform users in the readme or (nice new) docs that the timing reported by the functions do not equate to the timing of the signal that comes from the speaker? I think the current behaviour is confusing to users as its intermittent, as sometimes the reported timing is accurate and sometimes it isn't.

JMurph2015 commented 2 years ago

Quickly reading through, this isn't the issue at hand either. The tests I am proposing are ones that validate requested_audio_time <= runtime_of_read and returned_audio_length == requested_audio_time. Basically, when I call read(device, 5s) I expect that function to a) take at least 5 seconds and b) return 5 seconds of audio to me even, and especially, if the signal is zeros. The sound card doesn't cease to exist because there is nothing audible playing. At least in theory (though strictly speaking in implementation this may not be true), the sound card is always playing something, but that something may be zeros, which would mean no sound.

rob-luke commented 2 years ago

Ops, somehow i got switched from reads to writes haha. Sorry to hijack @JMurph2015

rob-luke commented 2 years ago

But my comment and tests from yesterday confirm that the issue with read timing that @JMurph2015 reported in 2017 still exists.

bramtayl commented 2 years ago

Ok, so I've changed the issue title to reference both julia runtime and the size of the returned buffer, because those are different things.

is it possible to improve the timing reported by Julia

I think it depends on the reason for the difference. At least some of it would depend on exactly how the C library is passing buffers back and forth to the sound card, and I don't know the answer here.

the issue with read timing that @JMurph2015 reported in 2017 still exists.

On my end, at least the size of the buffer issue seems fixed.

julia> PortAudioStream() do stream
           @time read(stream, round(Int, 5 * stream.sample_rate))
       end
  5.003865 seconds (541.21 k allocations: 33.148 MiB, 0.21% gc time, 1.94% compilation time)
220500-frame, 2-channel SampleBuf{Float32, 2}
5.0s sampled at 44100.0Hz
▂▂▂▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂
▂▂▂▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂

ssfrr commented 2 years ago

@JMurph2015 said:

Basically, when I call read(device, 5s) I expect that function to a) take at least 5 seconds and b) return 5 seconds of audio to me

(b) should be true, but (a) is not. The reasons are complicated.

Clocks

Trying to nail down a definition of "now" that is well-defined and not surprising turns out to be very difficult when milliseconds count. There are basically two clocks running that are relevant:

the audio clock, ticking along at 1 sample every 1/48000th of a second (or whatever your sampling rate it). This clock consumes samples from the output buffer and plays them from the DAC, while simultaneously taking samples from the ADC and pushing them into the input buffer. (in reality this happens periodically in blocks rather than sample-by-sample, but by the time you're calling read or write that detail doesn't really matter).
The CPU's "wall clock", which is probably being queried when you call @timeit.

You can ask the CPU what time it is, but that doesn't tell you very much about what's coming out of the speakers right now. It could be the data that your code put into the buffer some time ago.

Reading

So the question is what should happen when you call read(str, 48000)? One option is:

The input buffer is wiped out
we start reading from the input buffer
we return after we get the requested amount of data

The wall-clock time elapsed during this read call would be pretty close to the requested time, and the beginning of the data would correspond to the moment that your code ran. This seems good, but what if you were to call buf1 = read(str, fs); buf2 = read(str, fs)?. You would generally want those two buffers to represent contiguous audio data, but that's not what you'll get from the sequence above. The samples that the ADC recorded between the end of the first call and the beginning of the second would be discarded.

So we don't generally want to wipe out the input buffer between read calls. In this case you want the 2nd read call to grab some samples that were actually recorded before it was called. So what that means is that if you run buf1 = read(str, fs); sleep(0.1); buf2 = read(str, fs), each call to read should give you 1 second of audio, but the second call actually executed in 0.9s, because immediately when it was called it grabbed the 0.1s of data that had accumulated during the sleep. If the requested amount is less than what's in the buffer it could return immediately.

So what happens if you wait a long time before calling read? We can't have that input buffer just growing without bound so you'd either want to have a ringbuffer that keeps some amount of the most recent audio, or you stop filling the buffer when it's full, and wipe it out the next time read is called. I'm actually not sure what the current behavior is.

There's a dual for writing to the output buffer - if the amount you write is less than the free space in the buffer, the write could return immediately. @rob-luke I think this explains your results - for longer writes you fill the buffer and then wait around until the last of the data is in the buffer. So if the buffer was full when you started, you'd expect the runtime to be requested_time - block_size. For very short writes you can actually write multiple times before filling the buffer, so the average runtime is very short indeed.

Models of Soundcard IO

There are a few different ways to think about interfacing with a sound card. The first two are both pretty typical of low-level audio APIs, and libportaudio supports both of them. It's important to note that they both assume that it's the user's job to keep the audio data flowing - if they don't read or write often enough then buffers overflow or underflow, which is considered an error condition.

Callback IO

The soundcard calls a user-supplied function for every block of audio. This callback is often called from a very high-priority context/thread that's totally separate from the main user code. The callback's job is to take the most recent block of recorded audio, process it, and supply a new block of audio to played from the speaker. Usually the callback just gets pointers to the buffers and they are mutated in-place. Most very-low-latency audio APIs use this model. PortAudio.jl used to, but it's not a very good fit for high-level languages like Julia where the GC or JIT compiler could cause pauses at any time, and at the time Julia did not support being called from other threads (not sure whether that's still true).

Blocking

The user reads and writes to buffers that are streaming to and from the sound card. If there is more data given[requested] than there is space[data] available, the user code blocks. This is how PortAudio.jl works, and described above.

Mixer

This would be more typical of a higher-level package, for instance a computer music system like Supercollider. The idea is that there's some underlying task that's keeping the audio device fed, and you can sporadically give it audio data to play. You could have multiple tasks that give audio to play simultaneously and it will be mixed together. This seems pretty convenient, but it has a few issues.

You can't build the lower-level API on top of this one
playback timing is still going to be inconsistent. For one, the exact time your process wakes up is subject to jitter. Also, remember that the processing is happening block-by-block. At any given time there is a block that the device is playing, and the earliest you can play is by putting data into the next block. So if the device happens to be near the end of the current block, your audio will play almost immediately, but if it is near the beginning of the current block, you need to wait for that block to finish before it gets to your data.

The common solution to the second issue is that when you want to play something, you also supply a timestamp for when you want it to be played. Then the lower-level audio task takes care of mixing in your audio at the right time. You accept some latency (audio needs to be scheduled for some time in the future) but you eliminate the jitter and unpredictability.

@samarron has thought very deeply about time, and his SonicPi project has some nice ideas that are described in in this paper: https://www.cs.kent.ac.uk/people/staff/dao7/publ/farm14-sonicpi.pdf

I suspect something with similar temporal semantics could be implemented in Julia, but it's out of scope for PortAudio.jl.

Summary

It's difficult to correlate timing of your julia code to timing of your audio more precisely than 10s of milliseconds best case, and likely more on the order of 100-200ms.

If you care about exact timing between audio events, create an audio buffer with the right spacing and feed it into the audio device as a contiguous chunk (though it could be split over multiple write calls). Then you are not subject to any scheduling issues due to Julia or the audio device buffer management. If you want something that can react to external stimuli (like a serial port message) with single-digit-ms precision, I suspect that's going to be very difficult to do with Julia and PortAudio.jl.

JMurph2015 commented 2 years ago

Ok, my general response is, yes you have a point with the intricacies of exactly how correlated you can make the system clock be with the audio clock, but no you miss the forest for the trees here.

In the context of PortAudio.jl, my program was just looping reading 1/30 of a second of audio. When there was something playing through the speakers, this worked reasonably consistently (though not entirely). When there was nothing playing, it totally bugged out and the blocking call would take dramatically less time than expected and return less data than expected, thereby spinning the CPU much more than necessary.

This is the problem that needs solving and is clearly inconsistent with the rest of the API. And it should be testable by ensuring the calls block for a minimum amount of time and when it does return, that it comes back with an appropriate amount of data. On that first point, I could understand the first call or two blocking for an unexpectedly short time, but in my case it was 10's of seconds worth of virtually non-blocking calls.

ssfrr commented 2 years ago

Ah yes, agreed that seems like a bug. I was mostly here responding to the expectation that a read/write of 5s should take 5s to execute, which I think is mostly what @rob-luke was talking about.

You mentioned:

it should be testable by ensuring the calls block for a minimum amount of time and when it does return, that it comes back with an appropriate amount of data

I agree with the 2nd point, that they should return the right amount of data, but I don't understand the first point. If there's 256ms of data in the read buffer and you're reading in 33ms chunks, the first 7 reads should not need to block at all. The 8th read should block for about 11ms, and then the buffer is empty so subsequent reads should block for about 33ms each.

edit: it will only be 33ms on average - the buffer will be filled in chunks that are probably larger than 33ms, so actual runtime of the read will be bursty as the reads catch up with each new chunk of data.

@bramtayl said:

On my end, at least the size of the buffer issue seems fixed.

Have you confirmed that the CPU usage issue still exists in recent PortAudio.jl?

bramtayl commented 2 years ago

Nope, sorry, not even sure if I would know how to test that

rob-luke commented 2 years ago

Thanks for the wonderful summary @ssfrr. I appreciate you taking the time to step through the details. Your comments have cleared up my understanding and align with the behaviour I see.

ssfrr commented 2 years ago

So AFAICT the original bug was that read wasn't returning the full requested amount of data, so @JMurph2015's code was spinning and burning up CPU. It seems that bug is resolved.

The issue of how much wall-clock time read and write should take is not a bug, but it does seem like the behavior is surprising to folks, so perhaps we just need to improve the docs of those functions to make things more clear?

JuliaAudio / PortAudio.jl