Open JMurph2015 opened 6 years ago
Sorry for the delayed response to this issue, for some reason I didn't get (or missed) the email notification so I didn't know this was hanging out.
I have an idea for why you're getting reads that don't take the full 5 seconds, but it's weird that your reads are as short as they are. The underlying portaudio audio task pushes data into a ringbuffer until it's full, then it can't put any more data in. When you do a read, you're first pulling out whatever's in the ringbuffer, and then you start getting fresh data. What I need to implement is a mechanism to signal when the buffer overflows so that the next read doesn't get stale data.
This shouldn't be an issue if you're always reading from the stream, but if you open the stream and then don't do anything with it for a while, the next read will have some stale data at the beginning of it. You also get that data immediately, so the read doesn't take as long as you'd expect. Generally the buffer isn't 2 seconds long though, unless you've changed the buffer size when you created the stream.
On my machine I get:
julia> using PortAudio, SampledSignals
julia> str = PortAudioStream()
PortAudio.PortAudioStream{Float32}
Samplerate: 44100.0Hz
Buffer Size: 4096 frames
2 channel sink: "default"
2 channel source: "default"
julia> @time read(str, 5s);
4.686101 seconds (9.02 k allocations: 3.772 MiB, 0.10% gc time)
julia> @time read(str, 5s);
4.860484 seconds (9.03 k allocations: 3.772 MiB)
But if I've recently done a read, the read takes the full 5 seconds:
julia> read(str, 1s); @time read(str, 5s);
5.083124 seconds (9.04 k allocations: 3.772 MiB)
Can you give a little more guidance for reproducing the CPU usage issue you're seeing? In general there shouldn't be anything in PortAudio.jl that cares whether there's noise or silence coming into the stream, so the CPU usage shouldn't change. The only thing I could think about is some kind of denormal issue (sometimes doing math on very very small floating point values is slower than normal floats), but I think I need more info on what you're seeing to be able to debug it.
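To make the denormal idea concrete, here is a small sketch using only Base Julia. It checks whether values are subnormal and shows the flush-to-zero switch. Whether this has anything to do with the CPU spike reported here is unconfirmed; the sample values and the helper name count_subnormals are made up for illustration.

```julia
# Subnormal (denormal) floats are nonzero values smaller in magnitude than floatmin(T).
tiny = 1.0f-40                 # below floatmin(Float32), so subnormal
println(issubnormal(tiny))     # is `tiny` subnormal?
println(issubnormal(1.0f0))    # an ordinary float for comparison

# Illustrative helper: count subnormal samples in a buffer,
# e.g. a nearly silent input block.
count_subnormals(buf) = count(issubnormal, buf)

quiet = Float32[0.0, 1.0f-40, 3.0f-39, 0.5]
println(count_subnormals(quiet))

# Ask the CPU to treat subnormals as zero on the current thread.
# Returns false on hardware that does not support it.
supported = set_zero_subnormals(true)
println("flush-to-zero enabled: ", supported)
set_zero_subnormals(false)     # restore the default IEEE behaviour
```

If a near-silent input block turns out to contain many subnormal samples, comparing CPU usage with and without set_zero_subnormals(true) on the processing thread would be one way to test the hypothesis.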
Also, if you're around MIT we could meet up in person if it's easier to reproduce on your machine.
Just created issue #15 to track what I think is one of your issues. I'll keep this one open until we figure out whether this CPU thing is a PortAudio issue or not.
Hi! I'll take a look into it again with this new info. I am around MIT, but I'm pretty hosed until at least Thursday this week, so that's the timescale I'm working on. Thanks for the help!
@ssfrr I did some more digging and split my processing thread and sampling thread. So now all thread 1 does is IO (audio, CLI, UDP) and thread 2 does all of the touchy mathy processing. The same problem still shows up on thread 1, but I'll continue to try isolating it in case one of the other IO paths is the cause. Thanks!
Hi @JMurph2015, the package no longer uses RingBuffers and so #15 should no longer be an issue. I do however think it would be a great idea to add tests to compare requested read time to elapsed time similar to your tests above, so I'm changing the issue title. Let me know if this would be sufficient.
Hi @bramtayl, sorry for the super late reply, I don't get to GitHub much these days. Yes I think that is a good idea, just as a sanity check, and shouldn't be terribly hard to implement (I think you can mine the data out of Julia's built in timing tools).
I still observe that the requested read and write times do not necessarily match what actually occurs. The original strange behaviour is still present and I think it will continue to confuse users. If it's expected behaviour, as @bramtayl implies (by suggesting to add tests rather than address it), then I think that, in addition to tests, it should be clearly described in the documentation.
From my testing:
I am not sure how generalisable these findings are, and I suggest that users benchmark their own system for accuracy.
Accurate timing of small writes (typically around 4s ms, but up to 10s of ms) is required for real time audio processing (my research field), which is not currently possible (if I am using the package correctly, hence I included my script below). Currently I am only using this software when 200 ms continuous writes are appropriate.
@bramtayl and @ssfrr know the underlying code best. I have no intuition for whether accuracy can be improved when no recent reads/writes have happened, or whether accurate timing can be achieved at faster frame rates.
Wow, that's some nice looking timer output! Sorry, I didn't mean it's not a problem, just that we should check since some of the suggestions are out of date. It might be nice to see the absolute difference between the actual time and the expected time. I think exact timing is probably a better question for the C library people, but by my estimation, I'd guess the difference shouldn't be too much more than compile time + latency + a few buffers. So, if the buffers are 120 frames and the sample rate is 44100 frames per second, that's about 2.7 ms per buffer.
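For reference, the per-buffer duration is just frames divided by sample rate. A quick sketch (the function name buffer_ms is only for illustration):

```julia
# Duration of one buffer in milliseconds: frames divided by frames-per-second.
buffer_ms(frames, samplerate) = 1000 * frames / samplerate

println(buffer_ms(120, 44100))    # the 120-frame example from this comment
println(buffer_ms(4096, 44100))   # the 4096-frame buffer size shown earlier in the thread
```

So "a few buffers" is a handful of milliseconds with 120-frame buffers, but on the order of 100 ms or more per buffer with 4096 frames.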
Also, I'm not sure I really understand this:
> Accurate timing of small writes (typically around 4s ms, but up to 10s of ms) is required for real time audio processing
Do you mean the time that sound is coming out of the speakers for, or the time that Julia is running the command for?
> just that we should check since some of the suggestions are out of date.
Agreed, there was no accusation from me. Appreciate the great work you are doing keeping this package up to date.
> Do you mean the time that sound is coming out of the speakers for, or the time that Julia is running the command for?
I think I was mixing things up here. So I've gone and done a few more checks; please correct any further misunderstandings on my behalf. In my test above I was assuming that if the @time command in Julia was reporting 150 ms, then the sound was coming out of the speaker for 150 ms. The points below summarise my current understanding of the timing situation (is this an accurate summary as you understand it too?):
The implication for users (and I understood was best practice anyway) would be:
sleep
So is it possible to improve the timing reported by Julia (not saying do it now, just is it likely to be feasible)? And/or should we inform users in the readme or (nice new) docs that the timing reported by the functions does not equate to the timing of the signal that comes from the speaker? I think the current behaviour is confusing to users as it's intermittent: sometimes the reported timing is accurate and sometimes it isn't.
Quickly reading through, this isn't the issue at hand either. The tests I am proposing are ones that validate requested_audio_time <= runtime_of_read and returned_audio_length == requested_audio_time. Basically, when I call read(device, 5s) I expect that function to a) take at least 5 seconds and b) return 5 seconds of audio to me even, and especially, if the signal is zeros. The sound card doesn't cease to exist because there is nothing audible playing. At least in theory (though strictly speaking in implementation this may not be true), the sound card is always playing something, but that something may be zeros, which would mean no sound.
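A rough sketch of what those two checks could look like follows. It deliberately does not touch PortAudio: mock_read is a hypothetical stand-in that blocks for the requested duration and returns silence, and spin and timed_read are illustrative helpers, so this only shows the shape of the test, not how the package would implement it.

```julia
using Test

# Busy-wait for `seconds` so the demo never returns early.
function spin(seconds)
    t0 = time_ns()
    while (time_ns() - t0) < seconds * 1e9
    end
end

# Hypothetical stand-in for a stream read: blocks for the requested time,
# then returns a frames-by-2 matrix of silence.
function mock_read(nframes, samplerate)
    spin(nframes / samplerate)
    return zeros(Float32, nframes, 2)
end

# Time one read and report (elapsed seconds, frames returned).
function timed_read(readfn, nframes, samplerate)
    t0 = time_ns()
    buf = readfn(nframes, samplerate)
    elapsed = (time_ns() - t0) / 1e9
    return elapsed, size(buf, 1)
end

samplerate = 8000
nframes = 400                      # 0.05 s of audio
elapsed, returned = timed_read(mock_read, nframes, samplerate)

@test nframes / samplerate <= elapsed   # requested_audio_time <= runtime_of_read
@test returned == nframes               # returned length matches the request
```

Against a real stream the first check would probably need a tolerance, since a read can legitimately start with data that is already buffered.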
Oops, somehow I got switched from reads to writes haha. Sorry to hijack, @JMurph2015.
But my comment and tests from yesterday confirm that the issue with read timing that @JMurph2015 reported in 2017 still exists.
Ok, so I've changed the issue title to reference both Julia runtime and the size of the returned buffer, because those are different things.
> is it possible to improve the timing reported by Julia
I think it depends on the reason for the difference. At least some of it would depend on exactly how the C library is passing buffers back and forth to the sound card, and I don't know the answer here.
> the issue with read timing that @JMurph2015 reported in 2017 still exists.
On my end, at least the size of the buffer issue seems fixed.
julia> PortAudioStream() do stream
@time read(stream, round(Int, 5 * stream.sample_rate))
end
5.003865 seconds (541.21 k allocations: 33.148 MiB, 0.21% gc time, 1.94% compilation time)
220500-frame, 2-channel SampleBuf{Float32, 2}
5.0s sampled at 44100.0Hz
▂▂▂▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂
▂▂▂▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂
@JMurph2015 said:
> Basically, when I call read(device, 5s) I expect that function to a) take at least 5 seconds and b) return 5 seconds of audio to me
(b) should be true, but (a) is not. The reasons are complicated.
Trying to nail down a definition of "now" that is well-defined and not surprising turns out to be very difficult when milliseconds count. There are basically two clocks running that are relevant: the sound card's audio clock, which paces the samples (whether you read or write, that detail doesn't really matter), and the CPU clock, which is what you measure with @timeit.
You can ask the CPU what time it is, but that doesn't tell you very much about what's coming out of the speakers right now. It could be the data that your code put into the buffer some time ago.
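So the question is what should happen when you call read(str, 48000)? One option is to discard whatever is sitting in the input buffer when the call starts, wait for the requested number of fresh samples to arrive, and return those.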
The wall-clock time elapsed during this read call would be pretty close to the requested time, and the beginning of the data would correspond to the moment that your code ran. This seems good, but what if you were to call buf1 = read(str, fs); buf2 = read(str, fs)? You would generally want those two buffers to represent contiguous audio data, but that's not what you'll get from the sequence above. The samples that the ADC recorded between the end of the first call and the beginning of the second would be discarded.
So we don't generally want to wipe out the input buffer between read calls. In this case you want the 2nd read call to grab some samples that were actually recorded before it was called. So what that means is that if you run buf1 = read(str, fs); sleep(0.1); buf2 = read(str, fs), each call to read should give you 1 second of audio, but the second call actually executes in 0.9s, because immediately when it was called it grabbed the 0.1s of data that had accumulated during the sleep. If the requested amount is less than what's in the buffer it could return immediately.
So what happens if you wait a long time before calling read? We can't have that input buffer just growing without bound, so you'd either want to have a ringbuffer that keeps some amount of the most recent audio, or you stop filling the buffer when it's full and wipe it out the next time read is called. I'm actually not sure what the current behavior is.
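To illustrate the difference between those two overflow policies, here is a toy sketch on plain vectors. The function names are made up and this is not how PortAudio.jl stores audio; it only shows which frames survive under each policy.

```julia
# Policy 1: drop the oldest frames so the buffer always holds the newest audio.
function push_keep_newest!(buf::Vector, capacity::Int, frames)
    append!(buf, frames)
    excess = length(buf) - capacity
    excess > 0 && deleteat!(buf, 1:excess)
    return buf
end

# Policy 2: stop accepting frames once the buffer is full (new audio is lost).
function push_stop_when_full!(buf::Vector, capacity::Int, frames)
    room = capacity - length(buf)
    append!(buf, frames[1:min(room, length(frames))])
    return buf
end

incoming = collect(1:10)          # sample indices stand in for audio frames
newest = push_keep_newest!(Int[], 4, incoming)
oldest = push_stop_when_full!(Int[], 4, incoming)
println(newest)   # the four most recent frames
println(oldest)   # the four oldest frames, i.e. stale data on the next read
```

With the second policy, a read after a long idle period starts with old frames unless the buffer is wiped first, which matches the stale-data behaviour discussed earlier in the thread.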
There's a dual for writing to the output buffer - if the amount you write is less than the free space in the buffer, the write could return immediately. @rob-luke I think this explains your results - for longer writes you fill the buffer and then wait around until the last of the data is in the buffer. So if the buffer was full when you started, you'd expect the runtime to be requested_time - block_size. For very short writes you can actually write multiple times before filling the buffer, so the average runtime is very short indeed.
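A toy model of the write side, counting everything in frames, might look like the sketch below. It assumes the buffer starts empty and that no time passes between calls, which is a simplification; write_model and short_write_blocks are illustrative names and the 4096-frame capacity is just an example value.

```julia
# Toy model of a blocking write into an output buffer that drains in real time.
# All quantities are frames; returns (frames_blocked, new_level).
function write_model(level, capacity, nframes)
    free = capacity - level
    if nframes <= free
        return 0, level + nframes         # fits: returns immediately
    else
        # Must wait for the device to play (nframes - free) frames;
        # the buffer is full again when the call returns.
        return nframes - free, capacity
    end
end

capacity = 4096

# One long write into an empty buffer blocks for everything beyond the capacity.
blocked, level = write_model(0, capacity, 44100)
println(blocked)

# Many short back-to-back writes: the first few return without blocking.
function short_write_blocks(capacity, nframes, nwrites)
    level = 0
    blocks = Int[]
    for _ in 1:nwrites
        b, level = write_model(level, capacity, nframes)
        push!(blocks, b)
    end
    return blocks
end

println(short_write_blocks(capacity, 1024, 6))
```

In this model the short writes cost nothing until the buffer is full, after which each write blocks for roughly its own length, so timing a handful of short writes says little about when the audio is actually played.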
There are a few different ways to think about interfacing with a sound card. The first two are both pretty typical of low-level audio APIs, and libportaudio supports both of them. It's important to note that they both assume that it's the user's job to keep the audio data flowing - if they don't read or write often enough then buffers overflow or underflow, which is considered an error condition.
The soundcard calls a user-supplied function for every block of audio. This callback is often called from a very high-priority context/thread that's totally separate from the main user code. The callback's job is to take the most recent block of recorded audio, process it, and supply a new block of audio to be played from the speaker. Usually the callback just gets pointers to the buffers and they are mutated in-place. Most very-low-latency audio APIs use this model. PortAudio.jl used to, but it's not a very good fit for high-level languages like Julia where the GC or JIT compiler could cause pauses at any time, and at the time Julia did not support being called from other threads (not sure whether that's still true).
The user reads and writes to buffers that are streaming to and from the sound card. If a write supplies more data than there is space available, or a read requests more data than is available, the user code blocks. This is how PortAudio.jl works, as described above.
This would be more typical of a higher-level package, for instance a computer music system like Supercollider. The idea is that there's some underlying task that's keeping the audio device fed, and you can sporadically give it audio data to play. You could have multiple tasks that give audio to play simultaneously and it will be mixed together. This seems pretty convenient, but it has a few issues.
The common solution to the second issue is that when you want to play something, you also supply a timestamp for when you want it to be played. Then the lower-level audio task takes care of mixing in your audio at the right time. You accept some latency (audio needs to be scheduled for some time in the future) but you eliminate the jitter and unpredictability.
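As a rough illustration of the timestamp idea (not something PortAudio.jl provides), the sketch below mixes clips into one output timeline at their requested start times. The function mix_scheduled and the event format are hypothetical, and start times are assumed to be non-negative.

```julia
# Mix each clip into a shared output timeline at its requested start time.
# `events` is a vector of (start_seconds, samples) pairs.
function mix_scheduled(events, samplerate, total_seconds)
    out = zeros(Float32, round(Int, total_seconds * samplerate))
    for (start, clip) in events
        first_idx = round(Int, start * samplerate) + 1
        last_idx = min(first_idx + length(clip) - 1, length(out))
        n = last_idx - first_idx + 1
        n > 0 || continue                       # clip starts past the end
        out[first_idx:last_idx] .+= clip[1:n]   # overlapping clips are summed
    end
    return out
end

samplerate = 1000
events = [(0.010, fill(0.5f0, 20)),   # 20 ms clip starting at 10 ms
          (0.020, fill(0.25f0, 20))]  # overlaps the first clip from 20 ms
out = mix_scheduled(events, samplerate, 0.05)
println(out[11], " ", out[21], " ", out[31], " ", out[41])
```

Because placement is computed in samples rather than from when the calling code happened to run, the spacing between clips does not depend on scheduling jitter. The cost is that events must be submitted ahead of time.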
@samarron has thought very deeply about time, and his SonicPi project has some nice ideas that are described in this paper: https://www.cs.kent.ac.uk/people/staff/dao7/publ/farm14-sonicpi.pdf
I suspect something with similar temporal semantics could be implemented in Julia, but it's out of scope for PortAudio.jl.
It's difficult to correlate timing of your julia code to timing of your audio more precisely than 10s of milliseconds best case, and likely more on the order of 100-200ms.
If you care about exact timing between audio events, create an audio buffer with the right spacing and feed it into the audio device as a contiguous chunk (though it could be split over multiple write calls). Then you are not subject to any scheduling issues due to Julia or the audio device buffer management. If you want something that can react to external stimuli (like a serial port message) with single-digit-ms precision, I suspect that's going to be very difficult to do with Julia and PortAudio.jl.
Ok, my general response is, yes you have a point with the intricacies of exactly how correlated you can make the system clock be with the audio clock, but no you miss the forest for the trees here.
In the context of PortAudio.jl, my program was just looping reading 1/30 of a second of audio. When there was something playing through the speakers, this worked reasonably consistently (though not entirely). When there was nothing playing, it totally bugged out and the blocking call would take dramatically less time than expected and return less data than expected, thereby spinning the CPU much more than necessary.
This is the problem that needs solving and is clearly inconsistent with the rest of the API. And it should be testable by ensuring the calls block for a minimum amount of time and when it does return, that it comes back with an appropriate amount of data. On that first point, I could understand the first call or two blocking for an unexpectedly short time, but in my case it was tens of seconds' worth of virtually non-blocking calls.
Ah yes, agreed that seems like a bug. I was mostly here responding to the expectation that a read/write of 5s should take 5s to execute, which I think is mostly what @rob-luke was talking about.
You mentioned:
> it should be testable by ensuring the calls block for a minimum amount of time and when it does return, that it comes back with an appropriate amount of data
I agree with the 2nd point, that they should return the right amount of data, but I don't understand the first point. If there's 256ms of data in the read buffer and you're reading in 33ms chunks, the first 7 reads should not need to block at all. The 8th read should block for about 11ms, and then the buffer is empty so subsequent reads should block for about 33ms each.
edit: it will only be 33ms on average - the buffer will be filled in chunks that are probably larger than 33ms, so actual runtime of the read will be bursty as the reads catch up with each new chunk of data.
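The 256 ms example can be checked with a small idealised model. It assumes back-to-back reads of 1/30 s each and smooth delivery of new audio, so it ignores the burstiness mentioned in the edit note; read_block_times is an illustrative name.

```julia
# Toy model: `backlog_ms` of audio is already buffered; each read asks for
# `chunk_ms`. A read blocks only for the part the backlog cannot cover.
function read_block_times(backlog_ms, chunk_ms, nreads)
    blocks = Float64[]
    for _ in 1:nreads
        blocked = max(0.0, chunk_ms - backlog_ms)
        backlog_ms = max(0.0, backlog_ms - chunk_ms)
        push!(blocks, blocked)
    end
    return blocks
end

blocks = read_block_times(256.0, 1000 / 30, 10)
println(round.(blocks; digits = 1))
```

In this model the first seven reads do not block, the eighth blocks for about 11 ms, and the rest block for about 33 ms each, so a test that requires every call to block for a minimum time would fail on the first few reads even when nothing is wrong.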
@bramtayl said:
> On my end, at least the size of the buffer issue seems fixed.
Have you confirmed that the CPU usage issue still exists in recent PortAudio.jl?
Nope, sorry, not even sure if I would know how to test that.
Thanks for the wonderful summary @ssfrr. I appreciate you taking the time to step through the details. Your comments have cleared up my understanding and align with the behaviour I see.
So AFAICT the original bug was that read wasn't returning the full requested amount of data, so @JMurph2015's code was spinning and burning up CPU. It seems that bug is resolved.
The issue of how much wall-clock time read and write should take is not a bug, but it does seem like the behavior is surprising to folks, so perhaps we just need to improve the docs of those functions to make things more clear?
So this was inspired by a problem in an LED effects program I'm working on in Julia. Basically, the largest bug with that program is that its CPU usage spikes to ~4-6x normal during and after silent times from the audio card. Since I've already checked 80+% of my code for type stability, I decided to look into what was coming out of PortAudio during times of intermittent silence. Let's just say things got stranger.
As you can see below, PortAudio.jl is not sampling for nearly as long as it should be in a lot of circumstances. I'm not sure why this is happening, but I could see a scenario where this would cause the havoc I'm seeing further up the stack. This is off Ubuntu 17.04 running on the default PortAudio device (pretty sure that ends up being pulse).
Any ideas about why PortAudio might be doing this would be much appreciated (or other thoughts on escalating CPU usage!).