Open · matanui159 opened this issue 4 years ago
You seem to be opening recorders from all sound cards simultaneously. There will be a lot of thrashing and stalling between the sound cards, and delays will inexorably rise.
Try recording from one sound card at a time, and see if the issue persists.
Regardless, however, this is likely a configuration issue in pulse, not SoundCard.
> Try recording from one sound card at a time, and see if the issue persists.
I think it does not persist, because he told me:

> Also this does not happen when specifically selecting my bluetooth headset in the Panon configuration, only when mixing all the streams.
> You seem to be opening recorders from all sound cards simultaneously. There will be a lot of thrashing and stalling between the sound cards, and delays will inexorably rise.
Hi bastibe, do you mean it is not supposed to open multiple recorders simultaneously? That would mean it is not possible to mix them in real time. If so, I will mark this kind of issue as wontfix in the future.
It should certainly be possible to open multiple recorders simultaneously. But real time waits for nobody. If even one of the recorders takes more than its fraction of a block length, all other recorders will be equally delayed. `record` with a blocksize always waits until `blocksize` samples have been recorded. In a multi-recorder scenario, you can not afford to wait.

To get the lowest possible latency, open the recorder with a very short blocksize (this sets pulse's internal block length to as short a latency as possible), then `record` without a blocksize. `record` without a blocksize does not incur a delay, but you get whatever data is available at the moment.
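The difference between the two modes can be sketched with a toy recorder. This is a stub standing in for a real soundcard stream, not SoundCard's actual implementation: a producer thread plays the role of the sound card driver, `record(numframes)` blocks until a full block has accumulated, and `record()` with no blocksize returns immediately with whatever is available.

```python
import threading
import time
from collections import deque

class ToyRecorder:
    """Stand-in for a soundcard recorder: a producer thread feeds samples
    into a buffer at a steady rate, like the sound card driver does."""

    def __init__(self, samples_per_second=44100):
        self._buffer = deque()
        self._cond = threading.Condition()
        self._rate = samples_per_second
        threading.Thread(target=self._produce, daemon=True).start()

    def _produce(self):
        chunk = self._rate // 100  # new audio arrives every 10 ms
        while True:
            time.sleep(0.01)
            with self._cond:
                self._buffer.extend([0.0] * chunk)
                self._cond.notify_all()

    def record(self, numframes=None):
        with self._cond:
            if numframes is None:
                # Non-blocking: return whatever is available right now.
                data = list(self._buffer)
                self._buffer.clear()
                return data
            # Blocking: wait until numframes samples have accumulated.
            while len(self._buffer) < numframes:
                self._cond.wait()
            return [self._buffer.popleft() for _ in range(numframes)]

rec = ToyRecorder()
time.sleep(0.05)  # let some audio accumulate

t0 = time.time()
available = rec.record()         # returns immediately, no waiting
nonblocking_ms = (time.time() - t0) * 1000

t0 = time.time()
block = rec.record(44100 // 60)  # waits until a full ~17 ms block exists
blocking_ms = (time.time() - t0) * 1000

print(f'non-blocking: {nonblocking_ms:.1f} ms, got {len(available)} samples')
print(f'blocking: {blocking_ms:.1f} ms, got {len(block)} samples')
```

With the real library, the same pattern would be `mic.recorder(samplerate, blocksize=small_value)` followed by `stream.record()` calls without a blocksize.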
> To get the lowest possible latency, open the recorder with a very short blocksize
Can I use `concurrent.futures.ThreadPoolExecutor` instead of a very short blocksize?
But I think the `blocksize` is not a problem here, because matanui159 said only audio from the bluetooth device was delayed. That means that when matanui159 paused a song, different data was recorded by this line:

```python
data = stream.record(blocksize)
```

Assuming Stream 0 is a normal device and Stream 1 is a bluetooth device, matanui159 must have recorded all-"0" data from Stream 0 and non-"0" data from Stream 1 for around 3 seconds. Am I right? @matanui159 I mean, can you post a screenshot of the output of the script during those 3 seconds?
> Regardless, however, this is likely a configuration issue in pulse, not SoundCard.
BTW, instinct tells me you may be right.
> Can I use `concurrent.futures.ThreadPoolExecutor` instead of a very short blocksize?
Probably, yes. I think CFFI releases the GIL while calling C functions, so that should be ok. (Don't use async though. Async certainly does not work.)
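A sketch of the thread-pool idea, using stub streams in place of real soundcard recorders (the `SlowStream` class and its timings are made up for illustration). Since `time.sleep`, like a CFFI call that releases the GIL, lets other threads run, the three waits overlap instead of adding up:

```python
import time
from concurrent.futures import ThreadPoolExecutor

class SlowStream:
    """Stand-in for a soundcard recorder whose record() blocks
    until a full block is available (simulated by a fixed sleep)."""
    def __init__(self, name, block_time):
        self.name = name
        self.block_time = block_time

    def record(self, numframes):
        time.sleep(self.block_time)  # pretend to wait for numframes samples
        return [0.0] * numframes

streams = [SlowStream('internal', 0.017),
           SlowStream('bluetooth', 0.017),
           SlowStream('monitor', 0.017)]

# Sequential: each record() waits its full block time, so delays add up.
t0 = time.time()
for s in streams:
    s.record(735)
sequential = time.time() - t0

# Concurrent: all streams wait at the same time.
with ThreadPoolExecutor(max_workers=len(streams)) as pool:
    t0 = time.time()
    blocks = list(pool.map(lambda s: s.record(735), streams))
    concurrent = time.time() - t0

print(f'sequential: {sequential * 1000:.0f} ms, '
      f'concurrent: {concurrent * 1000:.0f} ms')
```

Whether this helps with real recorders depends on the CFFI calls actually releasing the GIL while blocking, as described above.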
> But I think the `blocksize` is not a problem here.

So far, I had assumed that the latency was incurred in the `record` step. But you are right, this doesn't make sense. If one `record` were to take three seconds, it would delay all other `record`s as well. So SoundCard is probably pulling the data off the sound card as fast as it is produced (with the occasional missed block, as we discussed), but the Bluetooth device on pulse somehow has a built-in delay of three seconds before it is even passed to SoundCard.
@matanui159
"""
I hope this script can help us clarify whether record() method is delayed or not
for your bluetooth device. Could you please upload the screenshot of the output
during that 3 seconds?
"""
import soundcard as sc
import time
import numpy as np
sample_rate=44100
channel_count=2
fps=60
blocksize = sample_rate // fps
mics = sc.all_microphones(exclude_monitors=False)
streams = []
for mic in mics:
stream = mic.recorder(
sample_rate,
channel_count,
blocksize,
)
stream.__enter__()
streams.append(stream)
for i,mic in enumerate(mics):
print('Stream %d id:%s'%(i,mic.id))
print('Stream %d name:%s'%(i,mic.name))
print('Expected time to record:',int(1000/fps),'ms')
while True:
msg=''
t_all=0
for i,stream in enumerate(streams):
t1=time.time()
data = stream.record(blocksize)
t=time.time()-t1
t_all+=t
msg+='Stream %d:'%i
if np.sum(data)==0:
msg+='paused '
else:
msg+='playing '
msg+=str(int(t*1000))+'ms '
msg=('Time:%dms '%(t_all*1000))+msg
print(msg,end='\r')
Sorry for the delay, mix of busyness and sleep. The times stay about the same constantly but the headset is still delayed by 3s. Playing the audio through other channels does not delay the audio and (when using Panon) selecting only the headset also does not delay the audio.
This shows rather clearly that SoundCard is indeed pulling data off the sound card as fast as it is produced. If enough data is available at record time, the delay is 0 ms; if it has to be recorded first, it is 17 ms.
But it seems the 3 s delay is happening somewhere inside pulseaudio, and not on the SoundCard side.
Any theories as to why it only happens when reading from multiple sources? Also, is there any way I might start looking into why my setup is causing this issue?
No, sorry. I would love to help, but I have very little experience with pulseaudio outside of SoundCard.
You could try running the same code and audio hardware on Windows or macOS, and check if the issue persists. If so, the problem could be in SoundCard after all.
I stumbled across a similar problem when reading from the monitor of a null-sink and writing the processed signal to a device sink. The buffer of the input stream would immediately grow to 1 second. I changed some lines in SoundCard to set a hard limit on the input buffer size (2 × blocksize), which forces pulseaudio to drop samples instead of growing the buffer once that limit is reached.
You can try commit 860f41c0d40e60045396ee0cd8d20481092f521b to see if it helps with your problem.
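Conceptually, the hard limit behaves like a fixed-capacity buffer that discards the oldest samples when full. Here is a toy Python model of that idea (not the actual change in the commit, which configures pulse's buffer attributes through the C API):

```python
from collections import deque

def make_bounded_buffer(blocksize):
    # A deque with maxlen drops the *oldest* entries once full, which is
    # the behaviour the hard limit forces: latency can never exceed
    # 2 * blocksize samples, at the cost of occasionally losing samples.
    return deque(maxlen=2 * blocksize)

blocksize = 4
buf = make_bounded_buffer(blocksize)

# A stalled consumer while the producer keeps delivering samples:
for sample in range(20):  # 20 samples arrive, none are read
    buf.append(sample)

# Without a limit, the reader would now be 20 samples (and growing) behind.
# With the cap, only the most recent 2 * blocksize samples remain:
print(list(buf))  # the 8 newest samples: [12, 13, ..., 19]
```

The trade-off is exactly the one described above: bounded latency in exchange for silent sample drops when the consumer falls behind.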
After reading some of the Pulseaudio documentation and the SoundCard code, my understanding is that `blocksize` is not the same thing in the Pulseaudio backend as in the other backends. It seems to me that for Pulseaudio it is more a friendly suggestion than a hard setting. This might be the reason for some of the confusion concerning `blocksize`.
> After reading some of the Pulseaudio documentation and the SoundCard code, my understanding is that `blocksize` is not the same thing in the Pulseaudio backend as in the other backends. It seems to me that for Pulseaudio it is more a friendly suggestion than a hard setting. This might be the reason for some of the confusion concerning `blocksize`.
Yes, from what I understand, this seems about accurate. Although personally, I have not seen Pulseaudio rack up block sizes unless my code was indeed using too much processing time.
> After reading some of the Pulseaudio documentation and the SoundCard code, my understanding is that `blocksize` is not the same thing in the Pulseaudio backend as in the other backends. It seems to me that for Pulseaudio it is more a friendly suggestion than a hard setting. This might be the reason for some of the confusion concerning `blocksize`.
>
> Yes, from what I understand, this seems about accurate. Although personally, I have not seen Pulseaudio rack up block sizes unless my code was indeed using too much processing time.
In my case it is probably the numba JIT compilation in the first pass of the processing loop that causes the buffer size increase. The problem is that Pulseaudio never returns to a smaller latency. Setting a hard limit on the buffer size solves the problem for me. It would be great, however, to be able to monitor buffer underruns.
That's an interesting observation! Can you precompile the loop by specifying the expected data types ahead of time?
I can try and check this later.
I changed my code to do one "dry" run of the processing loop to trigger the JIT compilation, before the actual processing starts. This successfully prevents the latency increase.
I'd still plead for a configurable hard limit on the buffer size. I can think of applications that would rather tolerate buffer underruns (e.g. when interacting with a single-threaded GUI) than a latency increase to several seconds.
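The dry-run trick generalizes to any function whose first call is expensive. A minimal sketch, with a sleep simulating the JIT compilation cost (numba is not actually used here):

```python
import time

_compiled = {}

def process(block):
    """Stand-in for a JIT-compiled function: the first call per input
    type pays a one-off 'compilation' cost, later calls are fast."""
    key = type(block[0])
    if key not in _compiled:
        time.sleep(0.2)  # simulate JIT compilation on first call
        _compiled[key] = True
    return [x * 0.5 for x in block]

# Dry run before the realtime loop pays the one-off cost up front,
# so the audio buffer never has to absorb a 200 ms stall.
process([0.0] * 8)

# Every call inside the realtime loop now stays fast:
t0 = time.time()
for _ in range(10):
    process([0.0] * 8)
warm_total = time.time() - t0
print(f'10 warm calls took {warm_total * 1000:.1f} ms total')
```

With numba itself, an alternative is eager compilation: passing an explicit signature to `@njit` (e.g. `@njit("float64[:](float64[:])")`) moves the compilation to decoration time instead of the first call.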
> I changed my code to do one "dry" run of the processing loop to trigger the JIT compilation, before the actual processing starts. This successfully prevents the latency increase.
I'm glad to hear that. So at least there is a workaround.
> I'd still plead for a configurable hard limit on the buffer size. I can think of applications that would rather tolerate buffer underruns (e.g. when interacting with a single-threaded GUI) than a latency increase to several seconds.
That makes sense. If you'd like to add this feature as an optional keyword argument, I'd be grateful for a pull request.
System:
Bug: Audio from bluetooth device is delayed by around 3 seconds. See https://github.com/rbn42/panon/issues/26.
Reproducible script:
Requires NumPy. Thanks to @rbn42 for creating the script.