bastibe / SoundCard

A Pure-Python Real-Time Audio Library
https://soundcard.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Audio from bluetooth device is delayed by around 3 seconds #85

Open matanui159 opened 4 years ago

matanui159 commented 4 years ago

System: image

Bug: Audio from bluetooth device is delayed by around 3 seconds. See https://github.com/rbn42/panon/issues/26.

Reproducible script:

import soundcard as sc
import numpy as np
sample_rate=44100
channel_count=2
fps=60

blocksize = sample_rate // fps
mics = sc.all_microphones(exclude_monitors=False)

streams = []

for mic in mics:
    stream = mic.recorder(
        sample_rate,
        channel_count,
        blocksize,
    )
    stream.__enter__()
    streams.append(stream)

for i,mic in enumerate(mics):
    print('Stream %d id:%s'%(i,mic.id))
    print('Stream %d name:%s'%(i,mic.name))

while True:
    msg=''
    for i,stream in enumerate(streams):
        data = stream.record(blocksize)
        if np.sum(data)==0:
            msg+='Stream %d:paused '%i
        else:
            msg+='Stream %d:playing '%i
    print(msg,end='\r')

Requires NumPy. Thanks to @rbn42 for creating the script.

bastibe commented 4 years ago

You seem to be opening recorders from all sound cards simultaneously. There will be a lot of thrashing and stalling between the sound cards, and delays will inexorably rise.

Try recording from one sound card at a time, and see if the issue persists.

Regardless, however, this is likely a configuration issue in pulse, not SoundCard.

rbn42 commented 4 years ago

> Try recording from one sound card at a time, and see if the issue persists.

I think it does not persist, because he told me:

> Also this does not happen when specifically selecting my bluetooth headset in the Panon configuration, only when mixing all the streams.

rbn42 commented 4 years ago

> You seem to be opening recorders from all sound cards simultaneously. There will be a lot of thrashing and stalling between the sound cards, and delays will inexorably rise.

Hi bastibe, do you mean that opening multiple recorders simultaneously is not supported? Does that mean it is not possible to mix them in real time? If so, I will mark this kind of issue as wontfix in the future.

bastibe commented 4 years ago

It should certainly be possible to open multiple recorders simultaneously. But real time waits for nobody. If even one of the recorders takes more than its fraction of a block length, all other recorders will be equally delayed. Calling record with a blocksize always waits until blocksize samples have been recorded. In a multi-recorder scenario, you cannot afford to wait.

To get the lowest possible latency, open the recorder with a very short blocksize (this sets pulse's internal block length to as short a latency as possible), then record without a blocksize. Calling record without a blocksize does not incur a delay; you simply get whatever data is available at that moment.
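The approach described above can be sketched roughly as follows. This is a minimal sketch, not code from SoundCard itself: `open_recorders`, `poll_once`, and `main` are hypothetical helper names, the blocksize of 256 frames is an arbitrary small value, and `exclude_monitors=False` is copied from the script earlier in this thread. It assumes that `record` called with `numframes=None` returns whatever is currently buffered, as described above.

```python
import time
from contextlib import ExitStack


def open_recorders(mics, stack, samplerate=44100, channels=2, blocksize=256):
    """Open one recorder per microphone with a short blocksize.

    The ExitStack closes every recorder again when it exits, which
    replaces the bare __enter__() calls of the original script.
    """
    return [
        stack.enter_context(mic.recorder(samplerate, channels, blocksize))
        for mic in mics
    ]


def poll_once(recorders):
    """Fetch whatever each recorder has buffered right now, without waiting."""
    return [rec.record(numframes=None) for rec in recorders]


def main():
    # Imported here so the helpers above can be used without audio hardware.
    import soundcard as sc

    mics = sc.all_microphones(exclude_monitors=False)
    with ExitStack() as stack:
        recorders = open_recorders(mics, stack)
        while True:
            chunks = poll_once(recorders)
            sizes = [len(chunk) for chunk in chunks]
            print('frames per stream:', sizes, end='\r')
            time.sleep(0.005)  # avoid spinning at 100% CPU between polls
```

Because each poll may return a different number of frames per stream, any mixing code has to cope with chunks of unequal (possibly zero) length.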

rbn42 commented 4 years ago

> To get the lowest possible latency, open the recorder with a very short blocksize

Can I use concurrent.futures.ThreadPoolExecutor instead of a very short blocksize?

But I think the blocksize is not the problem here, because matanui159 said that only the audio from the bluetooth device was delayed. That means that when matanui159 paused a song, the streams returned different data at this line:

        data = stream.record(blocksize)

Assume Stream 0 is a normal device and Stream 1 is a bluetooth device. Then matanui159 must have recorded all-zero data from Stream 0 and nonzero data from Stream 1 for around 3 seconds.

Am I right? @matanui159, can you post a screenshot of the script's output during those 3 seconds?

rbn42 commented 4 years ago

> Regardless, however, this is likely a configuration issue in pulse, not SoundCard.

BTW, instinct tells me you may be right.

bastibe commented 4 years ago

> Can I use concurrent.futures.ThreadPoolExecutor instead of a very short blocksize?

Probably, yes. I think CFFI releases the GIL while calling C functions, so that should be ok. (Don't use async though. Async certainly does not work.)
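A thread-pool variant might look roughly like this. It is an untested sketch that relies on the assumption stated above, namely that the blocking record call releases the GIL; `record_all` and `run` are hypothetical helpers, not part of SoundCard.

```python
from concurrent.futures import ThreadPoolExecutor


def record_all(executor, recorders, blocksize):
    """Run one blocking record() per recorder in parallel worker threads.

    Results are returned in the same order as `recorders`.
    """
    futures = [executor.submit(rec.record, blocksize) for rec in recorders]
    return [future.result() for future in futures]


def run(recorders, blocksize, handle_blocks):
    """Keep recording from all recorders until handle_blocks returns False."""
    # One worker per recorder, so no record() call has to queue behind another.
    with ThreadPoolExecutor(max_workers=max(1, len(recorders))) as executor:
        while handle_blocks(record_all(executor, recorders, blocksize)):
            pass
```

With this layout, each round takes as long as the slowest recorder rather than the sum of all recorders, but it cannot remove a delay that arises before the data reaches SoundCard.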

> But I think the blocksize is not a problem here.

So far, I had assumed that the latency was incurred in the record step. But you are right, this doesn't make sense. If one record were to take three seconds, it would delay all other records as well. So SoundCard is probably pulling the data off the sound card as fast as it is produced (with the occasional missed block as we discussed), but the Bluetooth device on pulse somehow has a built-in delay of three seconds before its data is even passed to SoundCard.

rbn42 commented 4 years ago

@matanui159

"""
I hope this script can help us clarify whether the record() method is delayed
for your bluetooth device. Could you please upload a screenshot of the output
during those 3 seconds?
"""
import soundcard as sc
import time
import numpy as np
sample_rate=44100
channel_count=2
fps=60

blocksize = sample_rate // fps
mics = sc.all_microphones(exclude_monitors=False)

streams = []

for mic in mics:
    stream = mic.recorder(
        sample_rate,
        channel_count,
        blocksize,
    )
    stream.__enter__()
    streams.append(stream)

for i,mic in enumerate(mics):
    print('Stream %d id:%s'%(i,mic.id))
    print('Stream %d name:%s'%(i,mic.name))
print('Expected time to record:',int(1000/fps),'ms')
while True:
    msg=''
    t_all=0
    for i,stream in enumerate(streams):
        t1=time.time()
        data = stream.record(blocksize)
        t=time.time()-t1
        t_all+=t
        msg+='Stream %d:'%i
        if np.sum(data)==0:
            msg+='paused '
        else:
            msg+='playing '
        msg+=str(int(t*1000))+'ms '
    msg=('Time:%dms '%(t_all*1000))+msg
    print(msg,end='\r')

matanui159 commented 4 years ago

Sorry for the delay, a mix of busyness and sleep.

image

The times stay about the same throughout, but the headset is still delayed by 3 s. Playing the audio through other channels does not delay the audio, and (when using Panon) selecting only the headset also does not delay the audio.

bastibe commented 4 years ago

This shows rather clearly that SoundCard is indeed pulling data off the sound card as fast as it is produced. If enough data is available at record time, the delay is 0 ms; if it has to be recorded first, it is 17 ms.

But it seems the 3 s delay is happening somewhere inside pulseaudio, and not on the SoundCard side.
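The 17 ms figure is roughly the duration of one block in the test script. A quick check of the arithmetic, using the script's own sample_rate and fps (`block_duration_ms` is a hypothetical helper name):

```python
def block_duration_ms(blocksize, sample_rate):
    """Time in milliseconds that it takes to capture blocksize frames."""
    return 1000.0 * blocksize / sample_rate


sample_rate = 44100
fps = 60
blocksize = sample_rate // fps  # 735 frames per block
duration = block_duration_ms(blocksize, sample_rate)
print('%d frames = %.2f ms' % (blocksize, duration))  # 735 frames = 16.67 ms
```

A record call that has to wait for a full block therefore takes about 17 ms, while one that finds the block already buffered returns almost immediately.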

matanui159 commented 4 years ago

Any theories as to why it only happens when reading from multiple sources? Also, any idea where I might start looking to figure out why my setup is causing this issue?

bastibe commented 4 years ago

No, sorry. I would love to help, but I have very little experience with pulseaudio outside of SoundCard.

bastibe commented 4 years ago

You could try running the same code and audio hardware on Windows or macOS, and check if the issue persists. If so, the problem could be in SoundCard after all.

szlop commented 3 years ago

I stumbled across a similar problem when reading from the monitor of a null-sink and writing the processed signal to a device sink. The buffer of the input stream would immediately increase to 1 second. I changed some lines in soundcard to set a hard limit on the input buffer size (2 × blocksize), which forces pulseaudio to drop samples instead of growing the buffer once 2 × blocksize is reached.

You can try commit 860f41c0d40e60045396ee0cd8d20481092f521b to see if it helps with your problem.
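The idea of a hard limit can be illustrated in plain Python. The sketch below is not the change in the commit above and does not touch PulseAudio; `BoundedFrameBuffer` is a hypothetical class that only shows the policy: once two blocks are buffered, the oldest frames are dropped (and counted) instead of letting the latency grow.

```python
from collections import deque


class BoundedFrameBuffer:
    """FIFO of audio frames that never holds more than max_blocks blocks."""

    def __init__(self, blocksize, max_blocks=2):
        # A deque with maxlen silently discards the oldest items when full.
        self._frames = deque(maxlen=blocksize * max_blocks)
        self.dropped = 0  # number of frames discarded so far

    def __len__(self):
        return len(self._frames)

    def write(self, frames):
        """Append frames, dropping the oldest ones if the limit is exceeded."""
        frames = list(frames)
        overflow = len(self._frames) + len(frames) - self._frames.maxlen
        self.dropped += max(0, overflow)
        self._frames.extend(frames)

    def read(self, numframes):
        """Remove and return up to numframes of the oldest buffered frames."""
        count = min(numframes, len(self._frames))
        return [self._frames.popleft() for _ in range(count)]
```

The `dropped` counter gives the application a way to notice that samples were lost, at the cost of gaps in the signal.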

After reading some of the Pulseaudio documentation and soundcard code, my understanding is that "blocksize" is not the same thing in the Pulseaudio context as in the other backends. It seems to me that for Pulseaudio it is more like a friendly suggestion than a hard setting. This might be the reason for some of the confusion concerning blocksize.

bastibe commented 3 years ago

> After reading some of the Pulseaudio documentation and soundcard code, my understanding is that "blocksize" is not the same thing in the Pulseaudio context as in the other backends. It seems to me that for Pulseaudio it is more like a friendly suggestion than a hard setting. This might be the reason for some of the confusion concerning blocksize.

Yes, from what I understand, this seems about accurate. Although personally, I have not seen Pulseaudio rack up block sizes unless my code was indeed using too much processing time.

szlop commented 3 years ago

> > After reading some of the Pulseaudio documentation and soundcard code, my understanding is that "blocksize" is not the same thing in the Pulseaudio context as in the other backends. It seems to me that for Pulseaudio it is more like a friendly suggestion than a hard setting. This might be the reason for some of the confusion concerning blocksize.

> Yes, from what I understand, this seems about accurate. Although personally, I have not seen Pulseaudio rack up block sizes unless my code was indeed using too much processing time.

In my case, it is probably the numba JIT compilation in the first processing loop that causes the buffer size to increase. The problem is that Pulseaudio never returns to a smaller latency. Setting a hard limit on the buffer size solves the problem for me. It would be great, however, to be able to monitor buffer underruns.

bastibe commented 3 years ago

That's an interesting observation! Can you precompile the loop by specifying the expected data types ahead of time?

szlop commented 3 years ago

I can try and check this later.

szlop commented 3 years ago

I changed my code to do one "dry" run of the processing loop to trigger the JIT compilation, before the actual processing starts. This successfully prevents the latency increase.
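The dry-run workaround can be sketched as follows. This is not szlop's actual code; `warm_up` and `processing_loop` are hypothetical names, and `read_block`, `process`, and `write_block` stand in for whatever the application uses to record, transform, and play a block.

```python
def warm_up(process, dummy_block):
    """Call process once on throwaway data and discard the result.

    Any lazy setup (such as JIT compilation) then happens before real
    audio is waiting. Because a JIT compiler specializes on argument
    types, dummy_block should have the same type and shape as real blocks.
    """
    process(dummy_block)


def processing_loop(read_block, process, write_block, dummy_block):
    """Warm up first, then process blocks until read_block returns None."""
    warm_up(process, dummy_block)
    while True:
        block = read_block()
        if block is None:
            break
        write_block(process(block))
```

Ideally the warm-up runs before the recorder is opened, so that no audio accumulates while the first call is still compiling.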

I'd still plead for a configurable hard limit on the buffer size. I can think of applications that would rather tolerate buffer underruns (e.g. when interacting with a single-threaded GUI) than a latency increase to several seconds.

bastibe commented 3 years ago

> I changed my code to do one "dry" run of the processing loop to trigger the JIT compilation, before the actual processing starts. This successfully prevents the latency increase.

I'm glad to hear that. So at least there is a workaround.

> I'd still plead for a configurable hard limit on the buffer size. I can think of applications that would rather tolerate buffer underruns (e.g. when interacting with a single-threaded GUI) than a latency increase to several seconds.

That makes sense. If you'd like to add this feature as an optional keyword argument, I'd be grateful for a pull request.