faiface / beep

A little package that brings sound to any Go application. Suitable for playback and audio-processing.
MIT License
2.08k stars 151 forks source link

Help synchronizing audio #85

Open tiagolobocastro opened 4 years ago

tiagolobocastro commented 4 years ago

Hello,

First of all many thanks for both your beep and pixel projects which I use on my emulator - https://github.com/tiagolobocastro/gones :+1:

I've noticed that on linux (probably just timing) the audio on my emulator is a bit choppy. I've tried two things to improve it:

  1. reduce the buffer size by 500 can improve the audio but the first second sounds really strange... after that it's fine!
  2. adding a delay just before calling Play and that also works but seems more of a timing dependent patch rather than a fix.

I'm using a channel to synchronize between the stream callback and the emulator adding samples is what's causing the issue...

Also I've just noticed that by reducing the smaller buffer size as well as the size of the synchronization ("SampleChan") helps more than 1 as seems to reduce the initial strange sound though it causes a lot of samples to not be sent because the channel is not big enough...

Any tips to help me improve this would be greatly appreciated, thank you!

Here's, roughly how I do it if it helps understand what I said:

func init() {
    sampleRate = beep.SampleRate(44100)
    sampleChan = make(chan float64, s.sampleRate.N(time.Second/10))

    speaker.Init(sampleRate, sampleRate.N(time.Second/10))
    go func() {
            // the delay which improves the sound
            time.Sleep(time.Millisecond * 300)
            speaker.Play(s.stream())
        }()
}

func stream() beep.Streamer {
    return beep.StreamerFunc(func(samples [][2]float64) (n int, ok bool) {
        ln := len(samples)
        for i := 0; i < ln; i++ {
                        // wait until we have all samples from the emulator
            sample := <- sampleChan
            samples[i][0] = sample
            samples[i][1] = sample
        }
        return ln, true
    })
}

// Adding the sound samples
func Sample(sample float64) bool {
    select {
    case sampleChan <- sample:
        return true
    default:
                // sample chan is full, don't wait just try again next time
        return false
    }
}
MarkKremer commented 4 years ago

Hey Tiago,

I suspect this can give you some trouble:

sampleChan = make(chan float64, s.sampleRate.N(time.Second/10))

This creates a channel in which a single float64 is sent/read at a time. This could cause a lot of locking between goroutines. Instead, you should create a channel with type []float64 and send a whole slice at once. I would start by sending a slice of the same size the speaker uses (sampleRate.N(time.Second/10)) so it can be forwarded as a whole.

If that doesn't help I would check if the speaker has to wait for samples.

MarkKremer commented 4 years ago

I tested your code. What I said about the size of the slice is not correct. Speaker uses Mixer behind the scenes and it has an internal buffer of 512 samples. But it has a mechanism to deal with different buffer sizes so sometimes it requests less than 512 samples.

Here's a stand-alone version of your code with the choppy audio:

package main

import (
    "github.com/faiface/beep"
    "github.com/faiface/beep/speaker"
    "math"
    "time"
)

var sampleRate beep.SampleRate
var sampleChan chan float64

func init() {
    sampleRate = beep.SampleRate(44100)
    sampleChan = make(chan float64, sampleRate.N(time.Second/10))

    speaker.Init(sampleRate, sampleRate.N(time.Second/10))
    go func() {
        // the delay which improves the sound
        //time.Sleep(time.Millisecond * 300)
        speaker.Play(stream())
    }()
}

func stream() beep.Streamer {
    return beep.StreamerFunc(func(samples [][2]float64) (n int, ok bool) {
        ln := len(samples)
        for i := 0; i < ln; i++ {
            // wait until we have all samples from the emulator
            sample := <- sampleChan
            samples[i][0] = sample
            samples[i][1] = sample
        }
        return ln, true
    })
}

// Adding the sound samples
func Sample(sample float64) bool {
    select {
    case sampleChan <- sample:
        return true
    default:
        // sample chan is full, don't wait just try again next time
        return false
    }
}

// Tone generates a tone. I copied this from my comment on another issue. :)
func Tone(sampleRate beep.SampleRate, freq float64) beep.Streamer {
    var playbackPos int
    return beep.StreamerFunc(func(samples [][2]float64) (n int, ok bool) {
        for i := range samples {
            amp := math.Sin(2.0 * math.Pi * freq / float64(sampleRate.N(time.Second)) * float64(playbackPos))
            samples[i][0] = amp
            samples[i][1] = amp
            playbackPos++
        }
        return len(samples), true
    })
}

func main() {
    tone := Tone(sampleRate, 261.6) // Sine wave equivalent of middle C key on keyboard

    for {
        // Get a buffer with a sine wave.
        samples := make([][2]float64, 512)
        tone.Stream(samples)

        // Send it to the speaker, one sample at a time
        for i := range samples {
            Sample(samples[i][0])
        }
    }
}

Now with some fixes:

package main

import (
    "github.com/faiface/beep"
    "github.com/faiface/beep/speaker"
    "math"
    "time"
)

var sampleRate beep.SampleRate
var sampleChan chan []float64

func init() {
    sampleRate = beep.SampleRate(44100)
    sampleChan = make(chan []float64)

    speaker.Init(sampleRate, sampleRate.N(time.Second/10))
    go func() {
        // the delay which improves the sound
        //time.Sleep(time.Millisecond * 300)
        speaker.Play(stream())
    }()
}

func stream() beep.Streamer {
    var input []float64
    var i int
    return beep.StreamerFunc(func(samples [][2]float64) (n int, ok bool) {
        if len(input) == 0 {
            input = <- sampleChan
            i = 0
        }

        // If there are unread samples in input, copy them to samples.
        // If not, get the next input.
        // Repeat until samples is filled.
        j := 0
        for {
            for j < len(samples) && i < len(input) {
                samples[j][0] = input[i]
                samples[j][1] = input[i]
                i++
                j++
            }
            if j >= len(samples) {
                return j, true
            }
            input = <- sampleChan
            i = 0
        }
    })
}

// Adding the sound samples
func Sample(samples []float64) bool {
    sampleChan <- samples
    return true
}

// Tone generates a tone. I copied this from my comment on another issue. :)
func Tone(sampleRate beep.SampleRate, freq float64) beep.Streamer {
    var playbackPos int
    return beep.StreamerFunc(func(samples [][2]float64) (n int, ok bool) {
        for i := range samples {
            amp := math.Sin(2.0 * math.Pi * freq / float64(sampleRate.N(time.Second)) * float64(playbackPos))
            samples[i][0] = amp
            samples[i][1] = amp
            playbackPos++
        }
        return len(samples), true
    })
}

func main() {
    tone := Tone(sampleRate, 261.6) // Play middle C

    for {
        // Get a buffer with a sine wave.
        samples := make([][2]float64, 512)
        tone.Stream(samples)

        // Copy it to the format we want
        output := make([]float64, len(samples))
        for i := range samples {
            output[i] = samples[i][0]
        }

        // Send it to the speaker
        Sample(output)
    }
}

I changed stream so it can deal with float slices. It takes care of different buffer sizes. I also changed Sample so it blocks when the speaker channel is full. My main function doesn't try to sent the same sample at a later time if it can't send it. If your emulator does do that then that's fine and you can use your own Sample implementation. Just make sure no samples are ever dropped as this will create weird artifacts in the sound.

tiagolobocastro commented 4 years ago

Awesome thank you so much Mark! :+1: I'll test and study your code now

tiagolobocastro commented 4 years ago

Yeah that's a good suggestion thanks, no point on locking the chan every loop iteration of the sample callback. I've roughly implemented it on my Sample code, which fills a []float64 and tickles the chan when it's filled with sampleRate.N(time.Second/10) elements.

Sadly the sound is still chopping :(+

I've also verified that sample callback indeed waits for the samples, but only in the initial samples which does explain why the hacky delay seems to help. Maybe I should only call Play when the first slice arrives?

MarkKremer commented 4 years ago

Could you create a runnable example of what you're doing now? Does my last snippet still cause chopping for you?

Is it possible to play to the speaker directly instead of through a channel? Then there's one less bottleneck. Speaker is a Mixer so it can play multiple streams at once if you need it to.

Generally, you should make sure enough samples reach the speaker when/before it needs it. Sooner is better than later because it needs to hand it over to the underlying library, which hands it over to the driver, which hands it over to the hardware... Or something. Lots of copying.

tiagolobocastro commented 4 years ago

I'll try to create a simple repro.

Your snippet itself doesn't chop but it does have a weird sound at the very beginning which is removable by commenting out the delay line and setting it to 400ms.

I don't think I can skip the channel because it's the UI that runs emulator cycles (simply because I did that bit first :-)). I could potentially run emulator cycles from the audio callback though it would be a big change.

tiagolobocastro commented 4 years ago

Also I just realized your first example was choppy because of this:

func Sample(sample float64) bool {
    select {
    case sampleChan <- sample:
        return true
    default:
        // sample chan is full, don't wait just try again next time
        return false
    }
}

It keeps hitting the default case whilst moving the sin along. My emulator calls Sample at a frequency of approx 44k100 so the default case is rarely hit and only there to tell me if the audio is falling behind.

This just made me realize, that's why the delay works for me! Since it's pushing at that frequency it will take a long time to fill up the buffer of sampleRate.N(time.Second/10) which is already being "awaited" at the stream callback. However it doesn't explain why the choppiness persists, unless the emulator is then always playing catch with the stream callback?

tiagolobocastro commented 4 years ago

This is my attempt at simulating my emulator. It doesn't behave exactly the same though, as it fixes itself after about 2-3 seconds somehow :/ Also just before I hit comment I ran my emulator again and now it shows similar behaviour, works choppy at first and then fixes itself, did a reboot alleviate it? :(

package main

import (
    "github.com/faiface/beep"
    "github.com/faiface/beep/speaker"
    "math"
    "time"
)

var sampleRate beep.SampleRate
var sampleChan chan float64
func Init() {
    sampleRate = beep.SampleRate(44100)
    sampleChan = make(chan float64, sampleRate.N(time.Second/10))

    speaker.Init(sampleRate, sampleRate.N(time.Second/10))
    go func() {
        // the delay which improves the sound
        //time.Sleep(time.Millisecond * 300)
        speaker.Play(stream())
    }()
}

func stream() beep.Streamer {
    return beep.StreamerFunc(func(samples [][2]float64) (n int, ok bool) {
        ln := len(samples)
        for i := 0; i < ln; i++ {
            // wait until we have all samples from the emulator
            sample := <- sampleChan
            samples[i][0] = sample
            samples[i][1] = sample
        }
        return ln, true
    })
}

// Adding the sound samples
func Sample(sample float64) bool {
    select {
    case sampleChan <- sample:
        return true
    default:
                panic("should not happen!")
        // sample chan is full, don't wait just try again next time
        return false
    }
    //sampleChan <- sample
    return true
}

// Tone generates a tone. I copied this from my comment on another issue. :)
func Tone(sampleRate beep.SampleRate, freq float64) beep.Streamer {
    var playbackPos int
    return beep.StreamerFunc(func(samples [][2]float64) (n int, ok bool) {
        for i := range samples {
            amp := math.Sin(2.0 * math.Pi * freq / float64(sampleRate.N(time.Second)) * float64(playbackPos))
            samples[i][0] = amp
            samples[i][1] = amp
            playbackPos++
        }
        return len(samples), true
    })
}

func main() {
    Init()
    tone := Tone(sampleRate, 1000)

        // emulator clock runs at Frequency but we want to sample at sampleRate
    sampleTicks = float64(Frequency) / float64(sampleRate)
    sampleTargetTicks = sampleTicks
    clock = 0

    tmr := time.Tick(time.Second / 240)
    for {
        // run steps of emulator time and wait for the real time to elapse
        Step((time.Second / 240).Seconds(), tone)
        <-tmr
    }
}

const Frequency = 1789773
var sampleTicks float64
var sampleTargetTicks float64
var clock uint
func Step(seconds float64, tone beep.Streamer) {
    cyclesPerSecond := float64(Frequency)
    cyclesPerSecond *= seconds
    runCycles := int(cyclesPerSecond)

    for runCycles > 0 {
                // 
        if clock >= uint(sampleTargetTicks) {
            sampleTargetTicks += sampleTicks

            samples := make([][2]float64, 1)
            tone.Stream(samples)
            Sample(samples[0][0])
        }
        clock++
        runCycles--
    }
}
tiagolobocastro commented 4 years ago

So I made a discovery... today my emulator was sounding ok because I was using my bluetooth headphones... I just tried my wired headphones and the choppiness is back... no clue why that matters though... And then yes, my code above is a "perfect" simulation of the issue it seems...

EDIT: My latest theory is that because the emulator runs every 1/240 of a second and then sleeps, it cannot fill the buffer of sampleRate.N(time.Second/10)... if I use sampleRate.N(time.Second/183) it does sound ok but with a bang when I turn it on and sampleRate.N(time.Second/300) seems better but I'll have to test it further...

MarkKremer commented 4 years ago

I'll try to have a look at it tomorrow. Message me if I forgot.

tiagolobocastro commented 4 years ago

thanks Mark, no rush though, whenever you can! I'm still geared toward my last comment... I've replaced the channel with a circular buffer which discards the tails when it gets full though that didn't help much..

However reducing the size of the stream buffer or increase the size of buffer between the emulator and the apu and having the stream callback return when there are no more samples rather than sleep there, eg:

func (s *SpeakerBeep) stream() beep.Streamer {
    return beep.StreamerFunc(func(samples [][2]float64) (n int, ok bool) {
        read := s.buffer.ReadInto2(samples)
// read could be 0 if there are no samples to return...
        return read, true
    })
}

seems to help quite a lot though every other run I still get the crackling when I first start it... maybe when I first start it I need to run the emulator for enough cycles till I fill the sampling buffer and from then on I can do the loop { run,sleep } ?

MarkKremer commented 4 years ago

My latest theory is that because the emulator runs every 1/240 of a second and then sleeps, it cannot fill the buffer of sampleRate.N(time.Second/10)..

I got the same.

I introduced a print in the stream() func to let me know when it is waiting for samples:

var sample float64
loop:
for {
    select {
    case sample = <-sampleChan:
        break loop
    default:
        go fmt.Print("#") // For testing, don't wait for it to finish.
    }
}

I've replaced the channel with a circular buffer which discards the tails when it gets full though that didn't help much..

Never discard samples. This will create a discontinuation in the sound and this will cause crackling. See my beautiful drawing of a sine wave with some missing samples in the middle: image


I don't know if this is still the case with your circular buffer. One problem in your snippet with the channel implementation is that if the emulator ever gets behind on producing samples for the speaker, it will have no way to catch up to it again. It produces the exact number of samples to fill the buffer, but too late, so the next time it will be behind again.

if clock >= uint(sampleTargetTicks) {

You should make sure that either the emulator can never get behind (unlikely given random-ish goroutine/thread schedules/timing functions if you don't also have the next part:) or that the emulator can catch up if needed. This last one is (partly?) done by:

However reducing the size of the stream buffer or increase the size of buffer between the emulator and the apu

So that's good. :+1: Make sure that you fill the entire buffer between the emulator and the stream func.

and having the stream callback return when there are no more samples rather than sleep there

Returning the number of samples is usually used to indicate that the end of the stream is nearly reached. The speaker will still send the unfilled part of the buffer as silence to the actual speaker. It may help a bit but I don't expect it to solve it completely. You'll get something like this: image


maybe when I first start it I need to run the emulator for enough cycles till I fill the sampling buffer and from then on I can do the loop { run,sleep } ?

Yes, or fill the buffer with silence. Same effect, probably a bit easier to implement.

tiagolobocastro commented 4 years ago

Never discard samples. This will create a discontinuation in the sound and this will cause crackling. See my beautiful drawing of a sine wave with some missing samples in the middle: image

Right, yeah that was a bad idea :D

I don't know if this is still the case with your circular buffer. One problem in your snippet with the channel implementation is that if the emulator ever gets behind on producing samples for the speaker, it will have no way to catch up to it again. It produces the exact number of samples to fill the buffer, but too late, so the next time it will be behind again.

This is certainly still possible I think, it will end up drifting.. but maybe not so bad as long it takes a long time to happen and since the buffer has a limited size it should not get too far off (eg if the emulator gets ahead, it can't get ahead more than the circ buffer buffer size, I hope!)

Returning the number of samples is usually used to indicate that the end of the stream is nearly reached. The speaker will still send the unfilled part of the buffer as silence to the actual speaker. It may help a bit but I don't expect it to solve it completely. You'll get something like this: image

Ahh well another one to the list of mistakes :D thanks!

maybe when I first start it I need to run the emulator for enough cycles till I fill the sampling buffer and from then on I can do the loop { run,sleep } ?

Yes, or fill the buffer with silence. Same effect, probably a bit easier to implement.

I think I tried that but it didn't help much. I suspect because the samples that were meant to that "slot" were then used on the next one and that ends up causing more drifts? Maybe when we fill N buffer samples with silent, we need to discard the next N samples? Though we may also be increasing the gap yet again...

tiagolobocastro commented 4 years ago

So, I tried to pre-run the emulator before I call speaker.Play and the initial crackling is fully gone... I need to be careful with how many samples I prepare though, because too many and then the audio lags behind the image :D

Adding the silence when there are not enough samples doesn't really work. Is it because the sample callback only calls us with eg 512 at a time? So we end up sending back N 512 batches, each with a mix of real samples and silence samples. If we got a single call with the full sample array we could, potentially, figure out how far behind we are and prefill the first elements of the buffer with silence, maybe?

EDIT: Ah but sending all silence if we can't fill the whole 512 seems almost as good as pre-filling the samples!

tiagolobocastro commented 4 years ago

The pre-sampling approach works most of the time but occasionally beep seems to try to grab more samples than it ought to!

So I call speaker.Init with a buffer size of 4410 which means the buffer has enough data for 0.1s. What happens is that ocasionally beep starts collecting more than 4410 samples after only 2.7ms since it started!

Apologies for the roughness but if you imagine the time axis a bit like this:

--S-------------F----------------------------------------------G

Where S is the the start time (when I call speaker.Play), F is the time of the first sample callback, and G is the time where the sample callback wants to collect past 4410 samples).

I would expect G to be around S + 0.1 but what I measured is actually: F = S + 274us G = S + 2.67ms

Any clues @MarkKremer ?

MarkKremer commented 4 years ago

Some clues:

// update pulls new data from the playing Streamers and sends it to the speaker. Blocks until the
// data is sent and started playing.
func update() {

Found here

// Write writes PCM samples to the Player.
//
// The format is as follows:
//   [data]      = [sample 1] [sample 2] [sample 3] ...
//   [sample *]  = [channel 1] ...
//   [channel *] = [byte 1] [byte 2] ...
// Byte ordering is little endian.
//
// The data is first put into the Player's buffer. Once the buffer is full, Player starts playing
// the data and empties the buffer.
//
// If the supplied data doesn't fit into the Player's buffer, Write block until a sufficient amount
// of data has been played (or at least started playing) and the remaining unplayed data fits into
// the buffer.
//
// Note, that the Player won't start playing anything until the buffer is full.
func (p *Player) Write(buf []byte) (int, error) {

Found here

tiagolobocastro commented 4 years ago

So looks like as soon as "sufficient" data has been played we get back into the update loop and try to get the next set of data. Then I'm at at loss on how to adjust my code since I can't keep pre filling samples as that adds a noticeable sound lag.

Maybe since I now pre fill the initial buffer it's ok to make the stream callback wait? Or reduce the buffer size tenfold and then the lag will be lessened.

MarkKremer commented 4 years ago

You can reduce the buffer size.

tiagolobocastro commented 4 years ago

Yeah using a buffer size of 0.01s and pre-filling the buffer with 1.5x the buffer size seems to work reasonably well 8 out of 10 times so I'll go with it. Thanks a lot for your help!

tiagolobocastro commented 4 years ago

This has been working perfectly on linux but I've booted into windows today and the crackling is back... Seems that, unlike linux, on windows smaller buffer size means the stream keeps falling behind the emulator for some reason... So I tried making the circular buffer huge and what happens is the sound is coming out with a bit of crackling but mostly at a very "slow pace", if that makes sense?

MarkKremer commented 4 years ago

Hmm, I don't know. I'm not using Windows at the moment so it's a bit difficult to figure out what's going on.

With slow pace, do you mean like if you set the speed on a YouTube video to 0.75% or something?

tiagolobocastro commented 4 years ago

Ah yes that's exactly it

tiagolobocastro commented 3 years ago

Hi again @MarkKremer, I had another look at this crackling issue on windows and I can make it happen with the "4-making-own-streamers\a" example just by reducing the buffer size to 1024. I took some collection data from the callbacks where I collect the len of the samples and how many ms have passed since the last collection. Here's the pattern for different buffer sizes:

4096: request: 512 time 95 request: 512 time 0 request: 512 time 0 request: 512 time 0 request: 512 time 0 request: 512 time 0 request: 512 time 0 request: 512 time 0 request: 512 time 96

2048: request: 512 time 47 request: 512 time 0 request: 512 time 0 request: 512 time 0 request: 512 time 43

1024: request: 512 time 51 request: 512 time 0 request: 512 time 9 request: 512 time 0 request: 512 time 51

So the 1k buffer size is different than the bigger ones, with two different samples collected with different intervals, but maybe that's ok.. however if you average it, it seems to be collecting 1k samples every 30ms rather than 23ms? though this could be measuring errors...

EDIT: Actually, same issue here without no modification: https://github.com/hajimehoshi/oto/blob/master/example/main.go It plays smoothly if I increase the buffer to 8192.