faiface / beep

A little package that brings sound to any Go application. Suitable for playback and audio-processing.
MIT License

`wav.Decode(f)` resulting `format` has wrong sample rate #163

Closed. mrryanjohnston closed this issue 1 year ago

mrryanjohnston commented 1 year ago

Following the instructions here: https://github.com/faiface/beep/wiki/Hello,-Beep!

I cannot attach the wav file that I use to this GitHub issue, but I generate one using https://github.com/rhasspy/piper like this:

echo "Hello everybody at GitHub! Hopefully this demonstrates the issue well." | ./piper --output_file debug.wav --model en-us-ryan-low.onnx

I then have the following in my main.go file:

package main

import (
        "log"
        "os"
        "time"

        "github.com/faiface/beep"
        "github.com/faiface/beep/speaker"
        "github.com/faiface/beep/wav"
)

func main() {
        f, err := os.Open("debug.wav")
        if err != nil {
                log.Fatal(err)
        }
        streamer, format, err := wav.Decode(f)
        if err != nil {
                log.Fatal(err)
        }
        // Print the decoded format only after the error check has passed.
        log.Println(format)
        defer streamer.Close()

        speaker.Init(format.SampleRate, format.SampleRate.N(time.Second/10))

        done := make(chan bool)
        speaker.Play(beep.Seq(streamer, beep.Callback(func() {
                done <- true
        })))

        <-done
}

When I run this with go run main.go, there's a brief garbled sound before the program exits. Changing the speaker.Init line as follows plays a portion of the file, but not the whole thing:

speaker.Init(format.SampleRate, format.SampleRate.N(time.Second))

Using log.Println to print out format gives me {16000 1 2}, i.e. a sample rate of 16000 Hz. However, according to piper defaults, the wav file produced should be 22050 Hz: https://github.com/rhasspy/piper/blob/96149e2856c7c90a7de5383b1d30617856da5d78/src/cpp/piper.cpp#L129

darless commented 1 year ago

I tried this with the following sound clip provided by piper (both mp3 and wav formats) and the format stated by beep looks correct.

https://rhasspy.github.io/piper-samples/

File used: https://huggingface.co/rhasspy/piper-voices/blob/main/en/en_GB/alan/medium/samples/speaker_0.mp3

# Model card for alan (medium)

* Language: en_GB (English, Great Britain)
* Speakers: 1
* Quality: medium
* Samplerate: 22,050Hz

To test with WAV, I used ffmpeg to convert from mp3 to WAV:

ffmpeg -i speaker_0.mp3 -acodec pcm_u8 -ar 22050 song.wav

Environment (go.mod)

    github.com/faiface/beep v1.1.0

Code to test mp3

            f, err := os.Open("speaker_0.mp3")
            if err != nil {
                log.Fatal(err)
            }
            streamer, format, err := mp3.Decode(f)
            if err != nil {
                log.Fatal(err)
            }
            defer streamer.Close()
            log.Println(format)
            speaker.Init(format.SampleRate, format.SampleRate.N(time.Second/10))
            done := make(chan bool)
            speaker.Play(beep.Seq(streamer, beep.Callback(func() {
                done <- true
            })))
            <-done

Code to test wav

            f, err := os.Open("song.wav")
            if err != nil {
                log.Fatal(err)
            }
            streamer, format, err := wav.Decode(f)
            if err != nil {
                log.Fatal(err)
            }
            defer streamer.Close()
            log.Println(format)
            speaker.Init(format.SampleRate, format.SampleRate.N(time.Second/10))
            done := make(chan bool)
            speaker.Play(beep.Seq(streamer, beep.Callback(func() {
                done <- true
            })))
            <-done

Both the WAV and MP3 files were playable, and the reported formats were:

MP3:

2023/06/28 14:10:53 {22050 2 2}

WAV:

2023/06/28 14:11:21 {22050 1 1}

I couldn't test with piper directly, and that is outside the scope of this project. You can use mplayer or similar audio tools to see the sample rate.

mrryanjohnston commented 1 year ago

Got it. Must be a problem on my end. Closing.