TooTallNate / node-speaker

Output PCM audio data to the speakers

Plays from local file, but errors when streaming same data over http #51

Open nfriedly opened 9 years ago

nfriedly commented 9 years ago

Hey, this is an awesome lib, but I'm getting an odd error when I try to play a stream over HTTP. This code:

var watson = require('watson-developer-cloud');
var Speaker = require('speaker');

var speaker = new Speaker({
    channels: 1,
    bitDepth: 16,
    sampleRate: 48000
});

var text_to_speech = watson.text_to_speech({
  username: '0eaef628-d28e-4365-b0db-069046f37fef',
  password: 'Mm1DWPHFC7sq',
  version: 'v1'
});

var params = {
  text: 'Hello from IBM Watson',
  accept: 'audio/wav' //ogg; codec=opus'
};

text_to_speech.synthesize(params).pipe(speaker).on('error', function(err) {
    console.error(err.stack);
});

fails with the error:

Error: write() failed: 1018
    at onwrite (/home/pi/nodeplayer/node_modules/speaker/index.js:269:12)

Do you know what that means? If I replace the end with a version that downloads the entire file before playing, it works but it takes a lot longer for any significant amount of text:

var fs = require('fs');
text_to_speech.synthesize(params)
    .pipe(fs.createWriteStream('out.wav'))
    .on('close', function() {
        fs.createReadStream('out.wav').pipe(speaker);
    });

Any ideas?

nfriedly commented 9 years ago

I re-ran both cases with debugging enabled:

$ DEBUG=speaker node player.js 
setting up
  speaker format(object keys = [ 'channels', 'bitDepth', 'sampleRate', 'lowWaterMark', 'highWaterMark' ]) +0ms
  speaker setting 'channels': 1 +130ms
  speaker setting 'bitDepth': 16 +33ms
  speaker setting 'sampleRate': 48000 +6ms
  speaker _pipe() +4s
  speaker format(object keys = [ 'domain', '_events', '_maxListeners', 'uri', 'callback', 'method', 'useQuerystring', 'headers', 'readable', 'writable', 'explicitMethod', '_auth', '_oauth', '_multipart', '_redirect', 'setHeader', 'hasHeader', 'getHeader', 'removeHeader', 'localAddress', 'qsLib', 'qsParseOptions', 'qsStringifyOptions', 'pool', 'dests', '__isRequestRequest', '_callback', 'proxy', 'tunnel', 'setHost', 'originalCookieHeader', '_disableCookies', '_jar', 'port', 'host', 'url', 'path', 'httpModule', 'agentClass', 'agent' ]) +4ms
  speaker _write() (1019 bytes) +1s
  speaker open() +7ms
  speaker setting default 'signed': true +2ms
  speaker writing 1019 byte chunk +25ms
  speaker wrote 1018 bytes +71ms
Error: write() failed: 1018
    at onwrite (/home/pi/nodeplayer/node_modules/speaker/index.js:269:12)

It looks like it's just losing a single byte. Any ideas how that could happen?

For comparison, I also ran the happy case where I write to a file and then play back from the file:

[ ridiculously long log moved to http://pastebin.com/v4F0dpAD ]

nfriedly commented 9 years ago

I disabled the check (with && false) just to see what would happen. I put the full log on http://pastebin.com/uCGDuFKf but the short version is that any time fewer than 1024 bytes are written, it seems to lose a byte.

The audio played all the way through, but it had a lot of noise and screeching.

TooTallNate commented 9 years ago

Your original script actually plays for me correctly (you probably shouldn't be handing out your username/password like that btw…), on OS X and node v0.12.2. What's your setup?

TooTallNate commented 9 years ago

You'd also probably want to parse out the WAV header before piping to node-speaker, to avoid that pop in the beginning. See: https://github.com/TooTallNate/node-wav

nfriedly commented 9 years ago

Oh, right, I should have mentioned that I'm trying to play it on a Raspberry Pi (an old 256 MB Model B, to be specific). I didn't think to try until just now, but it does work on my MacBook as well (with the pop at the start).

I tried to put together a node-wav version:

var fs = require('fs');
var watson = require('watson-developer-cloud');
var Speaker = require('speaker');
var wav = require('wav');

var reader = new wav.Reader();

var text_to_speech = watson.text_to_speech({
  username: '0eaef628-d28e-4365-b0db-069046f37fef',
  password: 'Mm1DWPHFC7sq',
  version: 'v1'
});

var params = {
  text: 'Hello from IBM Watson',
  accept: 'audio/wav' //ogg; codec=opus'
};

var stream = text_to_speech.synthesize(params)

reader.on('format', function(format) {
    console.log('format', format);
    reader.pipe(new Speaker(format)).on('error', function(err) {console.error(err.stack);});
});

stream.pipe(reader);

But it seems to fail on both the pi and the mac with the same error:

$ node player.js 
format { audioFormat: 1,
  endianness: 'LE',
  channels: 1,
  sampleRate: 48000,
  byteRate: 96000,
  blockAlign: 2,
  bitDepth: 16,
  signed: true }
stream.js:94
      throw er; // Unhandled stream error in pipe.
            ^
Error: bad "data" chunk: expected "data", got "LIST"
    at Reader._onSubchunk2ID (/Users/nfriedly/demo-watson/node_modules/wav/lib/reader.js:135:24)
    at process (/Users/nfriedly/demo-watson/node_modules/wav/node_modules/stream-parser/lib/parser.js:237:20)
    at /Users/nfriedly/demo-watson/node_modules/wav/node_modules/stream-parser/lib/parser.js:172:14
    at /Users/nfriedly/demo-watson/node_modules/wav/node_modules/stream-parser/lib/parser.js:261:16
    at Reader.transform [as _transform] (/Users/nfriedly/demo-watson/node_modules/wav/node_modules/stream-parser/lib/parser.js:146:3)
    at Reader.Transform._read (_stream_transform.js:179:10)
    at Reader.Transform._write (_stream_transform.js:167:12)
    at doWrite (_stream_writable.js:301:12)
    at writeOrBuffer (_stream_writable.js:288:5)
    at Reader.Writable.write (_stream_writable.js:217:11)

Any thoughts?

nfriedly commented 9 years ago

Oh, and yea, those were throwaway credentials that I made just for this issue. They won't live long ;)

TooTallNate commented 9 years ago

That is strange... your revised version works for me as well on OS X. Can you upload the .wav file that you're getting generated?

nfriedly commented 9 years ago

Oh, and I just noticed that the pi is on node 0.12.0. Let me update that and report back... (sorry if that turns out to be the issue).

nfriedly commented 9 years ago

OK, here are the wav files that I get. out-short is when I pipe to both the reader & the fs (it's shorter because the app crashes before it finishes) and out-full is from commenting out the wav reader and just piping to the fs: https://www.dropbox.com/sh/xj3iinxq5f4cc89/AAC5YIEZgUmd-Rmv0IGLqb2oa?dl=0

My pi's now on node 0.12.2 and I rebuilt the dependencies, and the original example now plays (with some awful scratching at the start). So that was my first issue. The bad chunk error is still popping up if I use the node-wav version though :(

nfriedly commented 9 years ago

If you want to try it out on my pi, send me an email and I'll open things up so that you can SSH into it: nathan@<my github username>.com

TooTallNate commented 9 years ago

Can you try out wav v1.0.0? I just realized that there were a bunch of commits on master branch that weren't in the latest npm release.

nfriedly commented 9 years ago

That fixed it on my macbook and it's progress on the pi - on my macbook I get 2-3 lines like this:

[../deps/mpg123/src/output/coreaudio.c:81] warning: Didn't have any audio data in callback (buffer underflow)

but the audio plays smoothly. On the pi, I don't get any log output, but it plays very scratchy :/

TooTallNate commented 9 years ago

On the pi, with the wav file saved to disk, how does it sound?

nfriedly commented 9 years ago

It sounds perfect when reading from the disk

TooTallNate commented 9 years ago

How's your internet connection? I don't get the buffer underflows on OS X, and it sounds like that's what's happening on Linux as well for you.

On Fri, May 1, 2015 at 12:14 PM, Nathan Friedly notifications@github.com wrote:

It sounds perfect when reading from the disk


nfriedly commented 9 years ago

Yea... that might be part of the problem. I'm out in the countryside on a crappy 6mbit DSL line. I can tether my macbook to my phone and get maybe 18-20mbit over LTE, but I don't have a wifi dongle for my pi :(

nfriedly commented 9 years ago

Do you know of any way to make a stream buffer a little bit extra before emitting the first 'data' or 'readable' event?

TooTallNate commented 9 years ago

Perhaps check out https://github.com/samcday/node-stream-buffer

TooTallNate commented 9 years ago

Actually, that module doesn't seem like what you're looking for… I'm not really sure of a stream that does in-memory buffering up to a threshold. I'm sure there's one out there though.

nfriedly commented 9 years ago

Yea, I came to the same conclusion. I made this and tried it a couple of different ways but no success:

var util = require('util');
var Transform = require('stream').Transform;
util.inherits(StreamBuffer, Transform);

function StreamBuffer(options) {
    if (!(this instanceof StreamBuffer))
        return new StreamBuffer(options);
    options = options || {};
    this.buffer = [];
    this.bufferLength = options.size || 16; // todo: switch this to a data length rather than a number of chunks
    options.highWaterMark = this.bufferLength * 1024;
    this.buffering = true;

    Transform.call(this, options);

    this.realWrite = this.write;
    this.write = function(data) {
        this.realWrite(data);
        return true; // keep asking for data even after we hit the highWaterMark
    }

}

StreamBuffer.prototype._transform = function(chunk, encoding, done) {
    // until the buffer is full, just hold onto incoming data
    if (this.buffering) {
        this.buffer.push(chunk);
        if (this.buffer.length >= this.bufferLength) {
            this.flush();
        }
    } else {
        // and from here on, just send data down the line immediately
        this.push(chunk);
    }
    done();
};

// once the buffer is full (or the end of the input is reached), dump everything
// separate from _flush because we're not allowed to call _flush directly - it may be overridden in some cases
StreamBuffer.prototype.flush = function() {
    this.buffering = false;
    while(this.buffer.length) {
        this.push(this.buffer.shift());
    }
};

StreamBuffer.prototype._flush = function(callback) {
    this.flush();
    callback();
};
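
For reference, the todo in the constructor above (counting bytes rather than chunks) could be sketched roughly as follows. This is only a minimal sketch: PreBuffer and its minBytes parameter are hypothetical names, not part of node-speaker, and it has not been tried against the Watson stream. It also only delays the start of playback; it cannot help if data keeps arriving more slowly than the speaker consumes it.

```javascript
'use strict';
const { Transform } = require('stream');

// Hypothetical helper (not part of node-speaker): holds data back until
// `minBytes` have arrived, then releases it in one go and becomes a passthrough.
class PreBuffer extends Transform {
  constructor(minBytes, options) {
    super(options);
    this.minBytes = minBytes;
    this.held = [];       // chunks received while still buffering
    this.heldBytes = 0;   // total size of the chunks in `held`
    this.buffering = true;
  }

  _transform(chunk, encoding, done) {
    if (!this.buffering) {
      done(null, chunk);  // threshold already reached: pass straight through
      return;
    }
    this.held.push(chunk);
    this.heldBytes += chunk.length;
    if (this.heldBytes >= this.minBytes) this._release();
    done();
  }

  // Runs at end of input: short streams never reach the threshold,
  // so whatever is still held is released here.
  _flush(done) {
    this._release();
    done();
  }

  _release() {
    this.buffering = false;
    if (this.heldBytes > 0) this.push(Buffer.concat(this.held, this.heldBytes));
    this.held = [];
    this.heldBytes = 0;
  }
}
```

Assumed usage would be something like stream.pipe(new PreBuffer(64 * 1024)).pipe(speaker), where the 64 KB threshold is an arbitrary starting point to tune.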

TooTallNate commented 9 years ago

It's an interesting problem, because a proper "buffering stream" would use information like input/write speed (internet speed) since minimal/no buffering would be required in my case. Overall filesize, and file format of the resulting stream could also be factors (but probably represented in a more generic API). Could be a fun weekend project :smile:

I think my stream-parser module could be useful for that…

g00dnatur3 commented 8 years ago

I've worked with Watson before. You should not use the wav option; instead, decode and play the ogg format. Look here:

IN THIS EXAMPLE REPLACE "aplay" with the speaker, ty

var opus = require('node-opus');
var ogg = require('ogg');
var cp = require('child_process');
var request = require("request");

function createOggDecoder() {
    var oggDecoder = new ogg.Decoder();
    oggDecoder.on('stream', function (stream) {
        var opusDecoder = new opus.Decoder();
        // the "format" event contains the raw PCM format
        opusDecoder.on('format', function (format) {
            // convert the signed & bitDepth to an alsa compatible format (`aplay --help format` for full list)
            var alsaFormat;
            if (format.signed && format.bitDepth == 16) {
                alsaFormat = 'S16_LE'; // assume Little Endian
            } else {
                throw new Error('unexpected format: ' + JSON.stringify(format));
            }
            // set up aplay to accept data from stdin
            var aplay = cp.spawn('aplay',['--format=' + alsaFormat, '--rate=' + format.sampleRate, '--channels='+format.channels, '--']);
            // send the raw PCM data to aplay
            opusDecoder.pipe(aplay.stdin);
            // or pipe to node-speaker, a file, etc
        });
        // an "error" event will get emitted if the stream is not an Opus stream
        // (e.g. it could be a Vorbis or Theora stream instead)
        opusDecoder.on('error', console.error);
        stream.pipe(opusDecoder);
    });
    return oggDecoder;
}

var getTextToSpeechUrl = function(text) {
  var options = {
    text: text,
    voice: 'en-US_AllisonVoice'
  }
  var downloadURL = 'http://127.0.0.1:3001/api/synthesize' +
    '?voice=' + options.voice +
    '&text=' + encodeURIComponent(options.text)
  return downloadURL;
};

function TextToSpeech() {
    this.say = function(text) {
        var oggDecoder = createOggDecoder();
        request(getTextToSpeechUrl(text)).pipe(oggDecoder);
    }
}

module.exports = TextToSpeech;

kiriri commented 9 months ago

Hello spirits from the past. This error still occurs on Linux to this day whenever a buffer is sent to the audio driver whose length is not a multiple of the stride (bytes per frame) the driver expects.

speaker writing 1019 byte chunk +25ms
speaker wrote 1018 bytes +71ms

The OP's bit depth is 16 with one channel, so the driver expects a multiple of 2 bytes. But the OP sent 1019 bytes. "That's not ok!" says the driver and silently drops the last byte. The solution to this problem is to detect rejected bytes and concat them to the next chunk you want to send. The following is a caveman interpretation of this, using speaker.write instead of pipe. I specifically wrote and tested it on bit depth 16 only.

const axios = require('axios');
const Speaker = require('speaker');

async function playWavFromUrl(url) {
    try {
        // Use axios stream
        const response = await axios({
            method: 'GET',
            url,
            responseType: 'stream',
        });

        // Create a Speaker instance
        const speaker = new Speaker({
            bitDepth: 16,
            sampleRate: 24000,
            channels: 1,
        });

        let is_first = true;
        let remainder = Buffer.alloc(0); // new Buffer() is deprecated

        response.data.on('data', 
        /**
         * @param {Buffer} chunk 
         */
        (chunk) => 
        {
            // In my case the streaming response includes a wav header.
            // We need to skip it, or else le pop occurs.
            if(is_first)
            {
                let header = parseWavHeader(chunk); // parseWavHeader is a helper not shown here. You can replace header.length by either 44 or 46 most of the time.
                chunk = chunk.slice(header.length);
                is_first = false;
            }

            // If the last chunk was odd, we need to add the remaining byte to the current chunk.
            if(remainder.length > 0)
            {
                chunk = Buffer.concat([remainder, chunk]);
                remainder = Buffer.alloc(0);
            }

            // Is the current chunk odd? If so, we make it even and remember the remaining byte for the next chunk.
            if (chunk.length % 2 !== 0) {
                remainder = chunk.slice(-1);
                chunk = chunk.slice(0, -1);
            }

            speaker.write(chunk);
        });

        // Handle end of stream
        response.data.on('end', () => {
            // If the entire stream was odd, we just forget about that last byte.
            speaker.end(); // no more data is coming, so let the speaker flush and close
            console.log('Audio playback finished.');
        });

        speaker.on('error', (err) => {
            console.error('Error playing audio:', err.message);
        });
    } catch (error) {
        console.error('Error fetching WAV file:', error.message);
    }
}
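
Since parseWavHeader is not included above, here is a rough sketch of what such a helper might look like. It is a hypothetical stand-in, not the original: it assumes the whole header arrives in the first chunk and returns an object whose length is the number of bytes to skip. Walking the sub-chunks instead of hard-coding 44 matters for files like the Watson one earlier in this thread, where a "LIST" chunk sits before "data".

```javascript
'use strict';

// Hypothetical stand-in for the parseWavHeader helper used above.
// Walks the RIFF sub-chunks ("fmt ", "LIST", ...) until it finds "data"
// and reports how many bytes precede the PCM samples.
// Assumption: the whole header is contained in `chunk`.
function parseWavHeader(chunk) {
  if (chunk.length < 12 ||
      chunk.toString('ascii', 0, 4) !== 'RIFF' ||
      chunk.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('not a RIFF/WAVE header');
  }
  let offset = 12; // sub-chunks start after "RIFF", the 4-byte size and "WAVE"
  while (offset + 8 <= chunk.length) {
    const id = chunk.toString('ascii', offset, offset + 4);
    const size = chunk.readUInt32LE(offset + 4);
    if (id === 'data') {
      return { length: offset + 8 }; // samples follow the 8-byte chunk header
    }
    offset += 8 + size + (size % 2); // sub-chunks are padded to an even length
  }
  throw new Error('"data" chunk not found in the first ' + chunk.length + ' bytes');
}
```

With a plain "fmt " chunk of 16 bytes directly followed by "data", this yields the familiar 44.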

It would probably be better to fix this in a PR, but those don't seem to get looked at anymore, so I won't bother.
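
For anyone who would rather keep using pipe(), the same remainder trick can be sketched as a Transform stream placed in front of the speaker. This is a minimal sketch, not a tested fix: FrameAligner and splitAligned are hypothetical names, and frameBytes is assumed to be channels * bitDepth / 8 (2 for the mono 16-bit audio in this thread).

```javascript
'use strict';
const { Transform } = require('stream');

// Splits `buf` into the largest prefix that is a whole number of frames,
// plus the leftover bytes that do not fill a frame yet.
function splitAligned(buf, frameBytes) {
  const usable = buf.length - (buf.length % frameBytes);
  return { aligned: buf.subarray(0, usable), rest: buf.subarray(usable) };
}

// Hypothetical helper (not part of node-speaker): only ever emits whole frames.
class FrameAligner extends Transform {
  constructor(frameBytes) {
    super();
    this.frameBytes = frameBytes; // assumed: channels * bitDepth / 8
    this.rest = Buffer.alloc(0);  // bytes carried over from the previous chunk
  }

  _transform(chunk, encoding, done) {
    // Prepend the carried-over bytes, emit the aligned part, keep the new leftover.
    const joined = this.rest.length ? Buffer.concat([this.rest, chunk]) : chunk;
    const { aligned, rest } = splitAligned(joined, this.frameBytes);
    this.rest = rest;
    if (aligned.length > 0) this.push(aligned);
    done();
  }

  // A trailing partial frame cannot be played, so it is dropped at end of input.
  _flush(done) {
    this.rest = Buffer.alloc(0);
    done();
  }
}
```

Assumed usage would be source.pipe(new FrameAligner(2)).pipe(speaker). For the 1019-byte chunk from the log, this would forward 1018 bytes and carry the last byte over to the next chunk.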