Kagami / vmsg

:musical_note: Library for creating voice messages
https://kagami.github.io/vmsg/
Creative Commons Zero v1.0 Universal

Streaming encoder #23

Open onel opened 5 years ago

onel commented 5 years ago

First of all, thanks for this great library.

I have a question: is there a way to encode a specific audio buffer and get only that back, rather than the whole recording? For example, you send a Float32Array, vmsg encodes it and sends the result back. Right now I think that during a recording everything is held in memory and returned when calling vmsg_flush(). This would be useful for longer recordings, where you want to encode a chunk, maybe upload it, and not keep it in memory.

I've tried to do something similar, by calling vmsg_init, vmsg_encode and then vmsg_flush, inside the data event listener for the worker. I don't think this is the right way to do it.

  case "data":

    if (!vmsg_init(msg.rate)) return postMessage({type: "error", data: "vmsg_init"});

    if (!vmsg_encode(msg.data)) return postMessage({type: "error", data: "vmsg_encode"});

    const blob = vmsg_flush();
    if (!blob) {
      return postMessage({type: "error", data: "vmsg_flush"});
    }

    postMessage({
      type: "blob",
      data: blob
    });

    break;

Is there a way to do that? A change would also need to be made inside vmsg.c, right? Thanks

Kagami commented 5 years ago

Yes, it's possible. You just need to make the vmsg_encode C function return the number of bytes written (n), so you can send the bytes from v->mp3+v->size-n to v->mp3+v->size to the main thread via postMessage. At the end you should also fix the LAME tag (lame_get_lametag_frame), which needs an additional message.
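A minimal sketch of the byte-range arithmetic described above, in JavaScript. It assumes the worker can see the WebAssembly heap as an ArrayBuffer and knows the address of v->mp3, the current v->size, and the byte count n returned by the modified vmsg_encode. The function name copyNewBytes and its parameters are hypothetical and not part of vmsg.

```javascript
// Hypothetical helper, not part of vmsg: copies only the bytes written by the
// most recent encode call out of the WebAssembly heap.
//   memoryBuffer: the heap as an ArrayBuffer (assumed accessible in the worker)
//   mp3Ptr:       byte offset of v->mp3 inside the heap (assumed known)
//   size:         v->size after the encode call
//   n:            number of bytes the encode call just wrote
function copyNewBytes(memoryBuffer, mp3Ptr, size, n) {
  // The new bytes occupy the half-open range [mp3 + size - n, mp3 + size).
  const view = new Uint8Array(memoryBuffer, mp3Ptr + size - n, n);
  // slice() makes a copy, so the chunk stays valid if the heap is reused later.
  return view.slice();
}

// Inside a worker, the chunk could then be handed to the main thread, e.g.:
//   const chunk = copyNewBytes(heap, mp3Ptr, size, n);
//   postMessage({type: "chunk", data: chunk}, [chunk.buffer]);
```

The "chunk" message type is also an assumption. Transferring chunk.buffer in the second argument of postMessage avoids a second copy when crossing the thread boundary.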

I'm not sure we want to use that method for normal recordings, because it would require sending every encoded chunk back to the main thread and copying it into a buffer, which might introduce additional delay. But it should be OK to make it optional.
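To make the main-thread cost concrete, here is a rough JavaScript sketch of what collecting streamed chunks could look like. The ChunkCollector class is hypothetical and not part of vmsg. It assumes each worker message carries one encoded chunk as a Uint8Array.

```javascript
// Hypothetical main-thread collector, not part of vmsg.
class ChunkCollector {
  constructor() {
    this.chunks = [];
    this.totalBytes = 0;
  }

  // Called once per chunk message from the worker.
  add(chunk) {
    this.chunks.push(chunk);
    this.totalBytes += chunk.length;
  }

  // Joins all chunks into one contiguous buffer. This is the extra copy
  // that a streaming mode adds compared to a single flush at the end.
  concat() {
    const out = new Uint8Array(this.totalBytes);
    let offset = 0;
    for (const chunk of this.chunks) {
      out.set(chunk, offset);
      offset += chunk.length;
    }
    return out;
  }
}
```

For long recordings, each chunk could be uploaded as it arrives instead of being stored, so concat() would only be needed when the whole file is wanted locally.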

onel commented 5 years ago

OK, I understand. I don't have experience with C, but maybe I'll try that in a fork. Thank you so much for the details.

onel commented 5 years ago

Hi there, I took a stab at making this work and wanted to check with you whether this is the right way to do it. I haven't created a PR for this because I don't know if you would want to integrate it, but let me know if you do. The idea is that for each buffer we call vmsg_encode, vmsg_flush, and then a new method, vmsg_reset. Inside the worker this would look like this:

  case "data":

    if (!vmsg_encode(msg.data)) return postMessage({type: "error", data: "vmsg_encode"});

    const blob = vmsg_flush();
    if (!blob) {
      return postMessage({type: "error", data: "vmsg_flush"});
    }

    postMessage({
      type: "blob",
      data: blob
    });

    FFI.vmsg_reset()

    break;

This will return the blob for that specific buffer each time.

The changes that I've made are: For vmsg_encode the size is returned each time:

WASM_EXPORT
int vmsg_encode(vmsg *v, int nsamples) {
  if (nsamples > MAX_SAMPLES)
    return -1;

  if (fix_mp3_size(v) < 0)
    return -1;

  uint8_t *buf = v->mp3 + v->size;
  int n = lame_encode_buffer_ieee_float(v->gfp, v->pcm_l, NULL, nsamples, buf, BUF_SIZE);

  if (n < 0)
    return n;

  v->size += n;
  return v->size;
}

And the new method:

WASM_EXPORT
int vmsg_reset(vmsg *v, int rate) {
  if (v) {
    lame_close(v->gfp);
    v->size = 0;

    v->gfp = lame_init();
    if (!v->gfp) {
      vmsg_free(v);
      return -1;
    }

    lame_set_mode(v->gfp, MONO);
    lame_set_num_channels(v->gfp, 1);
    lame_set_in_samplerate(v->gfp, rate);
    lame_set_VBR(v->gfp, vbr_default);
    lame_set_VBR_quality(v->gfp, 5);

    if (lame_init_params(v->gfp) < 0) {
      vmsg_free(v);
      return -1;
    }

  }

  return 0;
}

This basically looks like init but without the memory allocation. The problem I'm having is that the resulting mp3 blob is not actually usable. I think the encoder is not set up correctly in vmsg_reset. My questions are: do you think this is a good way to do buffer encoding? And what would you recommend we do in vmsg_reset? Thanks

flieks commented 4 years ago

@onel did you get it working? I am also interested in this for live speech to text (on the server)

stefan-reich commented 3 years ago

Damn. I want this too. What if we fake it and just swap the encoder with a new one every few seconds? I'm fine with lots of relatively short mp3s.
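A rough JavaScript sketch of that segmenting idea, untested against vmsg itself. createSegmenter is a hypothetical helper that only groups incoming PCM samples into fixed-length segments. The onSegment callback is where a fresh encoder would be created, fed the segment, and flushed into its own short mp3.

```javascript
// Hypothetical helper, not part of vmsg: groups PCM buffers into segments of
// `segmentSeconds` each and hands every full segment to `onSegment`.
function createSegmenter(sampleRate, segmentSeconds, onSegment) {
  const segmentSamples = Math.round(sampleRate * segmentSeconds);
  let pending = new Float32Array(segmentSamples);
  let filled = 0;

  return {
    // Accepts a Float32Array of any length; it may span several segments.
    push(samples) {
      let offset = 0;
      while (offset < samples.length) {
        const take = Math.min(segmentSamples - filled, samples.length - offset);
        pending.set(samples.subarray(offset, offset + take), filled);
        filled += take;
        offset += take;
        if (filled === segmentSamples) {
          onSegment(pending); // encode this segment with a new encoder here
          pending = new Float32Array(segmentSamples);
          filled = 0;
        }
      }
    },
    // Emits the final, possibly shorter, segment when recording stops.
    finish() {
      if (filled > 0) onSegment(pending.subarray(0, filled));
      pending = new Float32Array(segmentSamples);
      filled = 0;
    },
  };
}
```

Each segment is encoded independently, so the resulting files may not join seamlessly at the boundaries; that trade-off is accepted in this approach.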

stefan-reich commented 3 years ago

Ah I think I'll simply use MediaRecorder. It should record as .webm, right?