Stitching not available

elevenlabs / elevenlabs-js

The official JavaScript (Node) library for ElevenLabs Text to Speech.

https://elevenlabs.io

MIT License

Stitching not available #64

Open tval2 opened 1 month ago

tval2 commented 1 month ago

I am trying to keep my app low-latency, so I am using the streaming API via this Node library. I also want to maintain a decent flow between the chunks I send in (I split on sentence boundaries), so I need to stitch them together using the request_ids of previous streams.

However, on looking into this further, it seems that client.generate has no way of returning the request_id, even as an optional output, so I can't use this feature at all. I was told in the Discord to try the HTTP API, but its output format differs from the SDK's, so I would need to make other changes in my codebase simply because I swapped out how I'm calling the API.

I tried looking in the source code here to see if I could mimic the output of the SDK, but it doesn't seem to work. So I figured I'd post here asking for a PR that also outputs the request_id when asked.

dsinghvi commented 1 month ago

@tval2 have you considered using textToSpeech.convert?

tval2 commented 1 month ago

> @tval2 have you considered using textToSpeech.convert?

Maybe I'm missing something, but my understanding was that convert only returns the response body, no? At least that's what it shows in the repo.

tval2 commented 1 month ago

@dsinghvi perhaps more broadly: is there any clean way to use the SDK and pass the previous_request_ids parameter at the same time? The generate call allows for it, so I still feel like I'm missing something.

tval2 commented 1 month ago

Also on this topic, the API reference and the tutorial give contradictory guidance:

The API docs say: "In case both previous_text and previous_request_ids is send, previous_text will be ignored."
The tutorial says: "best possible results are achieved when conditioning both on text and past generations so lets combine the two by providing previous_text, next_text and previous_request_ids in one request"

Which one should we follow? Should I send previous_text or not?
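
Both guidelines describe the same request body, so the difference comes down to whether previous_text is included alongside the IDs. A minimal sketch of assembling that body, using the field names quoted above; the helper name buildStitchedBody, the sendPreviousTextWithIds option, and the cap of three request IDs are assumptions for illustration and should be checked against the current API reference:

```javascript
// Hypothetical helper, not part of the SDK: builds the JSON body for a
// stitched text-to-speech request from the fields discussed in this thread.
// Assumption: at most 3 previous request IDs are accepted per request.
const MAX_PREVIOUS_REQUEST_IDS = 3;

function buildStitchedBody({
  text,
  previousText,
  nextText,
  previousRequestIds = [],
  sendPreviousTextWithIds = false, // hypothetical switch between the two guidelines
}) {
  const body = { text };

  // Keep only the most recent IDs.
  const ids = previousRequestIds.slice(-MAX_PREVIOUS_REQUEST_IDS);
  if (ids.length > 0) {
    body.previous_request_ids = ids;
  }

  // Per the quoted API docs, previous_text is ignored when IDs are present,
  // so it is only sent alongside IDs if the caller explicitly opts in.
  if (previousText && (ids.length === 0 || sendPreviousTextWithIds)) {
    body.previous_text = previousText;
  }

  if (nextText) {
    body.next_text = nextText;
  }
  return body;
}
```

With sendPreviousTextWithIds left at false, the body follows the API docs' wording; setting it to true follows the tutorial's. Which of the two the service actually honors is not settled in this thread.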

dsinghvi commented 1 month ago

@tval2 you should be able to use textToSpeech.convertAsStream for the streaming API. That method also supports the parameters you mention, such as previous_request_ids.

ofekrom commented 1 month ago

I am also encountering the same issue reported in the original message.

ceifa commented 2 weeks ago

> @tval2 you should be able to use textToSpeech.convertAsStream for the streaming API. That method also supports the parameters you mention, such as previous_request_ids.

It's not possible to get the request ID from textToSpeech.convertAsStream.
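
Until the SDK exposes the ID, the HTTP route mentioned earlier in the thread is one possible workaround. The following is a sketch under stated assumptions, not the library's API: it assumes the streaming endpoint lives at /v1/text-to-speech/{voice_id}/stream, that authentication uses an xi-api-key header, and that the ID comes back in a request-id response header. The function name streamWithRequestId and the injectable fetchImpl parameter are hypothetical.

```javascript
// Sketch only: calls the HTTP streaming endpoint directly instead of the SDK.
// Assumptions (not confirmed in this thread): the endpoint path, the
// `xi-api-key` header, and the `request-id` response header.
async function streamWithRequestId({ apiKey, voiceId, body, fetchImpl = fetch }) {
  const url =
    "https://api.elevenlabs.io/v1/text-to-speech/" +
    encodeURIComponent(voiceId) +
    "/stream";

  const response = await fetchImpl(url, {
    method: "POST",
    headers: { "xi-api-key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });

  if (!response.ok) {
    throw new Error("Text-to-speech request failed with status " + response.status);
  }

  return {
    // null if the header is absent
    requestId: response.headers.get("request-id"),
    // With the built-in fetch this is a web ReadableStream of audio bytes.
    audio: response.body,
  };
}
```

Each returned requestId can be collected and passed as previous_request_ids in the body of the next call. The default fetchImpl relies on the global fetch available in Node 18 and later. If the rest of the codebase expects a Node stream, as the SDK output discussed above might, Readable.fromWeb from node:stream can wrap the web stream, though whether the result matches the SDK's output exactly is untested here.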