Hagsten / Talkify

JavaScript text-to-speech library

How can I get the specific time of each word? #67

YiYinYinguu closed 1 year ago

YiYinYinguu commented 2 years ago

I need to use the generated audio to create videos, so I want to know the start time and end time of each word. I would appreciate a reply.

Hagsten commented 2 years ago

You can extract speech marks via the https://talkify.net/api/speech/v1/marks endpoint, as described at https://manage.talkify.net/docs#api-reference-speech-speech-marks

Example response:

[{"Word":"Proofread","Position":100,"CharPosition":0,"CharPositionOffset":0},{"Word":"your","Position":565,"CharPosition":10,"CharPositionOffset":0}]

Here, Position is "The starting position (in ms) in the audio stream for the spoken word".

There is no way to get the end of a spoken word directly, but you can estimate it from the start of the next word.
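
For anyone who wants to try that estimate, here is a minimal sketch. It assumes the marks array has the shape shown in the example response. The helper name estimateWordTimings and its optional audioDurationMs parameter are hypothetical and not part of Talkify; the 900 ms duration is a made-up value for illustration only.

```javascript
// Hypothetical helper, not part of the Talkify API.
// marks: array of objects like the example response (Word, Position in ms).
// audioDurationMs: optional total audio length, used only for the last word.
function estimateWordTimings(marks, audioDurationMs = null) {
  // Sort a copy by start position so that "the next word" is well defined.
  const sorted = [...marks].sort((a, b) => a.Position - b.Position);

  return sorted.map((mark, i) => {
    const next = sorted[i + 1];
    return {
      word: mark.Word,
      startMs: mark.Position,
      // Estimated end: the start of the next word. For the last word there is
      // no next mark, so fall back to the audio duration (null if unknown).
      endMs: next ? next.Position : audioDurationMs,
    };
  });
}

// Marks copied from the example response above.
const marks = [
  { Word: "Proofread", Position: 100, CharPosition: 0, CharPositionOffset: 0 },
  { Word: "your", Position: 565, CharPosition: 10, CharPositionOffset: 0 },
];

// 900 is an assumed total duration in ms, not a value returned by the API.
const timings = estimateWordTimings(marks, 900);
console.log(timings);
```

Note that the estimated end includes any pause before the next word, so it will tend to overshoot the real end of the spoken word. The last word only gets an end time if you supply the audio duration yourself, for example from the audio file after it has loaded.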