C0untFloyd / bark-gui

🔊 Text-Prompted Generative Audio Model with Gradio
MIT License
660 stars 60 forks

[Suggestion] Support for .srt #26

Closed Dragoy closed 8 months ago

Dragoy commented 1 year ago

It would be cool to have support for the .srt format as text input, with each subtitle voiced according to its timings.
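For illustration, reading the timings out of an .srt file could look roughly like the sketch below. This is not code from bark-gui; `Cue` and `parse_srt` are hypothetical names, and the parser only assumes the common SRT layout (index line, `start --> end` line, then one or more text lines, with cues separated by blank lines).

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    index: int
    start: float  # seconds from the beginning of the file
    end: float    # seconds from the beginning of the file
    text: str


# Matches timestamps such as 00:01:02,345 (a dot before the milliseconds is tolerated too).
_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_seconds(stamp: str) -> float:
    match = _TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(content: str) -> List[Cue]:
    """Parse SRT text into cues (hypothetical helper, minimal error handling)."""
    cleaned = content.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", cleaned):
        lines = block.split("\n")
        if len(lines) < 3:
            continue  # skip blocks that have no text line
        start, _, end = lines[1].partition("-->")
        text = " ".join(line.strip() for line in lines[2:])
        cues.append(Cue(int(lines[0]), _to_seconds(start), _to_seconds(end), text))
    return cues
```

Each cue's `start` and `end` could then be used to decide where the generated audio for `text` should be placed, for example by padding with silence up to `start`.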

Dragoy commented 1 year ago

Alternatively, it could support the prosody tag in SSML: `<?xml version="1.0"?> <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis/synthesis.xsd" xml:lang="en-US">

Hello, I'm Usung Kang. Welcome to the CINEMA 4D Master Class on Achieving Mastery by Gurum Kim and Taehoon Pak. First of all, I'm here to say hello on behalf of both of you, and I'm very happy and excited to meet you, even though it's online. We've done some basic courses through ColloSo in the past, and we're very excited to put this new course together and meet you again. In this first hour, we divided the OT time into five parts. In introduction of the three instructors, introduction of the work involved, key points of the class and introduction to the curriculum At the end of the OT the sun will set and the three instructors will go through all five parts.
</speak>`

Dragoy commented 1 year ago

This tool is ideal for converting .srt files to SSML: https://github.com/ThioJoe/SRT-To-SSML
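To make the idea concrete, here is a rough sketch of one way subtitle timings might be mapped to SSML. It is not the algorithm of the linked tool, and `cues_to_ssml` is a hypothetical name. It assumes cues are given as `(start, end, text)` tuples in seconds and simply turns the silence between cues into `<break>` elements.

```python
from xml.sax.saxutils import escape


def cues_to_ssml(cues, lang="en-US"):
    """Build an SSML document from (start, end, text) cues (hypothetical helper)."""
    parts = [
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{lang}">'
    ]
    previous_end = 0.0
    for start, end, text in cues:
        # Silence between the previous cue's end and this cue's start becomes a break.
        gap_ms = round((start - previous_end) * 1000)
        if gap_ms > 0:
            parts.append(f'<break time="{gap_ms}ms"/>')
        parts.append(escape(text))  # escape &, < and > so the XML stays well-formed
        previous_end = end
    parts.append("</speak>")
    return "\n".join(parts)
```

Note that this only reproduces the pauses between cues. It assumes each spoken line fills its cue exactly, so the result will drift whenever the synthesized speech runs shorter or longer than the subtitle duration.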

Dragoy commented 1 year ago

Unfortunately, this format does not currently work and causes an error:

[image]

C0untFloyd commented 1 year ago

Yes, I will continue working on this format. However, Bark is not the best choice for exactly timed TTS, because every result can be quite different and speech pauses are more or less random. Do you know whether one could add custom metadata (such as a seed value) to the SRT standard?
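As far as I know, SRT has no dedicated metadata field, so any answer would have to be a private convention. One possibility, sketched below purely as an assumption, is to prefix the cue text with `{key=value}` tags and strip them before synthesis. The tag syntax and the name `split_metadata` are invented for this example and are not part of SRT or bark-gui.

```python
import re

# Hypothetical convention: zero or more {key=value} tags at the start of the cue text.
_META = re.compile(r"^\s*\{(\w+)=([^{}]*)\}\s*")


def split_metadata(text):
    """Return (metadata dict, remaining text) for one cue's text."""
    meta = {}
    while True:
        match = _META.match(text)
        if match is None:
            break
        meta[match.group(1)] = match.group(2).strip()
        text = text[match.end():]  # drop the tag that was just consumed
    return meta, text


meta, spoken = split_metadata("{seed=1234} Hello there.")
# meta holds the seed as a string; spoken is the text left to synthesize.
```

The drawback of such a convention is that subtitle players unaware of it would most likely display the tags as literal text.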