Does this project support subtitle output?

WhiteMagic2014 / tts-edge-java

Java SDK for Edge Read Aloud

https://server.whitemagic2014.com/tts/

MIT License

38 stars 14 forks

Does this project support subtitle output? #3

Closed: yzygenuine closed this issue 7 months ago

WhiteMagic2014 commented 7 months ago

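This project does not support it by itself. However, you can combine it with OpenAI's createTranscription to obtain a subtitle file (SRT) from the generated audio.

For the concrete calls, see my project gpt-magic.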
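
A minimal sketch of the approach described above, written against the official OpenAI Python client rather than gpt-magic (whose API is not shown in this thread). The helper name `transcribe_to_srt`, the model name `whisper-1`, and the `response_format="srt"` option are assumptions for illustration; check the current OpenAI documentation before relying on them.

```python
from pathlib import Path


def transcribe_to_srt(audio_path: str, srt_path: str, model: str = "whisper-1") -> str:
    """Send an audio file to the transcription endpoint and save the SRT text.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    audio = Path(audio_path)
    if not audio.is_file():
        # Fail early, before any network call is attempted.
        raise FileNotFoundError(audio_path)

    # Imported lazily so this module loads even without the package installed.
    # Requires `pip install openai` and the OPENAI_API_KEY environment variable.
    from openai import OpenAI

    client = OpenAI()
    with audio.open("rb") as f:
        # Assumption: with response_format="srt" the client returns plain text.
        srt_text = client.audio.transcriptions.create(
            model=model,
            file=f,
            response_format="srt",
        )

    Path(srt_path).write_text(str(srt_text), encoding="utf-8")
    return str(srt_text)
```

A call would look like `transcribe_to_srt("speech.mp3", "speech.srt")`. Note that this route recognizes the audio again, so its timings come from speech recognition and not from the TTS service itself.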

yzygenuine commented 7 months ago

gpt-magic

The Python edge-tts package can generate subtitles.

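```python
import edge_tts

# TEXT, VOICE, OUTPUT_FILE and WEBVTT_FILE are constants defined elsewhere.


async def main() -> None:
    """Main function"""
    communicate = edge_tts.Communicate(TEXT, VOICE)
    submaker = edge_tts.SubMaker()
    with open(OUTPUT_FILE, "wb") as file:
        async for chunk in communicate.stream():
            if chunk["type"] == "audio":
                # Audio bytes go straight into the output file.
                file.write(chunk["data"])
            elif chunk["type"] == "WordBoundary":
                # Word timing events are collected for the subtitles.
                submaker.create_sub((chunk["offset"], chunk["duration"]), chunk["text"])

    with open(WEBVTT_FILE, "w", encoding="utf-8") as file:
        file.write(submaker.generate_subs())
```

In principle it should work the same way here: the subtitles are just a post-processing step on the results the service returns.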
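
To make the "post-processing" concrete, here is a rough sketch of how word boundary events could be turned into WebVTT cues. It is not the code of edge-tts or of tts-edge-java. It assumes that `offset` and `duration` are expressed in 100-nanosecond ticks, and it emits one cue per word, whereas a real subtitle maker would usually group several words per cue. The function names are hypothetical.

```python
from typing import Iterable, Tuple

TICKS_PER_MS = 10_000  # assumption: 1 tick = 100 ns, so 10,000 ticks = 1 ms


def ticks_to_timestamp(ticks: int) -> str:
    """Format a tick count as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = ticks // TICKS_PER_MS
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{ms:03d}"


def build_webvtt(boundaries: Iterable[Tuple[int, int, str]]) -> str:
    """Build a WebVTT document from (offset, duration, text) tuples."""
    lines = ["WEBVTT", ""]
    for offset, duration, text in boundaries:
        start = ticks_to_timestamp(offset)
        end = ticks_to_timestamp(offset + duration)
        # Each cue is a timing line, the text, and a blank separator line.
        lines += [f"{start} --> {end}", text, ""]
    return "\n".join(lines)
```

For example, `build_webvtt([(1_000_000, 5_000_000, "Hello")])` produces a single cue running from `00:00:00.100` to `00:00:00.600`.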

WhiteMagic2014 commented 7 months ago

Done. Version 1.2.0 is out; I implemented the feature following the Python version.

lllsondowlll commented 1 month ago

I think this change broke general use cases. Each time I have it transcribe something, it appends to the VTT and MP3 files instead of overwriting them. For real-time voice transcription, this means it plays back the previous session first and then the content I just asked it to generate. The output keeps growing.

WhiteMagic2014 commented 1 month ago

Thank you for pointing out this issue. When the same file name was specified, there was indeed a bug where the audio file was appended to while the VTT subtitle file was overwritten. I have fixed this in the recently released version 1.2.3.
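
The class of bug discussed here can be illustrated with a small sketch. It is not the library's actual code: the helper names and the fake byte payloads are made up for the example. Opening the output in append mode makes a reused file name grow with every session, while opening it in write mode truncates the file first.

```python
import os
import tempfile
from typing import List


def save_audio(path: str, data: bytes, append: bool = False) -> int:
    """Write audio bytes to path and return the resulting file size."""
    mode = "ab" if append else "wb"  # "ab" keeps old content, "wb" truncates
    with open(path, mode) as f:
        f.write(data)
    return os.path.getsize(path)


def simulate_sessions(append: bool) -> List[int]:
    """Save two fake sessions under the same file name and record the sizes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "speech.mp3")
        first = save_audio(path, b"AAAA", append=append)  # 4 bytes
        second = save_audio(path, b"BBB", append=append)  # 3 bytes
        return [first, second]
```

With `append=True` the second size includes the first session, which matches the symptom of old audio being played back before the new content. With `append=False` only the latest session remains in the file.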