A question about speech

OpenMOSS / AnyGPT

Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"

Closed: silvercherry closed this issue 3 months ago

silvercherry commented 3 months ago

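First of all, thank you very much for your contribution to the open-source community; this is excellent work. I'd like to ask how many tokenizer layers you use after SpeechTokenizer. I'd also like to ask what the difference is between SoundStormTrainer and Semantic2AcousticTrainer. Are these two parts used to train the speech tokenizer?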

JunZhan2000 commented 3 months ago

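SpeechTokenizer itself has 8 layers. We model the first layer with the LLM and the remaining 7 layers with SoundStorm.
Is this a question about the SoundStorm training code? If so, you can open an issue in that repository: https://github.com/ZhangXInFD/soundstorm-speechtokenizer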
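
For readers following along, the layer split described in this reply can be sketched roughly as below. This is only an illustration under assumptions: the function names, the list-of-lists layout (one list of token ids per layer), and the dummy ids are hypothetical and are not taken from the AnyGPT or soundstorm-speechtokenizer code.

```python
# Hypothetical illustration only; names and data layout are assumptions,
# not code from AnyGPT or soundstorm-speechtokenizer.
N_LAYERS = 8  # SpeechTokenizer has 8 layers, per the reply above


def split_codes(codes):
    """Split per-layer token ids into (first layer, remaining layers)."""
    if len(codes) != N_LAYERS:
        raise ValueError(f"expected {N_LAYERS} layers, got {len(codes)}")
    first_layer = codes[0]   # layer 1: the part modeled by the LLM
    remaining = codes[1:]    # layers 2-8: the part modeled by SoundStorm
    return first_layer, remaining


def merge_codes(first_layer, remaining):
    """Recombine both parts into the full 8-layer stack."""
    return [first_layer] + list(remaining)


# Dummy token ids for 4 frames; real ids would come from the tokenizer.
codes = [[layer * 10 + t for t in range(4)] for layer in range(N_LAYERS)]
first_layer, remaining = split_codes(codes)
print(len(first_layer), len(remaining))
```

In this sketch the LLM would only ever see `first_layer`, while the 7 lists in `remaining` stand in for what SoundStorm has to produce before the full stack can be reassembled with `merge_codes`.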