Hi - First, thank you for sharing your impressive work!
I was able to train a model in another language with a similar amount of data to your 100M-parameter model, and I'm seeing some very interesting behavior: if I provide a prompt to the t2s model, it only generates the `<eos>` token and produces an empty wav file (~0.2 s of silence). If I instead override and set `semantic_prompt = []` in `infer_text()` in `transfomer_infer.py`, it generates pretty good random output, but obviously nothing related to the dub I want to generate. It's a pretty vanilla training run using your code (with minor changes to add symbols not present), just on a different dataset.
Q: Have you run into this and/or do you have ideas on how to fix?