Closed kaen2891 closed 3 months ago
Hello, we have not tried this ourselves, but in principle the solution should be quite mature: since we directly use the llama2 architecture, you should be able to apply any llama2 quantization or compression method directly.
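To make the idea concrete, here is a minimal sketch of the symmetric 8-bit weight quantization that standard llama2 toolchains apply per tensor or per channel. This is only a conceptual illustration of the round-trip, not the actual loading code for AnyGPT; in practice you would use an off-the-shelf quantization library rather than anything hand-rolled like this.

```python
# Conceptual sketch of symmetric int8 quantization: the same idea that
# llama2 quantization tools apply to each weight tensor. Purely illustrative.
def quantize_int8(weights):
    """Map float weights to int8 values plus a single scale factor."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid zero scale
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.031, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# restored values match the originals to within one quantization step
```

Since the model is architecturally plain llama2, the checkpoint should drop into whichever quantization workflow you already use for llama2 models.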
Thank you for replying. If we want to generate only text tokens (ignoring the speech tokens from AnyGPT), how can we do that?
It is very simple: our training data also includes plain-text dialogue, and that type of data begins with a specific prompt. You can use that prompt to have a text-only dialogue. See https://github.com/OpenMOSS/AnyGPT/blob/main/anygpt/src/m_utils/prompter.py#L19
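As a rough sketch of what this looks like, the idea is to wrap the user's text in the same template used for the plain-text dialogue data, so the model stays in text-only mode. The markers and function below are placeholders for illustration only; the actual template is defined in `anygpt/src/m_utils/prompter.py` at the linked line.

```python
# Hypothetical sketch: wrap a plain-text turn in a dialogue prompt template.
# The role markers below are placeholders, NOT the real AnyGPT format;
# consult anygpt/src/m_utils/prompter.py for the actual template.
def build_text_prompt(user_message, system_prompt="You are a helpful assistant."):
    """Build a text-only dialogue prompt ending where the model should reply."""
    return f"{system_prompt}\n[Human]: {user_message}\n[AnyGPT]:"

prompt = build_text_prompt("Summarize this paragraph.")
```

Feeding a prompt shaped like the plain-text training data is what biases the model toward emitting text tokens instead of speech tokens.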
Thank you for sharing your work.

When I tried to reduce `max_token_len` to 100 or 200 (the default is 500), it was not enough to include all of the generated speech tokens, so we could not synthesize the waveform. Moreover, generating speech tokens takes a significant amount of time, which leads to slow evaluations. If we want to reduce the latency of speech-token generation, which part do you think we should modify? It would be great if you could provide links to the relevant code.