Low Quality - Githubissues

0nutation / USLM

Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"(ICLR 2024)

138 stars, 11 forks

Hi, thanks for sharing your work. I am using the following command to generate audio for the same text and the same audio prompt as in your demo, but the generated audio is of poor quality.
python3 bin/infer.py --output-dir ${out_dir}/ \
    --model-name USLM --norm-first true --add-prenet false \
    --share-embedding true --norm-first true --add-prenet false \
    --audio-extractor "${audio_extractor}" \
    --speechtokenizer-dir "${st_dir}" \
    --checkpoint=${uslm_dir}/USLM.pt \
    --text-tokens "${uslm_dir}/unique_text_tokens.k2symbols" \
    --text-prompts "The rainbow is a division of white light into many beautiful colors." \
    --audio-prompts prompts/prompt.wav \
    --text "She also defended the lord chancellors existing powers."
The prompt is prompt.wav and the generated audio is gen_prombt.wav. The audio files are here: https://drive.google.com/drive/folders/1QyPS3Sl87SjSOFpgBSGHKiAqoA5DW45F?usp=sharing

Hello, I'd like to ask if you have retrained the USLM?

Low Quality #4