DigitalPhonetics / IMS-Toucan

Controllable and fast Text-to-Speech for over 7000 languages!
Apache License 2.0

Add stereo conversion to mono in UtteranceCloner and a generalized text to file reader with a personalized name according to model_id #190

Closed: AlexSteveChungAlvarez closed this 2 months ago

AlexSteveChungAlvarez commented 2 months ago

Passing a stereo audio file to run_prosody_override raised an error in the UtteranceCloner when silerovad was used. Commit 2f8ea93 fixes this error; the code is exactly the one already used to fix the same issue in run_text_to_file_reader.
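The stereo-to-mono conversion named in the title can be sketched roughly as follows. This is only an illustration, not the code from commit 2f8ea93: the helper name `to_mono` is hypothetical, and it assumes the audio arrives as a NumPy array shaped (frames, channels), which is an assumption about the loader.

```python
import numpy as np


def to_mono(wave: np.ndarray) -> np.ndarray:
    """Hypothetical helper: collapse a (frames, channels) array to 1-D.

    Assumes channels are on the last axis; mono input is returned unchanged.
    """
    if wave.ndim == 1:
        # Already mono, nothing to do.
        return wave
    if wave.ndim != 2:
        raise ValueError(f"expected 1-D or 2-D audio, got {wave.ndim} dimensions")
    # Average the channels so the result has one sample per frame.
    return wave.mean(axis=1)


# Three frames of made-up stereo audio (left, right).
stereo = np.array([[0.2, 0.4], [-1.0, 1.0], [0.5, 0.5]])
mono = to_mono(stereo)
```

Running the waveform through a guard like this before voice activity detection means that a stage expecting a 1-D signal never sees a second channel.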

In commit 9082cf5 I merged all the language test functions into a single function called new_test, which names the output file according to the model_id. This lets new users of the toolkit get test files named after their own model without having to write a new function for their model and target language.

AlexSteveChungAlvarez commented 2 months ago

@Flux9665 did you see the pull request?

Flux9665 commented 2 months ago

Hi, thanks for the fix! I am attending a conference right now and the hotel wifi is terrible, so I will check everything next week when I get back.

Flux9665 commented 2 months ago

[accept] The fix in https://github.com/DigitalPhonetics/IMS-Toucan/commit/2f8ea9309cadb8535163a5e4dbaf97bfef2dfc23 is necessary, thank you for this!

[change required] The change to the file-reader demo in https://github.com/DigitalPhonetics/IMS-Toucan/commit/9082cf5bb0377c2f4d6c5a16c29d2ae35ff2c435, however, does not fit well with the toolkit's philosophy on redundancy and makes the code less readable. The model_id is already one of the function's arguments, so I would prefer that we just put the model_id into the filename in the existing functions rather than introduce a more abstract function.

AlexSteveChungAlvarez commented 2 months ago

@Flux9665 I made the change.