EveryVoiceTTS / EveryVoice

The EveryVoice TTS Toolkit - Text To Speech for your language
https://docs.everyvoice.ca
Other

Cleaners and to-replace should also be dataset-specific #359

Open roedoejet opened 8 months ago

roedoejet commented 8 months ago

I still think we should have cleaners defined on everyvoice.config.text_config.TextConfig, but we should rename them to global_cleaners and global_to_replace. Some cleaners/to_replace rules apply only to certain datasets, and those should be defined on everyvoice.config.preprocessing_config.Dataset.
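A minimal sketch of the proposed split, assuming plain dataclasses rather than EveryVoice's real config models. TextConfigSketch, DatasetSketch, normalize and the two cleaner functions are hypothetical names for illustration only, and the order in which rules are applied (dataset-specific before global) is an assumption, not something decided in this issue.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Cleaner = Callable[[str], str]


def collapse_whitespace(text: str) -> str:
    # Hypothetical cleaner: replace every run of whitespace with one space.
    return re.sub(r"\s+", " ", text)


def lowercase(text: str) -> str:
    # Hypothetical cleaner that only some datasets might want.
    return text.lower()


@dataclass
class TextConfigSketch:
    # Hypothetical stand-in for everyvoice.config.text_config.TextConfig:
    # rules here apply to every dataset.
    global_cleaners: List[Cleaner] = field(default_factory=list)
    global_to_replace: Dict[str, str] = field(default_factory=dict)


@dataclass
class DatasetSketch:
    # Hypothetical stand-in for everyvoice.config.preprocessing_config.Dataset:
    # rules here apply only to this dataset.
    label: str
    cleaners: List[Cleaner] = field(default_factory=list)
    to_replace: Dict[str, str] = field(default_factory=dict)


def normalize(text: str, text_config: TextConfigSketch, dataset: DatasetSketch) -> str:
    # Assumed order: replacements first (dataset, then global),
    # then cleaners (dataset, then global).
    for old, new in dataset.to_replace.items():
        text = text.replace(old, new)
    for old, new in text_config.global_to_replace.items():
        text = text.replace(old, new)
    for cleaner in dataset.cleaners:
        text = cleaner(text)
    for cleaner in text_config.global_cleaners:
        text = cleaner(text)
    return text
```

For example, with global_cleaners=[collapse_whitespace] shared across datasets, a dataset that adds cleaners=[lowercase] and to_replace={"&": " and "} turns "Salt  &   Pepper" into "salt and pepper", while a dataset with no rules of its own only gets its whitespace collapsed ("Salt & Pepper").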

In addition to adding the cleaners here, we also need to:

MENGZHEGENG commented 8 months ago

It seems that to-replace isn't supported in the wizard yet. I'll take a look at this.

roedoejet commented 8 months ago


I think that's fine. It's a bit advanced, and there isn't an obvious way (to me) to design that interaction in the wizard. It's alright if we just document it and tell people to edit the configuration file if necessary.

MENGZHEGENG commented 8 months ago

Although I totally agree that we should have both kinds of cleaners, asking the user to set global_cleaner and dataset-specific_cleaner separately may confuse them. How about we set global_cleaner to collapse_white_space by default (in everyvoice.config.text_config.TextConfig) and ask the user to set the dataset-specific cleaners (in everyvoice.config.preprocessing_config.Dataset)?
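The proposed defaults could look roughly like this, sketched with plain dataclasses under the assumption that the real models behave similarly. TextConfigSketch, DatasetSketch, clean and this collapse_whitespace are hypothetical stand-ins, not EveryVoice's actual API: the global list starts with whitespace collapsing, and the dataset list starts empty for the user to fill in.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List

Cleaner = Callable[[str], str]


def collapse_whitespace(text: str) -> str:
    # Hypothetical cleaner: replace every run of whitespace with one space.
    return re.sub(r"\s+", " ", text)


@dataclass
class TextConfigSketch:
    # Proposal: the global cleaner list defaults to whitespace collapsing only.
    global_cleaners: List[Cleaner] = field(
        default_factory=lambda: [collapse_whitespace]
    )


@dataclass
class DatasetSketch:
    # Proposal: no dataset-specific cleaners by default; the user adds them.
    label: str
    cleaners: List[Cleaner] = field(default_factory=list)


def clean(text: str, text_config: TextConfigSketch, dataset: DatasetSketch) -> str:
    # Assumed order: dataset-specific cleaners first, then global ones.
    for cleaner in [*dataset.cleaners, *text_config.global_cleaners]:
        text = cleaner(text)
    return text
```

With these defaults a user who configures nothing still gets "Hello \t world" cleaned to "Hello world", and adding cleaners=[str.lower] to one dataset only affects that dataset.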

roedoejet commented 8 months ago


Good idea!