Closed zhoujxwilliam closed 1 month ago

Can this project be used to train dialects spoken in China, such as Min Nan and Tibetan? Whisper does not include tokenization for these dialects. Would I need to write the text-tokenization front end myself? Also, roughly how much data (in hours of audio) is needed to fine-tune a single dialect? Thank you!

Absolutely. For the purposes of this project, each dialect can be treated as a distinct language. As for text tokenization, there is no need to build this component from scratch; see the guide at https://github.com/nguyenhoanganh2002/XTTSv2-Finetuning-for-New-Languages/blob/3f8a0c5f0efa3eb10c12074d2433f1e754087c60/Readme.md#4-vocabulary-extension-and-configuration-adjustment

In my experience, a minimum of 50 hours of audio is recommended to achieve satisfactory results when fine-tuning a single dialect.
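For reference, the "vocabulary extension" step in the linked guide amounts to adding a new language tag and dialect-specific tokens to the existing tokenizer vocabulary without disturbing the ids of tokens already present. The sketch below is illustrative only: the real XTTSv2 tokenizer is a BPE tokenizer serialized in a vocab.json file, while here the vocabulary is modeled as a plain token-to-id mapping; the helper name `extend_vocab` and the tokens (e.g. `[min-nan]`) are hypothetical, not the project's actual identifiers.

```python
# Conceptual sketch of vocabulary extension for a new dialect.
# The vocabulary is modeled as a simple token -> id mapping; the real
# tokenizer stores a BPE vocabulary in vocab.json, but the extension
# principle is the same: append new entries with fresh, non-conflicting ids.

def extend_vocab(base_vocab: dict[str, int], new_tokens: list[str]) -> dict[str, int]:
    """Append unseen tokens to the vocabulary, keeping existing ids stable."""
    vocab = dict(base_vocab)
    next_id = max(vocab.values()) + 1 if vocab else 0
    for tok in new_tokens:
        if tok not in vocab:  # never reassign an existing token's id
            vocab[tok] = next_id
            next_id += 1
    return vocab

# Hypothetical base vocabulary, extended with a dialect language tag and
# dialect-specific subword units mined from a dialect text corpus.
base = {"[zh]": 0, "ni": 1, "hao": 2}
extended = extend_vocab(base, ["[min-nan]", "gua", "li"])
```

In practice the same idea is applied to the tokenizer file itself (and the model's embedding table must be resized to match the enlarged vocabulary), as described in the linked guide.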