Closed ghevond20 closed 2 weeks ago
It's there, you passed the wrong flag to espeak:
$ espeak-ng -v hy -q --ipa "Թաղեմ Կրնամ ապակի ուտել և ինծի անհանգիստ չըներ։"
tʰaʀˈem kərnˈam ˌapakˈi utˈel jˈev intsˈi ˌanhanɡˈist tʃʰənˈer
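To see why the flag matters: `-x` prints espeak's ASCII phoneme mnemonics, while `--ipa` prints IPA symbols, and the 'ʰ' in the warning is an IPA symbol. The sketch below only compares the two outputs quoted in this thread (the `-x` one was produced with the `hyw` voice, the `--ipa` one with `hy`). It does not call espeak-ng, and `non_ascii_chars` is a hypothetical helper, not part of any library.

```python
# Outputs copied from this thread; nothing here runs espeak-ng itself.
x_output = "t#ar\"'em g@rn'am ,abag'i ud'el j'ev indz'i ,anhank#'isd tS#@n'er"  # from -x
ipa_output = "tʰaʀˈem kərnˈam ˌapakˈi utˈel jˈev intsˈi ˌanhanɡˈist tʃʰənˈer"  # from --ipa


def non_ascii_chars(text: str) -> set[str]:
    """Return the set of characters in `text` outside the ASCII range."""
    return {ch for ch in text if ord(ch) > 127}


# The -x output is plain ASCII, so 'ʰ' can never show up in it.
print(sorted(non_ascii_chars(x_output)))
# The --ipa output contains IPA symbols, including 'ʰ'.
print(sorted(non_ascii_chars(ipa_output)))
```

So checking the dataset with `-x` cannot reveal 'ʰ'; the check has to use `--ipa`.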
Thanks for the answer. I still get: [!] Character 'ʰ' not found in the vocabulary. Discarding it. Many phonemes use 'ʰ'. How can I resolve this problem?
The problem is resolved. I added the character 'ʰ' in /TTS/tts/utils/text/characters.py, on the line:
_pulmonic_consonants = "pbtdʈɖcɟkɡqɢʔɴŋɲɳnɱmʙrʀⱱɾɽɸβfvθðszʃʒʂʐçʝxɣχʁħʕhɦɬɮʋɹɻjɰlɭʎʟʰ"
Training now runs nicely. Thanks for the answer!
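Before training, it can help to check which phoneme characters fall outside the vocabulary. This is a minimal sketch, not the library's own check: `missing_chars` is a hypothetical helper, and `EXTRA_SYMBOLS` is an assumption that covers only the vowels, stress marks and space seen in the sample sentence, not the library's full vowel list.

```python
# Consonant inventory quoted in this thread, as it was before 'ʰ' was appended.
PULMONIC_CONSONANTS = "pbtdʈɖcɟkɡqɢʔɴŋɲɳnɱmʙrʀⱱɾɽɸβfvθðszʃʒʂʐçʝxɣχʁħʕhɦɬɮʋɹɻjɰlɭʎʟ"
# Assumption: just enough extra symbols to cover the sample sentence below.
EXTRA_SYMBOLS = "aeəiuˈˌ "


def missing_chars(phonemes: str, vocab: str) -> list[str]:
    """Return the sorted characters of `phonemes` that do not occur in `vocab`."""
    return sorted(set(phonemes) - set(vocab))


# IPA output of espeak-ng for the test sentence, copied from this thread.
sample = "tʰaʀˈem kərnˈam ˌapakˈi utˈel jˈev intsˈi ˌanhanɡˈist tʃʰənˈer"
vocab = PULMONIC_CONSONANTS + EXTRA_SYMBOLS

print(missing_chars(sample, vocab))        # characters that would be discarded
print(missing_chars(sample, vocab + "ʰ"))  # same check after adding 'ʰ'
```

Running a check like this over the whole phonemized dataset would list every symbol that needs to be added, rather than finding them one warning at a time.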
Describe the bug
I created a custom Armenian dataset in LJSpeech format and took the GlowTTS model training example (train.py). I changed only my dataset path and the language name: "phoneme_language": "hy"
To Reproduce
import os

# Trainer: Where the ✨️ happens.
# TrainerArgs: Defines the set of arguments of the Trainer.
from trainer import Trainer, TrainerArgs

# GlowTTSConfig: all model related values for training, validating and testing.
from TTS.tts.configs.glow_tts_config import GlowTTSConfig

# BaseDatasetConfig: defines name, formatter and path of the dataset.
from TTS.tts.configs.shared_configs import BaseDatasetConfig
from TTS.tts.datasets import load_tts_samples
from TTS.tts.models.glow_tts import GlowTTS
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor
from TTS.tts.utils.text.armenian.phonemizer import ArmenianPhonemizer

# We use the same path as this script as our training folder.
output_path = os.path.dirname(os.path.abspath(__file__))

# DEFINE DATASET CONFIG
# Set LJSpeech as our target dataset and define its path.
# You can also use a simple Dict to define the dataset and pass it to your custom formatter.
# Note: on POSIX the leading "/" makes os.path.join ignore output_path,
# so this path resolves to "/ArmenianGorcakatar".
dataset_config = BaseDatasetConfig(
    formatter="ljspeech",
    meta_file_train="metadata.csv",
    path=os.path.join(output_path, "/ArmenianGorcakatar"),
)

# INITIALIZE THE TRAINING CONFIGURATION
# Configure the model. Every config class inherits the BaseTTSConfig.
config = GlowTTSConfig(
    batch_size=8,
    eval_batch_size=16,
    num_loader_workers=14,
    num_eval_loader_workers=14,
    run_eval=True,
    test_delay_epochs=-1,
    epochs=1000,
    text_cleaner="phoneme_cleaners",
    use_phonemes=True,
    phoneme_language="hy",
    phoneme_cache_path=os.path.join(output_path, "phoneme_cache"),
    print_step=25,
    print_eval=False,
    mixed_precision=True,
    output_path=output_path,
    datasets=[dataset_config],
)

# INITIALIZE THE AUDIO PROCESSOR
# Audio processor is used for feature extraction and audio I/O.
# It mainly serves to the dataloader and the training loggers.
ap = AudioProcessor.init_from_config(config)

# INITIALIZE THE TOKENIZER
# Tokenizer is used to convert text to sequences of token IDs.
# If characters are not defined in the config, default characters are passed to the config.
phonemizer = ArmenianPhonemizer()
tokenizer, config = TTSTokenizer.init_from_config(config)

# LOAD DATA SAMPLES
# Each sample is a list of [text, audio_file_path, speaker_name].
# You can define your custom sample loader returning the list of samples.
# Or define your custom formatter and pass it to `load_tts_samples`.
# Check `TTS.tts.datasets.load_tts_samples` for more details.
train_samples, eval_samples = load_tts_samples(
    dataset_config,
    eval_split=True,
    eval_split_max_size=config.eval_split_max_size,
    eval_split_size=config.eval_split_size,
)

# INITIALIZE THE MODEL
# Models take a config object and a speaker manager as input.
# Config defines the details of the model like the number of layers, the size of the embedding, etc.
# Speaker manager is used by multi-speaker models.
model = GlowTTS(config, ap, tokenizer, speaker_manager=None)

# INITIALIZE THE TRAINER
# Trainer provides a generic API to train all the 🐸TTS models with all its perks like mixed-precision training,
# distributed training, etc.
trainer = Trainer(
    TrainerArgs(), config, output_path, model=model, train_samples=train_samples, eval_samples=eval_samples
)

# AND... 3,2,1... 🚀
trainer.fit()
Expected behavior
No response
Logs
No response
Environment
Additional context
After training starts I get the warning: [!] Character 'ʰ' not found in the vocabulary. Discarding it.
But when I check my dataset directly with espeak-ng:
Project$ espeak-ng -vhyw -q -x "Թաղեմ Կրնամ ապակի ուտել և ինծի անհանգիստ չըներ։"
t#ar"'em g@rn'am ,abag'i ud'el j'ev indz'i ,anhank#'isd tS#@n'er
there is no such key 'ʰ' in the output.
Please help me find where the key 'ʰ' is generated.