EveryVoiceTTS / EveryVoice

The EveryVoice TTS Toolkit - Text To Speech for your language

Synthesize can only process even multiples of the batch size #438

Open roedoejet opened 1 month ago

roedoejet commented 1 month ago

I provided 34 utterances in a filelist with a batch size of 4 and only 32 outputs were produced. It seems that the dataloader only processes full batches.
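
To make the arithmetic concrete, here is a minimal sketch in plain Python of what "only full batches" would mean for 34 utterances and a batch size of 4. The `batch_indices` helper and its `drop_last` flag are hypothetical names for illustration; this is not EveryVoice's dataloader code, only an assumption about the behaviour described above.

```python
def batch_indices(n_items, batch_size, drop_last):
    """Split range(n_items) into consecutive batches of indices.

    Hypothetical helper: if drop_last is True, a final batch that is
    shorter than batch_size is discarded.
    """
    batches = [
        list(range(start, min(start + batch_size, n_items)))
        for start in range(0, n_items, batch_size)
    ]
    if drop_last and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches


# 34 utterances, batch size 4: 8 full batches plus one batch of 2.
full_only = batch_indices(34, 4, drop_last=True)
everything = batch_indices(34, 4, drop_last=False)

n_full_only = sum(len(b) for b in full_only)
n_everything = sum(len(b) for b in everything)
print(n_full_only, n_everything)  # 32 34
```

If the incomplete final batch is dropped, the last two utterances never reach the model, which would match the 32 outputs I saw.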

wiitt commented 2 weeks ago

I tried synthesizing speech with a specified batch size from filelists in both txt and psv formats (31 utterances, batch size 7). In both cases I got a corresponding audio file for every utterance. During training, however, the dataloader does discard incomplete batches.

I developed a sampler that fills an incomplete batch with random samples from other batches. Simply training two models with the two sampling approaches does not make the difference obvious. To make the discarded data a larger share of the total and to imitate a low-resource language, I restricted the LJ data to only 168 utterances. I cannot say that one of these models is better than the other.
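
For reference, the idea can be sketched roughly as follows. This is a simplified plain-Python illustration, not the actual sampler: the function name `padded_batches`, the seed handling, and the batch size of 16 in the usage lines are all assumptions made for the example.

```python
import random


def padded_batches(indices, batch_size, seed=0):
    """Shuffle indices into batches and pad an incomplete last batch.

    Hypothetical sketch: the last batch is topped up with random
    samples drawn (without replacement) from the earlier batches, so
    no utterance is discarded and every batch has the same size.
    If there is only one batch, it is left as-is.
    """
    rng = random.Random(seed)
    indices = list(indices)
    rng.shuffle(indices)
    batches = [
        indices[i:i + batch_size]
        for i in range(0, len(indices), batch_size)
    ]
    if len(batches) > 1 and len(batches[-1]) < batch_size:
        pool = [i for batch in batches[:-1] for i in batch]
        missing = batch_size - len(batches[-1])
        batches[-1] = batches[-1] + rng.sample(pool, missing)
    return batches


# 168 utterances with an assumed batch size of 16: 10 full batches
# plus 8 leftover utterances, which get padded with 8 repeats.
batches = padded_batches(range(168), batch_size=16)
sizes = {len(b) for b in batches}
seen = {i for b in batches for i in b}
print(len(batches), sizes, len(seen))
```

With these numbers, dropping the last batch would discard 8 of 168 utterances per epoch, while padding keeps all of them at the cost of oversampling 8 others.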

Do you have any ideas on how to test the effectiveness of keeping an oversampled last batch?