EveryVoiceTTS / EveryVoice

The EveryVoice TTS Toolkit - Text To Speech for your language
https://docs.everyvoice.ca
Other

Add model configuration validation to anticipate out-of-memory errors #150

Open roedoejet opened 10 months ago

roedoejet commented 10 months ago

Something that checks that this number isn't too big: n_mels * 1/(fft_hop_frames / input_sampling_rate), maybe also in relation to max_audio_length.

roedoejet commented 10 months ago

Each batch is B * T * K, where B is the batch size (16 by default), T is the time in frames, and K is the number of Mel bins (80 by default). To calculate the number of frames in one second of a Mel spectrogram, we can compute 1/(fft_hop_frames / input_sampling_rate).

So if you set fft_hop_frames very low without changing the sampling rate, you can potentially get out-of-memory errors.
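A minimal sketch of the arithmetic above, showing how the number of values per second of audio scales with fft_hop_frames. The batch size of 16 and the 80 Mel bins are the defaults mentioned in this thread; the 22050 Hz sampling rate and the hop sizes of 256 and 64 are assumed values chosen only for illustration, and the helper function names are hypothetical (not part of EveryVoice).

```python
# Hypothetical helpers illustrating B * T * K per second of audio.
# 22050 Hz and the hop sizes below are illustrative assumptions.


def frames_per_second(fft_hop_frames: int, input_sampling_rate: int) -> float:
    """Mel-spectrogram frames per second of audio: 1 / (hop / sampling rate)."""
    return 1 / (fft_hop_frames / input_sampling_rate)


def values_per_second_of_audio(batch_size: int, fft_hop_frames: int,
                               input_sampling_rate: int, n_mels: int) -> float:
    """Values in a B x T x K batch for every second of audio (B * T * K)."""
    t = frames_per_second(fft_hop_frames, input_sampling_rate)
    return batch_size * t * n_mels


# Compare a typical hop with a much smaller one at the same sampling rate.
for hop in (256, 64):
    n = values_per_second_of_audio(16, hop, 22050, 80)
    print(f"fft_hop_frames={hop}: {n:,.0f} values per second of audio")
```

Because T is inversely proportional to fft_hop_frames, cutting the hop to a quarter (256 to 64) makes every second of audio take four times as many values in the batch, even though nothing else in the configuration changed.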

roedoejet commented 10 months ago

Worst case: T = max_wav_length / (fft_hop_frames / input_sampling_rate), so each batch holds batch_size * T * n_mels values.

SamuelLarkin commented 10 months ago

@roedoejet Is the worst case missing * n_mels? Looking at https://github.com/roedoejet/EveryVoice/issues/150#issuecomment-1804492605, the formula is B * T * K, where K is n_mels, which is missing from the worst-case equation: batch_size * max_wav_length / (fft_hop_frames / input_sampling_rate)

Should the worst case be: batch_size * max_wav_length / (fft_hop_frames / input_sampling_rate) * n_mels?
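A rough sketch of the configuration check this issue asks for, built on the corrected worst case batch_size * max_wav_length / (fft_hop_frames / input_sampling_rate) * n_mels. Several assumptions are made here and are not EveryVoice behaviour: max_wav_length is taken to be in seconds, T is rounded up to a whole frame (the exact frame count depends on how the STFT pads), each value is a 4-byte float32, and the memory budget is supplied by the caller. Both function names are hypothetical. The check covers only the Mel-spectrogram batch itself, not model activations.

```python
import math

# Assumption: spectrogram values are stored as 32-bit floats.
BYTES_PER_FLOAT32 = 4


def worst_case_batch_values(batch_size: int, max_wav_length: float,
                            fft_hop_frames: int, input_sampling_rate: int,
                            n_mels: int) -> int:
    """Worst-case number of values in one B x T x K Mel-spectrogram batch."""
    # T = max_wav_length / (fft_hop_frames / input_sampling_rate), rounded up
    # to a whole frame (assumption: max_wav_length is in seconds).
    t = math.ceil(max_wav_length / (fft_hop_frames / input_sampling_rate))
    return batch_size * t * n_mels


def check_batch_fits(batch_size: int, max_wav_length: float,
                     fft_hop_frames: int, input_sampling_rate: int,
                     n_mels: int, budget_bytes: int) -> int:
    """Return the worst-case batch size in bytes, or raise if over budget."""
    values = worst_case_batch_values(batch_size, max_wav_length,
                                     fft_hop_frames, input_sampling_rate,
                                     n_mels)
    needed = values * BYTES_PER_FLOAT32
    if needed > budget_bytes:
        raise ValueError(
            f"Worst-case Mel batch needs {needed / 2**20:.1f} MiB, over the "
            f"{budget_bytes / 2**20:.1f} MiB budget; lower batch_size or "
            f"max_wav_length, or raise fft_hop_frames."
        )
    return needed


# Example with illustrative values: 16 clips of up to 11 s at 22050 Hz.
print(check_batch_fits(16, 11.0, 256, 22050, 80, budget_bytes=2**30))
```

Raising a ValueError at configuration time means an oversized setup is reported before training starts, rather than as an out-of-memory error partway through the first epoch.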

roedoejet commented 10 months ago

Yes! Sorry, I shouldn't have written that while talking in the meeting. Good catch! I'll edit the comment in case we come across it again.