TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2
Error during preparation on 16 kHz data #558

junaedifahmi commented 3 years ago

Hi, I have a dataset at 16 kHz and 22 kHz. I ran the preparation with the ljspeech_preparation.yaml config at 22 kHz and it works fine. But when I run it on the same dataset at a different sample rate, it produces an error while computing statistics. I made sure that the sample rate in the config matches the dataset.

 ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required by StandardScaler. 
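The message comes from scikit-learn's input validation: `StandardScaler` refuses to fit an array with zero rows, so by the time statistics are computed, nothing was collected for that feature. The single column in `shape=(0, 1)` suggests a one-dimensional feature reshaped into a column rather than the 80-bin mel spectrogram. The sketch below is a hypothetical numpy helper (`accumulate_feature_stats` is not part of TensorFlowTTS), assuming features are arrays of shape `(frames,)` or `(frames, dims)`. It computes the same per-column mean and standard deviation a `StandardScaler` would, but stops with an explicit message when no frames were found.

```python
from typing import Iterable, Tuple

import numpy as np


def accumulate_feature_stats(arrays: Iterable[np.ndarray]) -> Tuple[np.ndarray, np.ndarray, int]:
    """Per-column mean and population std over all frames of all utterances.

    Hypothetical helper: mirrors what fitting a StandardScaler computes,
    but raises a descriptive error when no frames were collected.
    """
    total_frames = 0
    col_sum = None
    col_sq_sum = None
    for arr in arrays:
        arr = np.asarray(arr, dtype=np.float64)
        if arr.ndim == 1:
            arr = arr.reshape(-1, 1)  # 1-D features (e.g. pitch) become one column
        if arr.shape[0] == 0:
            continue  # an empty utterance contributes nothing
        if col_sum is None:
            col_sum = np.zeros(arr.shape[1])
            col_sq_sum = np.zeros(arr.shape[1])
        total_frames += arr.shape[0]
        col_sum += arr.sum(axis=0)
        col_sq_sum += np.square(arr).sum(axis=0)
    if total_frames == 0:
        raise ValueError(
            "no frames collected: check that preprocessing wrote non-empty "
            "features for this split and sampling rate"
        )
    mean = col_sum / total_frames
    # Population variance (ddof=0), the same convention StandardScaler uses.
    var = np.maximum(col_sq_sum / total_frames - np.square(mean), 0.0)
    return mean, np.sqrt(var), total_frames
```

Passing it the per-utterance feature arrays turns an empty split into a readable error at the point where the data is gathered, instead of a validation failure inside scikit-learn.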

This is the config file:

#                FEATURE EXTRACTION SETTING               #
sampling_rate: 16000     # Sampling rate.
fft_size: 512            # FFT size.
hop_size: 256            # Hop size. (fixed value, don't change)
win_length: null         # Window length.
                         # If set to null, it will be the same as fft_size.
window: "hann"           # Window function.
num_mels: 80             # Number of mel basis.
fmin: 80                 # Minimum freq in mel basis calculation.
fmax: 8000               # Maximum frequency in mel basis calculation.
global_gain_scale: 1.0   # Will be multiplied to all of waveform.
trim_silence: true       # Whether to trim the start and end of silence.
trim_threshold_in_db: 60 # Need to tune carefully if the recording is not good.
trim_frame_size: 2048    # Frame size in trimming.
trim_hop_size: 512       # Hop size in trimming.
format: "npy"            # Feature file format. Only "npy" is supported.
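For reference, the timing implied by these values at 16 kHz can be checked with a little arithmetic. The class below is a hypothetical sketch, not TensorFlowTTS's config loader; the frame count assumes a centered STFT as in librosa's default (`center=True`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeatureConfig:
    # Defaults copied from the 16 kHz feature-extraction config above.
    sampling_rate: int = 16000
    fft_size: int = 512
    hop_size: int = 256
    win_length: Optional[int] = None
    fmax: int = 8000

    def effective_win_length(self) -> int:
        # Per the config comment, win_length: null falls back to fft_size.
        return self.fft_size if self.win_length is None else self.win_length

    def hop_ms(self) -> float:
        return 1000.0 * self.hop_size / self.sampling_rate

    def window_ms(self) -> float:
        return 1000.0 * self.effective_win_length() / self.sampling_rate

    def fmax_within_nyquist(self) -> bool:
        return self.fmax <= self.sampling_rate / 2

    def num_frames(self, num_samples: int) -> int:
        # Assumes a centered STFT (librosa's default center=True),
        # which yields 1 + num_samples // hop_size frames.
        return 1 + num_samples // self.hop_size
```

With the defaults, `hop_ms()` gives 16.0, `window_ms()` gives 32.0, and one second of audio (`num_frames(16000)`) gives 63 frames; `fmax: 8000` sits exactly at the 8 kHz Nyquist limit.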

Thank you for your answer.

dathudeptrai commented 3 years ago

@juunnn does the bug still exist?

junaedifahmi commented 3 years ago

Yeah. Version 0.0, which I had installed before, works fine. With newer versions such as 0.9 or 1.1, the problem persists.

dathudeptrai commented 3 years ago

@juunnn 0.0 is the master branch, and it's the newest version :D. Please pull the newest code :D

junaedifahmi commented 3 years ago

I installed version 0.9 from pip, but I built version 0.0 from GitHub. Does that have any effect on performance?