TensorSpeech / TensorFlowTTS

:stuck_out_tongue_closed_eyes: TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, French, Korean, Chinese, German and Easy to adapt for other languages)
https://tensorspeech.github.io/TensorFlowTTS/
Apache License 2.0
3.82k stars 812 forks

Error preparation on 16 KHz data #558

Closed junaedifahmi closed 3 years ago

junaedifahmi commented 3 years ago

Hi, I have a dataset at both 16 kHz and 22 kHz. I ran the preparation with the ljspeech_preparation.yaml config at 22 kHz, and it works fine. But when I run the same dataset at a different sample rate, it produces an error while computing statistics. I made sure that the sample rate in the config matches the dataset.

2021-05-06 14:40:21.164148: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-05-06 14:40:23,212 (preprocess:516) INFO: Computing statistics for 21246 files.
 14%|█████████████████▏                                                                                                     | 3058/21246 [00:16<01:41, 179.89it/s]
Traceback (most recent call last):
  File "/home/jun/tts/tacotron2/venv/bin/tensorflow-tts-compute-statistics", line 8, in <module>
    sys.exit(compute_statistics())
  File "/home/jun/tts/tacotron2/venv/lib/python3.7/site-packages/tensorflow_tts/bin/preprocess.py", line 531, in compute_statistics
    scaler_f0.partial_fit(f0[f0 != 0].reshape(-1, 1))
  File "/home/jun/tts/tacotron2/venv/lib/python3.7/site-packages/sklearn/preprocessing/_data.py", line 768, in partial_fit
    force_all_finite='allow-nan', reset=first_call)
  File "/home/jun/tts/tacotron2/venv/lib/python3.7/site-packages/sklearn/base.py", line 421, in _validate_data
    X = check_array(X, **check_params)
  File "/home/jun/tts/tacotron2/venv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "/home/jun/tts/tacotron2/venv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 729, in check_array
    context))
ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required by StandardScaler.
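
The failing call is `scaler_f0.partial_fit(f0[f0 != 0].reshape(-1, 1))`, and the reported shape `(0, 1)` suggests that for at least one file every F0 value was zero, so the mask selected nothing. Below is a minimal sketch of a guard around that call. The helper names `voiced_column` and `partial_fit_voiced` are hypothetical and are not part of TensorFlowTTS; `scaler` is assumed to be any object with a scikit-learn style `partial_fit` method.

```python
import numpy as np


def voiced_column(f0):
    """Return the non-zero F0 values as an (n, 1) column, or None if there are none."""
    f0 = np.asarray(f0, dtype=np.float64)
    voiced = f0[f0 != 0]  # same mask as in the traceback
    if voiced.size == 0:
        return None
    return voiced.reshape(-1, 1)


def partial_fit_voiced(scaler, f0):
    """Update `scaler` with the voiced frames of one utterance.

    Hypothetical helper: `scaler` only needs a `partial_fit` method.
    Returns True if the scaler was updated, False if the utterance
    had no non-zero F0 values and was skipped.
    """
    column = voiced_column(f0)
    if column is None:
        return False
    scaler.partial_fit(column)
    return True
```

Logging the files for which `partial_fit_voiced` returns False would show which utterances have no non-zero F0 at 16 kHz. Whether skipping them is the right fix depends on why their F0 is empty, which the log above does not show.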

This is the config file:

###########################################################
#                FEATURE EXTRACTION SETTING               #
###########################################################
sampling_rate: 16000     # Sampling rate.
fft_size: 512            # FFT size.
hop_size: 256            # Hop size. (fixed value, don't change)
win_length: null         # Window length.
                         # If set to null, it will be the same as fft_size.
window: "hann"           # Window function.
num_mels: 80             # Number of mel basis.
fmin: 80                 # Minimum freq in mel basis calculation.
fmax: 8000               # Maximum frequency in mel basis calculation.
global_gain_scale: 1.0   # Will be multiplied to all of waveform.
trim_silence: true       # Whether to trim the start and end of silence.
trim_threshold_in_db: 60 # Need to tune carefully if the recording is not good.
trim_frame_size: 2048    # Frame size in trimming.
trim_hop_size: 512       # Hop size in trimming.
format: "npy"            # Feature file format. Only "npy" is supported.
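
As a quick sanity check on these values, the sketch below verifies a few relations that should hold for any sample rate: `fmax` must not exceed the Nyquist frequency (`sampling_rate / 2`), `fmin` must be below `fmax`, and the hop must not be longer than the window. The functions `check_feature_config` and `frame_shift_ms` are hypothetical helpers written for illustration, not part of TensorFlowTTS; the dictionary keys mirror the config above.

```python
def check_feature_config(cfg):
    """Return a list of warnings for a feature-extraction config (hypothetical helper)."""
    warnings = []
    nyquist = cfg["sampling_rate"] / 2
    fmax = cfg.get("fmax")
    if fmax is not None:
        if fmax > nyquist:
            warnings.append(f"fmax {fmax} exceeds Nyquist frequency {nyquist}")
        if cfg["fmin"] >= fmax:
            warnings.append(f"fmin {cfg['fmin']} is not below fmax {fmax}")
    # A null win_length falls back to fft_size, as the config comment states.
    win_length = cfg.get("win_length") or cfg["fft_size"]
    if win_length > cfg["fft_size"]:
        warnings.append(f"win_length {win_length} exceeds fft_size {cfg['fft_size']}")
    if cfg["hop_size"] > win_length:
        warnings.append(f"hop_size {cfg['hop_size']} exceeds win_length {win_length}")
    return warnings


def frame_shift_ms(cfg):
    """Duration of one hop in milliseconds."""
    return 1000.0 * cfg["hop_size"] / cfg["sampling_rate"]


# Values copied from the config above.
config = {
    "sampling_rate": 16000,
    "fft_size": 512,
    "hop_size": 256,
    "win_length": None,
    "fmin": 80,
    "fmax": 8000,
}

print(check_feature_config(config))  # → []
print(frame_shift_ms(config))        # → 16.0
```

The config passes these basic checks, so a mismatch among these particular values does not explain the error. With `hop_size: 256`, one frame spans 16 ms at 16000 Hz.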

Thank you for your answer.

dathudeptrai commented 3 years ago

@juunnn does the bug still exist?

junaedifahmi commented 3 years ago

Yes. Version 0.0, which I had installed earlier, works fine. With newer versions such as 0.9 or 1.1, the problem persists.

dathudeptrai commented 3 years ago

@juunnn 0.0 is the master branch, and it is the newest version :D. Please pull the newest code :D.

junaedifahmi commented 3 years ago

I installed version 0.9 from pip, but I built version 0.0 from GitHub. Does that have any effect on performance?