NVIDIA / tacotron2

Tacotron 2 - PyTorch implementation with faster-than-realtime inference
BSD 3-Clause "New" or "Revised" License

Not getting alignment properly #628

Open hongseoi opened 6 months ago

hongseoi commented 6 months ago

Hi! I have trained Tacotron 2 for more than 60,000 steps, but I still cannot get proper alignment. The alignment graph is shown below. Does anyone know what might be causing this?

[Image: alignment chart]

I am training on 100 samples of elderly voice data selected from the Common Voice dataset.
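With only 100 clips, it can be worth checking how much audio the training set actually contains. Below is a minimal sketch using Python's standard `wave` module, assuming the clips have already been converted to PCM WAV files in a single folder (Common Voice distributes its clips as MP3). The function names and the `~/data/train` path are hypothetical; the path is borrowed from the script posted later in this thread.

```python
import glob
import os
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def total_duration_seconds(folder):
    """Return (number of clips, total seconds) for all *.wav files in `folder`."""
    # glob does not expand '~', so expand it explicitly
    folder = os.path.expanduser(folder)
    paths = sorted(glob.glob(os.path.join(folder, "*.wav")))
    return len(paths), sum(wav_duration_seconds(p) for p in paths)


if __name__ == "__main__":
    # Hypothetical path: point this at your own training clips.
    count, seconds = total_duration_seconds("~/data/train")
    print(f"{count} clips, {seconds / 60:.1f} minutes of audio")
```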

My earlier training attempts also performed poorly, so I looked through other issues for a solution.

Unfortunately, nothing I found there helped.

hongseoi commented 5 months ago

I used SoX to trim silence from the audio files. It is not a complete success yet, but there has been some improvement.

[image]

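import glob
import os
import subprocess


def remove_silence(input_file, output_file):
    """Trim leading and trailing silence from one audio file using SoX."""
    try:
        # 'silence 1 0.1 1%' strips audio from the start until at least 0.1 s
        # above 1% amplitude is found. An above-periods value of 2 would also
        # strip the first non-silent segment (see the SoX 'silence' docs).
        # 'reverse' flips the audio so the second 'silence' trims the end,
        # and the final 'reverse' restores the original order.
        subprocess.run([
            'sox', input_file, output_file,
            'silence', '1', '0.1', '1%', 'reverse',
            'silence', '1', '0.1', '1%', 'reverse',
        ], check=True)
        print(f'Successfully removed silence from {input_file} and saved to {output_file}')
    except subprocess.CalledProcessError as e:
        print(f'Error occurred: {e}')


def process_folder(input_folder, output_folder):
    # glob and os.makedirs do not expand '~', so expand it explicitly
    input_folder = os.path.expanduser(input_folder)
    output_folder = os.path.expanduser(output_folder)

    # create the output folder if it does not exist
    os.makedirs(output_folder, exist_ok=True)

    # process every .wav file in input_folder
    for wav_file in sorted(glob.glob(os.path.join(input_folder, '*.wav'))):
        file_name = os.path.basename(wav_file)
        output_wav = os.path.join(output_folder, file_name)
        remove_silence(wav_file, output_wav)


if __name__ == '__main__':
    input_folder = '~/data/train'
    output_folder = '~/data/processed_train'
    process_folder(input_folder, output_folder)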
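The SoX chain above trims silence from the start of each file, reverses the audio, trims again, and reverses it back, so the same effect also handles the end of the file. As a rough illustration of that idea without SoX, here is a simplified pure-Python sketch on a list of mono samples, assuming float samples in [-1, 1]. The names `trim_silence`, `threshold` and `min_run` are hypothetical; the defaults 0.01 and 1600 samples stand in for "1%" and "0.1 s" at an assumed 16 kHz sample rate. It is not a reimplementation of SoX's `silence` effect, whose detection differs in detail.

```python
def trim_silence(samples, threshold=0.01, min_run=1600):
    """Trim leading and trailing silence from a list of mono samples in [-1, 1].

    A sample counts as sound when abs(sample) > threshold. Audio is kept from
    the start of the first run of at least `min_run` consecutive sound samples
    to the end of the last such run; pauses in between are left untouched.
    Returns an empty list if no run qualifies.
    """
    loud = [abs(s) > threshold for s in samples]

    def first_run_start(flags):
        # Index where the first run of `min_run` loud samples begins, or None.
        run = 0
        for i, is_loud in enumerate(flags):
            run = run + 1 if is_loud else 0
            if run >= min_run:
                return i - min_run + 1
        return None

    start = first_run_start(loud)
    if start is None:
        return []
    # Like SoX's 'reverse' trick: search the reversed flags to find the end.
    end = len(samples) - first_run_start(loud[::-1])
    return samples[start:end]
```

With `min_run=2`, `trim_silence([0.0, 0.5, 0.5, 0.0, 0.0, 0.5, 0.5, 0.0], min_run=2)` keeps the middle pause and drops only the zeros at each end.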
hongseoi commented 5 months ago

[screenshot]

It turned out to be a really simple problem.

hongseoi commented 5 months ago

https://www.semanticscholar.org/reader/57c38167e0fa7c045c7fa6d9783216c7d725f6ad