Question about an error when running Pattern_Generator.py (AIHUB multi-speaker singing data)

MuHyeonSon commented 4 weeks ago

Hello.

I ran into a problem while running Pattern_Generator.py on the AIHUB multi-speaker singing data, so I am opening this issue.

While Pattern_File_Generate runs, every item in the dataset is reported as having incompatible audio and midi lengths.

WARNING:root:'AIHub_Mediazen-S01-ba_05567_+4_a_s01_f_02' is skipped because the audio and midi length incompatible.

I downloaded all directories and files directly from the AIHUB multi-speaker singing dataset and extracted them into the path below.

DiffSingerKR/D:/Datasets/rawdata_music

Thank you.

CODEJIN commented 3 weeks ago

Hello. I see you closed the issue. Has it been resolved?

MuHyeonSon commented 3 weeks ago

Hello. I tracked down the cause of the error and modified the code, and I have confirmed that training now runs.

However, I have two questions.

1. During eval, the target audio shown in tensorboard has a voice that cuts out intermittently. Is this expected, or does it indicate a problem? If it is a problem, my guess is that it comes from the patterns generated by the code I modified.

   The links below are target audio samples from eval: https://drive.google.com/file/d/1mgMdp18UCMpIwZntfSrJ1JsOya3-0lSv/view?usp=drive_link https://drive.google.com/file/d/16RjGYkvHT_YKZ2CtQQxHwlYl8v2Zui1g/view?usp=drive_link

2. Could you share the versions of the packages listed in requirements.txt, as well as the GPU and CUDA version, that you used for this project?

Here is how I resolved the error.

The cause was in the functions in meldataset.py. When I first ran Pattern_Generator.py, the error below occurred. To work around it, I modified three functions in meldataset.py (mel_spectrogram, spectrogram, spec_energy), and that first modification is what caused the length-incompatibility warning described in this issue.

Traceback (most recent call last):
  File "Pattern_Generator.py", line 643, in <module>
    AIHub_Mediazen(
  File "Pattern_Generator.py", line 135, in AIHub_Mediazen
    Pattern_File_Generate(
  File "Pattern_Generator.py", line 342, in Pattern_File_Generate
    spect = spectrogram(
  File "/home/muhyeonson/workspace/SVS/DiffSingerKR/meldataset.py", line 110, in spectrogram
    spec = torch.stft(y, n_fft, hop_length=hop_size, win_length=win_size, window=hann_window[str(y.device)],
  File "/home/muhyeonson/miniconda3/envs/diffsingerkr/lib/python3.8/site-packages/torch/functional.py", line 666, in stft
    return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
RuntimeError: stft requires the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release.

In the end I revised the code again as shown below, which fully resolved the error, and I was able to train the model.

def mel_spectrogram(y, n_fft, num_mels, sampling_rate, hop_size, win_size, fmin, fmax, center=False):
    if torch.min(y) < -1.:
        print('min value is ', torch.min(y))
    if torch.max(y) > 1.:
        print('max value is ', torch.max(y))

    global mel_basis, hann_window
    if fmax not in mel_basis:
        #mel = librosa_mel_fn(sampling_rate, n_fft, num_mels, fmin, fmax) # Create a Mel filter-bank.
        mel = librosa_mel_fn(sr=sampling_rate, n_fft=n_fft, n_mels=num_mels, fmin=fmin, fmax=fmax)
        mel_basis[str(fmax)+'_'+str(y.device)] = torch.from_numpy(mel).float().to(y.device) 
        hann_window[str(y.device)] = torch.hann_window(win_size).to(y.device)

    y = torch.nn.functional.pad(y.unsqueeze(1), (int((n_fft-hop_size)/2), int((n_fft-hop_size)/2)), mode='reflect')
    y = y.squeeze(1)
    spec = torch.stft(y, n_fft, hop_length=hop_size, win_length=win_size, window=hann_window[str(y.device)],
                      center=center, pad_mode='reflect', normalized=False, onesided=True, return_complex=True) # return_complex=True added because of the torch version
    spec = spec.abs() # added line
    #spec = torch.sqrt(spec.pow(2).sum(-1)+(1e-9))

    spec = torch.matmul(mel_basis[str(fmax)+'_'+str(y.device)], spec)
    spec = spectral_normalize_torch(spec)

    return spec

def spectrogram(y, n_fft, hop_size, win_size, center=False):
    if torch.min(y) < -1.:
        print('min value is ', torch.min(y))
    if torch.max(y) > 1.:
        print('max value is ', torch.max(y))

    global hann_window
    hann_window[str(y.device)] = torch.hann_window(win_size).to(y.device)

    y = torch.nn.functional.pad(y.unsqueeze(1), (int((n_fft-hop_size)/2), int((n_fft-hop_size)/2)), mode='reflect')
    y = y.squeeze(1)

    spec_torch = torch.stft(y, n_fft, hop_length=hop_size, win_length=win_size, window=hann_window[str(y.device)],
                      center=center, pad_mode='reflect', normalized=False, onesided=True, return_complex=True)
    spec_torch = spec_torch.abs() # added line
    #spec_torch = torch.sqrt(spec_torch.pow(2)+(1e-9)) # removed line
    spec_torch = spectral_normalize_torch(spec_torch)

    return spec_torch

def spec_energy(y, n_fft, hop_size, win_size, center=False):
    if torch.min(y) < -1.:
        print('min value is ', torch.min(y))
    if torch.max(y) > 1.:
        print('max value is ', torch.max(y))

    global hann_window
    hann_window[str(y.device)] = torch.hann_window(win_size).to(y.device)

    y = torch.nn.functional.pad(y.unsqueeze(1), (int((n_fft-hop_size)/2), int((n_fft-hop_size)/2)), mode='reflect')
    y = y.squeeze(1)

    spec = torch.stft(y, n_fft, hop_length=hop_size, win_length=win_size, window=hann_window[str(y.device)],
                      center=center, pad_mode='reflect', normalized=False, onesided=True, return_complex=True)
    spec = spec.abs() # added line
    #spec = torch.sqrt(spec.pow(2).sum(-1)+(1e-9))
    energy = torch.norm(spec, dim= 1)

    return energy
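
For reference, the change in the three functions above can be checked in isolation. The sketch below is not taken from the repository: the helper names and the values n_fft=1024, hop_size=256, win_size=1024 are assumptions chosen for illustration. It computes the magnitude once with `.abs()` on the complex output, and once by converting the complex output back to the older real layout (last dimension of size 2) with `torch.view_as_real` and applying the commented-out `sqrt(pow(2).sum(-1) + 1e-9)` formula.

```python
import torch


def stft_complex(y, n_fft, hop_size, win_size):
    # Complex STFT, as required by newer PyTorch releases for real inputs.
    window = torch.hann_window(win_size).to(y.device)
    return torch.stft(y, n_fft, hop_length=hop_size, win_length=win_size,
                      window=window, center=False, normalized=False,
                      onesided=True, return_complex=True)


def magnitude_abs(y, n_fft, hop_size, win_size):
    # Magnitude taken directly from the complex tensor (the approach in the post).
    return stft_complex(y, n_fft, hop_size, win_size).abs()


def magnitude_legacy_formula(y, n_fft, hop_size, win_size):
    # Convert to the older (..., 2) real/imaginary layout, then apply the
    # commented-out formula including its 1e-9 epsilon.
    spec = torch.view_as_real(stft_complex(y, n_fft, hop_size, win_size))
    return torch.sqrt(spec.pow(2).sum(-1) + 1e-9)


torch.manual_seed(0)
y = torch.rand(1, 4096) * 2 - 1  # random audio-like signal in [-1, 1]

# Hypothetical parameter values, for illustration only.
mag_new = magnitude_abs(y, n_fft=1024, hop_size=256, win_size=1024)
mag_old = magnitude_legacy_formula(y, n_fft=1024, hop_size=256, win_size=1024)

max_diff = (mag_new - mag_old).abs().max().item()
print(mag_new.shape, max_diff)
```

The two results differ only by the effect of the epsilon. One consequence of dropping it is that a silent frame yields an exact zero magnitude, so any later logarithm needs its own small offset.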

#===========================================================================
    # Handling the return value of spec_energy in Pattern_Generator.py
    log_energy = spec_energy(
        y= torch.from_numpy(audio).float().unsqueeze(0),
        n_fft= hyper_paramters.Sound.N_FFT,
        hop_size= hyper_paramters.Sound.Frame_Shift,
        win_size=hyper_paramters.Sound.Frame_Length,
        center= False
        ).squeeze(0)#.log().numpy()
    # Add a small value before taking the log for numerical stability
    log_energy = (log_energy + 1e-9).log().numpy()
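
When checking whether the regenerated patterns have the expected length, it may help to compute how many frames the functions above should produce for a given number of samples. The sketch below is only arithmetic based on the padding and the `center=False` STFT shown above; `expected_frames` is a hypothetical helper, the parameter values are assumptions, and the actual length check inside Pattern_Generator.py is not reproduced here.

```python
def expected_frames(num_samples, n_fft, hop_size):
    # Same padding as in the functions above, applied on both sides.
    pad = int((n_fft - hop_size) / 2)
    padded = num_samples + 2 * pad
    if padded < n_fft:
        return 0
    # Frame count of an STFT with center=False on the padded signal.
    return 1 + (padded - n_fft) // hop_size


# Hypothetical values, for illustration only.
frames = expected_frames(num_samples=22050, n_fft=1024, hop_size=256)
print(frames, 22050 // 256)
```

When n_fft - hop_size is even, this works out to num_samples // hop_size, so a spectrogram whose frame count is far from that value suggests the STFT output was reduced along the wrong dimension.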

Thank you.

CODEJIN / DiffSingerKR

Question about an error when running Pattern_Generator.py (AIHUB multi-speaker singing data) #3