descriptinc / melgan-neurips

GAN-based Mel-Spectrogram Inversion Network for Text-to-Speech Synthesis
MIT License
980 stars, 214 forks

input Mel video spectrogram to generate audio #41

Ivvvvvvvvvvy closed this issue 1 year ago

Ivvvvvvvvvvy commented 1 year ago

It seems that generate_from_folder.py needs .wav files as input. Is there any way to feed it a mel-spectrogram image or a mel .npy file directly?
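Whatever produced the array, the vocoder call in the script below expects a float32 tensor shaped (batch_size, 80, timesteps). A minimal NumPy sketch for bringing a loaded mel .npy into that layout, assuming 80 mel channels and that a spectrogram saved as (timesteps, 80) should simply be transposed. `to_vocoder_layout` is a hypothetical helper, not part of melgan-neurips:

```python
import numpy as np

N_MELS = 80  # mel channels, matching the (batch_size, 80, timesteps) layout


def to_vocoder_layout(spec: np.ndarray, n_mels: int = N_MELS) -> np.ndarray:
    """Return a contiguous float32 array shaped (1, n_mels, timesteps).

    Accepts (n_mels, T), (T, n_mels) or an already batched (1, n_mels, T) array.
    When both axes have length n_mels, (n_mels, T) is assumed.
    """
    spec = np.asarray(spec, dtype=np.float32)
    if spec.ndim == 3:
        if spec.shape[0] != 1 or spec.shape[1] != n_mels:
            raise ValueError(f"expected (1, {n_mels}, T), got {spec.shape}")
        return np.ascontiguousarray(spec)
    if spec.ndim != 2:
        raise ValueError(f"expected a 2-D or 3-D array, got shape {spec.shape}")
    if spec.shape[0] != n_mels and spec.shape[1] == n_mels:
        spec = spec.T  # (T, n_mels) -> (n_mels, T)
    if spec.shape[0] != n_mels:
        raise ValueError(f"no mel axis of length {n_mels} in shape {spec.shape}")
    # Add the batch axis expected by the vocoder.
    return np.ascontiguousarray(spec[np.newaxis, :, :])
```

The result can then be wrapped with torch.from_numpy(to_vocoder_layout(np.load(path))) before calling the vocoder. This only covers .npy arrays, not spectrogram images.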

fish0131 commented 1 year ago

@Ivvvvvvvvvvy Could you tell me how you solved this problem? Thanks.

Ivvvvvvvvvvy commented 1 year ago

Replying to @fish0131:

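1. Save the script below as g_wav.py in the SpecVQGAN-main\vocoder\scripts directory.
2. Go to the SpecVQGAN root directory (the one that contains vocoder/), because both the run command and the checkpoint path ./vocoder/logs/vggsound in the script are relative to it: cd [your path]/SpecVQGAN
3. Run: python ./vocoder/scripts/g_wav.py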
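The commands above depend on the working directory, because the script opens ./vocoder/logs/vggsound relative to it. A sketch that instead derives the paths from the script's own location, so it can be launched from anywhere. It assumes the layout <root>/vocoder/scripts/g_wav.py and <root>/vocoder/logs/vggsound; `specvqgan_paths` is a hypothetical helper:

```python
from pathlib import Path
from typing import Tuple


def specvqgan_paths(script_file: str) -> Tuple[Path, Path]:
    """Return (vocoder_dir, checkpoint_dir) for a script stored in vocoder/scripts/.

    Assumed layout: <root>/vocoder/scripts/g_wav.py and <root>/vocoder/logs/vggsound.
    """
    script = Path(script_file).resolve()
    vocoder_dir = script.parent.parent  # <root>/vocoder, which contains mel2wav/
    checkpoint_dir = vocoder_dir / "logs" / "vggsound"
    return vocoder_dir, checkpoint_dir
```

Inside g_wav.py this would replace the two relative paths: call specvqgan_paths(__file__), then sys.path.append(str(vocoder_dir)) and MelVocoder(str(checkpoint_dir)).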

The script (g_wav.py) is as follows:

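import os
import os.path as P
import pathlib
import sys

# sys.path[0] is this script's folder (vocoder/scripts); adding its parent
# (vocoder/) makes the mel2wav package importable.
sys.path.append(os.path.dirname(sys.path[0]))

import numpy as np
import soundfile
import torch
from mel2wav import MelVocoder

# The checkpoint config (args.yml) appears to store pathlib.PosixPath objects,
# which cannot be created on Windows. Alias PosixPath to WindowsPath only on
# Windows and only while the model loads, then restore the original class.
posix_path = pathlib.PosixPath
if os.name == "nt":
    pathlib.PosixPath = pathlib.WindowsPath
try:
    vocoder = MelVocoder('./vocoder/logs/vggsound')  # relative to the SpecVQGAN root
finally:
    pathlib.PosixPath = posix_path

mel_dir = "D:\\Soundscape\\feature\\melspec_10s_22050hz"  # folder of mel .npy files
output_folder = "D:\\Soundscape\\test"
os.makedirs(output_folder, exist_ok=True)

for file_name in sorted(os.listdir(mel_dir)):
    if not file_name.endswith(".npy"):
        continue  # skip files that are not saved spectrograms
    audio_name = P.splitext(file_name)[0]
    spec = np.load(P.join(mel_dir, file_name))  # shape (80, timesteps)
    # vocoder.inverse expects mel of shape (batch_size, 80, timesteps): add a batch
    # axis and move it to the device the vocoder was loaded on (CUDA if available).
    mel = torch.from_numpy(spec).to(torch.float32).unsqueeze(0).to(vocoder.device)
    # It returns audio of shape (batch_size, samples); squeeze to a 1-D waveform.
    wave_from_vocoder = vocoder.inverse(mel).detach().cpu().squeeze().numpy()
    wav_name = P.join(output_folder, audio_name + ".wav")
    soundfile.write(wav_name, wave_from_vocoder, 22050, 'PCM_24')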
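One way to sanity-check the generated files is to compare their length with the number of mel frames. This sketch assumes the vocoder produces 256 samples per mel frame (the product of MelGAN's default upsampling ratios 8, 8, 2, 2 in melgan-neurips) and uses the 22050 Hz rate passed to soundfile.write above. Both are assumptions about the checkpoint's config, and the helper names are hypothetical:

```python
HOP_LENGTH = 256     # assumed samples per mel frame (MelGAN upsampling 8 * 8 * 2 * 2)
SAMPLE_RATE = 22050  # rate passed to soundfile.write in the script


def expected_num_samples(num_frames: int, hop_length: int = HOP_LENGTH) -> int:
    """Waveform length the vocoder should produce for num_frames mel frames."""
    return num_frames * hop_length


def expected_duration_s(num_frames: int, hop_length: int = HOP_LENGTH,
                        sample_rate: int = SAMPLE_RATE) -> float:
    """Duration in seconds of that waveform at sample_rate."""
    return expected_num_samples(num_frames, hop_length) / sample_rate


# Hypothetical example: 860 frames, roughly a 10 s clip at 22050 Hz.
print(expected_num_samples(860), round(expected_duration_s(860), 2))  # prints: 220160 9.98
```

In the loop, len(wave_from_vocoder) could be compared with expected_num_samples(spec.shape[-1]). A clip that comes out much shorter or longer than expected suggests the spectrogram was computed with a different hop length or sample rate than the vocoder assumes.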