Status: Closed (qo4on closed this issue 4 years ago)
Please refer to the following doc on how to deploy an inference model: https://www.paddlepaddle.org.cn/documentation/docs/en/advanced_guide/inference_deployment/index_en.html
Thanks for your answer. Unfortunately, I still can't find the code that loads the model. There is a C++ example, but I need the same for Python. For example, with TensorFlow I have a checkpoint:
ls {checkpoint_dir}
checkpoint cp.ckpt.data-00001-of-00002
cp.ckpt.data-00000-of-00002 cp.ckpt.index
I can create a model, load weights and run inference:
model = create_model()
model.load_weights(checkpoint_path)
predictions = model(test_input, training=False)
Using PaddlePaddle, I have a checkpoint:
step-2000000.pdparams
waveflow_ljspeech.yaml
What is the workflow for .pdparams and .yaml files?
I see. You can use load_inference_model to load your model. For more detail, please refer to https://www.paddlepaddle.org.cn/documentation/docs/en/api/io/load_inference_model.html
The code from this link creates three files:
And when loading my checkpoint, it looks for __model__ and throws an error because I don't have it:
path = "/downloads/waveflow_res128_ljspeech_ckpt_1.0"
[inference_program, feed_target_names, fetch_targets] = (
fluid.io.load_inference_model(dirname=path, executor=exe))
FileNotFoundError: [Errno 2] No such file or directory: '/downloads/waveflow_res128_ljspeech_ckpt_1.0/__model__'
How can I create a model from these two files?
step-2000000.pdparams
waveflow_ljspeech.yaml
How did you get the pre-trained model? If the model was saved with one of the APIs save_persistables, save_params, or save_vars, then you can load it with the matching load_persistables, load_params, or load_vars.
It is an official PaddlePaddle pretrained model. They didn't say how they saved it.
Please use load_parameters. For example:
I have iteration=0:
config = {'batch_size': 8,
'fft_size': 1024,
'fft_window_shift': 256,
'fft_window_size': 1024,
'kernel_h': 3,
'kernel_w': 3,
'learning_rate': 0.0002,
'max_iterations': 3000000,
'mel_bands': 80,
'mel_fmax': 8000.0,
'mel_fmin': 0.0,
'n_channels': 64,
'n_flows': 8,
'n_group': 16,
'n_layers': 8,
'root': './data/LJSpeech-1.1',
'sample_rate': 22050,
'save_every': 10000,
'seed': 1234,
'segment_length': 16000,
'sigma': 1.0,
'test_every': 2000,
'use_fp16': False,
'use_gpu': True,
'valid_size': 16}
class Config():
    def __init__(self, **entries):
        self.__dict__.update(entries)
config = Config(**config)
from parakeet.models.waveflow import WaveFlowModule
from parakeet.utils import io
model = WaveFlowModule(config)
iteration = io.load_parameters(model, checkpoint_dir="/downloads/waveflow_res128_ljspeech_ckpt_1.0")
iteration
0
If I try checkpoint_path, I get an error:
iteration = io.load_parameters(model, checkpoint_path="/downloads/waveflow_res128_ljspeech_ckpt_1.0/step-2000000.pdparams")
ValueError: invalid literal for int() with base 10: '2000000.pdparams'
You only need to provide the base name of the parameter file, which is step-2000000. The .pdparams or .pdopt extension is not needed.
Closing this issue. If you have any further questions, please reopen it.
I still can't make it work. Suppose I have a mel spectrogram (the code can be copy-pasted into Colab):
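The ValueError above is consistent with the iteration number being taken from whatever follows the last '-' in the checkpoint path, which is how the synthesis script quoted later in this thread computes it (int(config.checkpoint.split('/')[-1].split('-')[-1])). The following is only a sketch under that assumption: strip_checkpoint_ext and parse_iteration are hypothetical helpers written for illustration and are not part of parakeet.utils.io.

```python
import os

# Hypothetical helpers for illustration; they are not part of parakeet.utils.io.
KNOWN_EXTENSIONS = (".pdparams", ".pdopt")

def strip_checkpoint_ext(path):
    """Return the checkpoint path without a .pdparams/.pdopt extension."""
    root, ext = os.path.splitext(path)
    return root if ext in KNOWN_EXTENSIONS else path

def parse_iteration(checkpoint_path):
    """Parse the step number from a path such as '.../step-2000000'."""
    base_name = checkpoint_path.split("/")[-1]
    # int() fails here if the extension is still attached to the base name.
    return int(base_name.split("-")[-1])

full_path = "/downloads/waveflow_res128_ljspeech_ckpt_1.0/step-2000000.pdparams"
base_path = strip_checkpoint_ext(full_path)
iteration = parse_iteration(base_path)
```

Calling parse_iteration(full_path) directly raises the same kind of ValueError as in the report, because '2000000.pdparams' is not an integer literal.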
!wget -qqq https://soundfrancisco.com/wp-content/uploads/2019/09/icarfactory.mp3 > /dev/null
audio_pth = "/content/icarfactory.mp3"
import IPython, librosa, librosa.display
import numpy as np
import matplotlib.pyplot as plt
y, sr = librosa.load(audio_pth)
# trim silent edges
audio, _ = librosa.effects.trim(y)
librosa.display.waveplot(audio, sr=sr)
IPython.display.Audio(audio_pth)
n_mels = 80
n_fft = 1024
hop_length = 256
S = librosa.feature.melspectrogram(audio, sr=sr, n_fft=n_fft,
hop_length=hop_length,
n_mels=n_mels)
S_DB = librosa.power_to_db(S, ref=np.max)
librosa.display.specshow(S_DB, sr=sr, hop_length=hop_length,
x_axis='time', y_axis='mel');
plt.colorbar(format='%+2.0f dB');
S.shape
(80, 679)
Then I load the model:
config = {'batch_size': 8,
'checkpoint': None,
'checkpoint_dir': '/content/downloads/waveflow_res128_ljspeech_ckpt_1.0',
'fft_size': 1024,
'fft_window_shift': 256,
'fft_window_size': 1024,
'iteration': None,
'kernel_h': 3,
'kernel_w': 3,
'learning_rate': 0.0002,
'max_iterations': 3000000,
'mel_bands': 80,
'mel_fmax': 8000.0,
'mel_fmin': 0.0,
'n_channels': 64,
'n_flows': 8,
'n_group': 16,
'n_layers': 8,
'name': '',
'output': './syn_audios',
'sample': 0,
'sample_rate': 22050,
'save_every': 10000,
'seed': 1234,
'segment_length': 16000,
'sigma': 1.0,
'test_every': 2000,
'use_fp16': True,
'use_gpu': True,
'valid_size': 16}
class Config():
    def __init__(self, **entries):
        self.__dict__.update(entries)
config = Config(**config)
from parakeet.models.waveflow import WaveFlowModule
from parakeet.utils import io
model = WaveFlowModule(config)
iteration = io.load_parameters(model, checkpoint_dir="/downloads/waveflow_res128_ljspeech_ckpt_1.0")
iteration
/usr/local/lib/python3.6/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'input' in assign only support float16 in GPU now. (When the type of input in assign is Variable.)
(input_name, op_name, extra_message))
/usr/local/lib/python3.6/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'x' in cast only support float16 in GPU now.
(input_name, op_name, extra_message))
0
I'm not sure, but iteration=0 means that the pretrained model was not loaded, am I right?
Then I add a batch dimension of 1 to the spectrogram S and synthesize:
model.synthesize(np.expand_dims(S, axis=0))
TypeError Traceback (most recent call last)
<ipython-input-134-f6905ea98f56> in <module>()
----> 1 model.synthesize(np.expand_dims(S, axis=0))
3 frames
/content/Parakeet/parakeet/models/waveflow/waveflow_modules.py in synthesize(self, mel, sigma)
406 """
407 if self.dtype == "float16":
--> 408 mel = fluid.layers.cast(mel, self.dtype)
409 mel = self.conditioner.infer(mel)
410 # From [bs, mel_bands, time] to [bs, mel_bands, n_group, time/n_group]
/usr/local/lib/python3.6/dist-packages/paddle/fluid/layers/tensor.py in cast(x, dtype)
196 x, 'x', Variable,
197 ['bool', 'float16', 'float32', 'float64', 'int32', 'int64', 'uint8'],
--> 198 'cast')
199 out = helper.create_variable_for_type_inference(dtype=dtype)
200 helper.append_op(
/usr/local/lib/python3.6/dist-packages/paddle/fluid/data_feeder.py in check_type_and_dtype(input, input_name, expected_type, expected_dtype, op_name, extra_message)
72 op_name,
73 extra_message=''):
---> 74 check_type(input, input_name, expected_type, op_name, extra_message)
75 check_dtype(input.dtype, input_name, expected_dtype, op_name, extra_message)
76
/usr/local/lib/python3.6/dist-packages/paddle/fluid/data_feeder.py in check_type(input, input_name, expected_type, op_name, extra_message)
80 raise TypeError(
81 "The type of '%s' in %s must be %s, but received %s. %s" %
---> 82 (input_name, op_name, expected_type, type(input), extra_message))
83
84
TypeError: The type of 'x' in cast must be <class 'paddle.fluid.framework.Variable'>, but received <class 'numpy.ndarray'>.
Do you know how I can fix this error?
Please note that io.load_params has no return values. The 'x' argument of cast (i.e. mel) must be a Paddle Variable, not a numpy array. You can probably use paddle.fluid.dygraph.to_variable to convert a numpy array to a Paddle Variable.
Thank you. I managed to run it with no errors, but the output file contains only noise. Do you know what's wrong? This code is based on the official example:
from parakeet.models.waveflow import waveflow_modules
from parakeet.utils import io
import paddle.fluid.dygraph as dg
from paddle import fluid
from scipy.io.wavfile import write
from ruamel import yaml
import random
class WaveFlow(waveflow_modules.WaveFlowModule):
    def __init__(self, config):
        super().__init__(config)
    @dg.no_grad
    def infer(self, mel):
        self.eval()
        print(mel.shape)
        audio = self.synthesize(mel)
        # Denormalize audio from [-1, 1] to [-32768, 32768] int16 range.
        audio = audio.numpy().astype("float32") * 32768.0
        audio = audio.astype('int16')
        filename = 'test.wav'
        print(audio.shape, 'audio.shape')
        write(filename, config.sample_rate, audio[0])
class Config():
    def __init__(self, **entries):
        self.__dict__.update(entries)
pth = "/content/Parakeet/examples/waveflow/configs/waveflow_ljspeech.yaml"
with open(pth) as f:
    config = yaml.load(f, Loader=yaml.Loader)
config['checkpoint'] = None
config['checkpoint_dir'] = "/content/downloads/waveflow_res128_ljspeech_ckpt_1.0"
config['iteration'] = None
config['name'] = ''
config['output'] = './syn_audios'
config['sample'] = 0
config['use_fp16'] = True
config['use_gpu'] = True
config = Config(**config)
print(config.__dict__)
place = fluid.CUDAPlace(0) if config.use_gpu else fluid.CPUPlace()
with dg.guard(place):
    # Fix random seed.
    seed = config.seed
    random.seed(seed)
    np.random.seed(seed)
    fluid.default_startup_program().random_seed = seed
    fluid.default_main_program().random_seed = seed
    model = WaveFlow(config)
    # Dry run once to create and initialize all necessary parameters.
    dtype = "float16" if config.use_fp16 else "float32"
    audio = dg.to_variable(np.random.randn(1, 16000).astype(dtype))
    mel = dg.to_variable(
        np.random.randn(1, config.mel_bands, 63).astype(dtype))
    model(audio, mel)
    iteration = io.load_parameters(model, checkpoint_dir=config.checkpoint_dir)
    print(S.shape, np.min(S), np.max(S))
    model.infer(dg.to_variable(np.expand_dims(S, axis=0)))
{'valid_size': 16, 'segment_length': 16000, 'sample_rate': 22050, 'fft_window_shift': 256, 'fft_window_size': 1024, 'fft_size': 1024, 'mel_bands': 80, 'mel_fmin': 0.0, 'mel_fmax': 8000.0, 'seed': 1234, 'learning_rate': 0.0002, 'batch_size': 8, 'test_every': 2000, 'save_every': 10000, 'max_iterations': 3000000, 'sigma': 1.0, 'n_flows': 8, 'n_group': 16, 'n_layers': 8, 'n_channels': 64, 'kernel_h': 3, 'kernel_w': 3, 'checkpoint': None, 'checkpoint_dir': '/content/downloads/waveflow_res128_ljspeech_ckpt_1.0', 'iteration': None, 'name': '', 'output': './syn_audios', 'sample': 0, 'use_fp16': True, 'use_gpu': True}
/usr/local/lib/python3.6/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'input' in assign only support float16 in GPU now. (When the type of input in assign is Variable.)
(input_name, op_name, extra_message))
/usr/local/lib/python3.6/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'x' in cast only support float16 in GPU now.
(input_name, op_name, extra_message))
/usr/local/lib/python3.6/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'input' in squeeze only support float16 in GPU now.
(input_name, op_name, extra_message))
(80, 679) 5.492911396874237e-11 366.29962746001155
[1, 80, 679]
(1, 173552) audio.shape
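A side note on the conversion step in the snippet above, separate from the noise problem: the code comment gives the target range as [-32768, 32768], but int16 tops out at 32767, so a sample of exactly 1.0 (or anything louder) does not fit after multiplying by 32768.0. Below is a minimal pure-Python sketch of a clipped conversion. float_to_int16 is a hypothetical helper, not something from Parakeet; with numpy, clipping with np.clip before astype('int16') would play the same role.

```python
def float_to_int16(samples):
    """Scale floats in [-1.0, 1.0] to int16 values, clipping out-of-range results."""
    int16_min, int16_max = -32768, 32767
    converted = []
    for sample in samples:
        scaled = int(round(sample * 32768.0))
        # Clamp so that 1.0 (and anything louder) stays inside the int16 range.
        converted.append(max(int16_min, min(int16_max, scaled)))
    return converted

pcm = float_to_int16([0.0, 0.5, 1.0, -1.0, 1.5])
```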
I found that there is a get_mel function in the code examples which does some normalization. I applied it here, but I'm still getting a noisy audio file. Please help.
!export CUDA_VISIBLE_DEVICES=0
from parakeet.models.waveflow import waveflow_modules
from parakeet.utils import io
import paddle.fluid.dygraph as dg
from paddle import fluid
from scipy.io.wavfile import write
from ruamel import yaml
import random, librosa
class WaveFlow(waveflow_modules.WaveFlowModule):
    def __init__(self, config):
        super().__init__(config)
    @dg.no_grad
    def infer(self, mel):
        self.eval()
        print(mel.shape, 'mel.shape')
        audio = self.synthesize(mel)
        # Denormalize audio from [-1, 1] to [-32768, 32768] int16 range.
        audio = audio.numpy().astype("float32") * 32768.0
        audio = audio.astype('int16')
        filename = 'test.wav'
        print(audio.shape, 'audio.shape')
        write(filename, config.sample_rate, audio[0])
class Config():
    def __init__(self, **entries):
        self.__dict__.update(entries)
def get_mel(audio):
    spectrogram = librosa.core.stft(
        audio,
        n_fft=config.fft_size,
        hop_length=config.fft_window_shift,
        win_length=config.fft_window_size)
    spectrogram_magnitude = np.abs(spectrogram)
    # mel_filter_bank shape: [n_mels, 1 + n_fft/2]
    mel_filter_bank = librosa.filters.mel(sr=config.sample_rate,
                                          n_fft=config.fft_size,
                                          n_mels=config.mel_bands,
                                          fmin=config.mel_fmin,
                                          fmax=config.mel_fmax)
    # mel shape: [n_mels, num_frames]
    mel = np.dot(mel_filter_bank, spectrogram_magnitude)
    # Normalize mel.
    clip_val = 1e-5
    ref_constant = 1
    mel = np.log(np.clip(mel, a_min=clip_val, a_max=None) * ref_constant)
    return mel
pth = "/content/Parakeet/examples/waveflow/configs/waveflow_ljspeech.yaml"
with open(pth) as f:
    config = yaml.load(f, Loader=yaml.Loader)
config['checkpoint'] = None
config['checkpoint_dir'] = "/content/downloads/waveflow_res128_ljspeech_ckpt_1.0"
config['iteration'] = None
config['name'] = ''
config['output'] = './syn_audios'
config['sample'] = 0
config['use_fp16'] = True
config['use_gpu'] = True
config = Config(**config)
print(config.__dict__)
place = fluid.CUDAPlace(0) if config.use_gpu else fluid.CPUPlace()
with dg.guard(place):
    # Fix random seed.
    seed = config.seed
    random.seed(seed)
    np.random.seed(seed)
    fluid.default_startup_program().random_seed = seed
    fluid.default_main_program().random_seed = seed
    model = WaveFlow(config)
    # Dry run once to create and initialize all necessary parameters.
    dtype = "float16" if config.use_fp16 else "float32"
    audio = dg.to_variable(np.random.randn(1, 16000).astype(dtype))
    mel = dg.to_variable(
        np.random.randn(1, config.mel_bands, 63).astype(dtype))
    model(audio, mel)
    iteration = io.load_parameters(model, checkpoint_dir=config.checkpoint_dir)
    audio, sr = librosa.load(audio_pth)
    mel = dg.to_variable(np.expand_dims(get_mel(audio), axis=0))
    model.infer(mel)
{'valid_size': 16, 'segment_length': 16000, 'sample_rate': 22050, 'fft_window_shift': 256, 'fft_window_size': 1024, 'fft_size': 1024, 'mel_bands': 80, 'mel_fmin': 0.0, 'mel_fmax': 8000.0, 'seed': 1234, 'learning_rate': 0.0002, 'batch_size': 8, 'test_every': 2000, 'save_every': 10000, 'max_iterations': 3000000, 'sigma': 1.0, 'n_flows': 8, 'n_group': 16, 'n_layers': 8, 'n_channels': 64, 'kernel_h': 3, 'kernel_w': 3, 'checkpoint': None, 'checkpoint_dir': '/content/downloads/waveflow_res128_ljspeech_ckpt_1.0', 'iteration': None, 'name': '', 'output': './syn_audios', 'sample': 0, 'use_fp16': True, 'use_gpu': True}
/usr/local/lib/python3.6/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'input' in assign only support float16 in GPU now. (When the type of input in assign is Variable.)
(input_name, op_name, extra_message))
/usr/local/lib/python3.6/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'x' in cast only support float16 in GPU now.
(input_name, op_name, extra_message))
/usr/local/lib/python3.6/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'input' in squeeze only support float16 in GPU now.
(input_name, op_name, extra_message))
[1, 80, 684] mel.shape
/usr/local/lib/python3.6/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'x' in cast only support float16 in GPU now.
(input_name, op_name, extra_message))
/usr/local/lib/python3.6/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'input' in squeeze only support float16 in GPU now.
(input_name, op_name, extra_message))
(1, 174832) audio.shape
Do you have any idea why this is happening?
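For reference, get_mel differs from the earlier librosa.feature.melspectrogram attempt in two ways that are visible in the code: it works on the STFT magnitude rather than librosa's default power spectrogram, and it log-compresses the result with a floor of 1e-5. The sketch below isolates only the log-compression step in plain Python to show how much it changes the value range. log_compress is a hypothetical name, and feeding it the extremes printed for S earlier is purely illustrative, since S was a power spectrogram.

```python
import math

CLIP_VAL = 1e-5  # same floor as clip_val in get_mel

def log_compress(mel_values, clip_val=CLIP_VAL):
    """Apply log(max(x, clip_val)) to each non-negative mel value."""
    return [math.log(max(value, clip_val)) for value in mel_values]

# Illustrative inputs: the min and max printed for the raw spectrogram S earlier.
raw_extremes = [5.492911396874237e-11, 366.29962746001155]
log_extremes = log_compress(raw_extremes)
```

The smallest value is clipped to 1e-5 before the logarithm, so the compressed range spans roughly -11.5 to 5.9 instead of many orders of magnitude.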
I replaced my test audio file with audio from the LJSpeech dataset. It doesn't change anything; I'm still getting noise. To reproduce, run this code in Colab:
!wget -qqq https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res128_ljspeech_ckpt_1.0.zip > /dev/null
!mkdir -p /content/downloads
!unzip -qqq /content/waveflow_res128_ljspeech_ckpt_1.0.zip -d /content/downloads > /dev/null
!rm -rf /content/sample_data /content/waveflow_res128_ljspeech_ckpt_1.0.zip
!sudo apt-get update -y -qqq --fix-missing && apt-get install -y -qqq libsndfile1 > /dev/null
!pip install -U -qqq imgaug scipy albumentations paddlepaddle-gpu > /dev/null
!git clone -qqq https://github.com/PaddlePaddle/Parakeet > /dev/null
%cd /content/Parakeet
!pip install -qqq -e . > /dev/null
import os, pathlib, nltk, sys
import numpy as np
import matplotlib.pyplot as plt
nltk.download("punkt")
nltk.download("cmudict")
pth_1 = "/content/Parakeet"
if pth_1 not in sys.path: sys.path.insert(0, pth_1)
pth_2 = "/content/Parakeet/examples/waveflow"
if pth_2 not in sys.path: sys.path.insert(0, pth_2)
%cd /content/downloads
!curl -Ls https://dl.dropboxusercontent.com/s/jj78665lrhiod97/LJ001-0001.wav -o LJ001-0001.wav
audio_pth = "/content/downloads/LJ001-0001.wav"
!export CUDA_VISIBLE_DEVICES=0
from parakeet.models.waveflow import waveflow_modules
from parakeet.modules import weight_norm
from parakeet.utils import io
import paddle.fluid.dygraph as dg
from paddle import fluid
from scipy.io.wavfile import read
from scipy.io.wavfile import write
from ruamel import yaml
import random, librosa
class Config():
    def __init__(self, **entries):
        self.__dict__.update(entries)
class WaveFlow():
    def __init__(self,
                 config,
                 parallel=False,
                 rank=0,
                 nranks=1,
                 tb_logger=None):
        self.config = config
        self.checkpoint_dir = config.checkpoint_dir
        self.parallel = parallel
        self.rank = rank
        self.nranks = nranks
        self.tb_logger = tb_logger
        self.dtype = "float16" if config.use_fp16 else "float32"
    def build(self):
        config = self.config
        waveflow = waveflow_modules.WaveFlowModule(config)
        # Dry run once to create and initialize all necessary parameters.
        audio = dg.to_variable(np.random.randn(1, 16000).astype(self.dtype))
        mel = dg.to_variable(
            np.random.randn(1, config.mel_bands, 63).astype(self.dtype))
        waveflow(audio, mel)
        iteration = io.load_parameters(waveflow, checkpoint_dir=self.checkpoint_dir)
        for layer in waveflow.sublayers():
            if isinstance(layer, weight_norm.WeightNormWrapper):
                layer.remove_weight_norm()
        self.waveflow = waveflow
        return iteration
    @dg.no_grad
    def infer(self, mel):
        self.waveflow.eval()
        config = self.config
        print(mel.shape, 'mel.shape')
        audio = self.waveflow.synthesize(mel, sigma=self.config.sigma)
        audio = audio[0]
        # Denormalize audio from [-1, 1] to [-32768, 32768] int16 range.
        audio = audio.numpy().astype("float32") * 32768.0
        audio = audio.astype('int16')
        filename = 'test.wav'
        print(audio.shape, 'audio.shape')
        write(filename, config.sample_rate, audio)
def get_mel(audio):
    spectrogram = librosa.core.stft(
        audio,
        n_fft=config.fft_size,
        hop_length=config.fft_window_shift,
        win_length=config.fft_window_size)
    spectrogram_magnitude = np.abs(spectrogram)
    # mel_filter_bank shape: [n_mels, 1 + n_fft/2]
    mel_filter_bank = librosa.filters.mel(sr=config.sample_rate,
                                          n_fft=config.fft_size,
                                          n_mels=config.mel_bands,
                                          fmin=config.mel_fmin,
                                          fmax=config.mel_fmax)
    # mel shape: [n_mels, num_frames]
    mel = np.dot(mel_filter_bank, spectrogram_magnitude)
    # Normalize mel.
    clip_val = 1e-5
    ref_constant = 1
    mel = np.log(np.clip(mel, a_min=clip_val, a_max=None) * ref_constant)
    return mel
def get_config(pth="/content/Parakeet/examples/waveflow/configs/waveflow_ljspeech.yaml"):
    with open(pth) as f:
        config = yaml.load(f, Loader=yaml.Loader)
    config['checkpoint'] = None
    config['checkpoint_dir'] = "/content/downloads/waveflow_res128_ljspeech_ckpt_1.0"
    config['iteration'] = None
    config['name'] = ''
    config['output'] = './syn_audios'
    config['sample'] = 0
    config['use_fp16'] = True
    config['use_gpu'] = True
    return Config(**config)
config = get_config()
print(config.__dict__)
place = fluid.CUDAPlace(0) if config.use_gpu else fluid.CPUPlace()
with dg.guard(place):
    # Fix random seed.
    seed = config.seed
    random.seed(seed)
    np.random.seed(seed)
    fluid.default_startup_program().random_seed = seed
    fluid.default_main_program().random_seed = seed
    # Build model.
    model = WaveFlow(config)
    iteration = model.build()
    print(iteration, "iteration")
    # Obtain the current iteration.
    if config.checkpoint is None:
        if config.iteration is None:
            print("_load_latest_checkpoint")
            iteration = io._load_latest_checkpoint(config.checkpoint_dir)
        else:
            iteration = config.iteration
    else:
        iteration = int(config.checkpoint.split('/')[-1].split('-')[-1])
    print(config.checkpoint_dir, iteration)
    loaded_sr, audio = read(audio_pth)
    mel = dg.to_variable(np.expand_dims(get_mel(np.asarray(audio, dtype=np.float32)), axis=0))
    model.infer(mel)
{'valid_size': 16, 'segment_length': 16000, 'sample_rate': 22050, 'fft_window_shift': 256, 'fft_window_size': 1024, 'fft_size': 1024, 'mel_bands': 80, 'mel_fmin': 0.0, 'mel_fmax': 8000.0, 'seed': 1234, 'learning_rate': 0.0002, 'batch_size': 8, 'test_every': 2000, 'save_every': 10000, 'max_iterations': 3000000, 'sigma': 1.0, 'n_flows': 8, 'n_group': 16, 'n_layers': 8, 'n_channels': 64, 'kernel_h': 3, 'kernel_w': 3, 'checkpoint': None, 'checkpoint_dir': '/content/downloads/waveflow_res128_ljspeech_ckpt_1.0', 'iteration': None, 'name': '', 'output': './syn_audios', 'sample': 0, 'use_fp16': True, 'use_gpu': True}
/usr/local/lib/python3.6/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'input' in assign only support float16 in GPU now. (When the type of input in assign is Variable.)
(input_name, op_name, extra_message))
/usr/local/lib/python3.6/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'x' in cast only support float16 in GPU now.
(input_name, op_name, extra_message))
/usr/local/lib/python3.6/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'input' in squeeze only support float16 in GPU now.
(input_name, op_name, extra_message))
0 iteration
_load_latest_checkpoint
/content/downloads/waveflow_res128_ljspeech_ckpt_1.0 0
[1, 80, 832] mel.shape
(212720,) audio.shape
I found a bug in your official code example:
In synthesis.py -> synthesize
replace:
iteration = io.load_latest_checkpoint(checkpoint_dir)
with
iteration = io._load_latest_checkpoint(checkpoint_dir)
I also replaced checkpoint_dir in synthesis.py -> synthesize with my downloaded checkpoint path:
checkpoint_dir = "/content/downloads/waveflow_res128_ljspeech_ckpt_1.0"
and ran your official command:
!export CUDA_VISIBLE_DEVICES=0
!python -u synthesis.py \
--config=./configs/waveflow_ljspeech.yaml \
--root=./data/LJSpeech-1.1 \
--name=ModelName --use_gpu=true \
--output=./syn_audios \
--sample=0 \
--sigma=1.0
{'batch_size': 8,
'checkpoint': None,
'config': './configs/waveflow_ljspeech.yaml',
'fft_size': 1024,
'fft_window_shift': 256,
'fft_window_size': 1024,
'iteration': None,
'kernel_h': 3,
'kernel_w': 3,
'learning_rate': 0.0002,
'max_iterations': 3000000,
'mel_bands': 80,
'mel_fmax': 8000.0,
'mel_fmin': 0.0,
'model': 'waveflow',
'n_channels': 64,
'n_flows': 8,
'n_group': 16,
'n_layers': 8,
'name': 'ModelName',
'output': './syn_audios',
'root': './data/LJSpeech-1.1',
'sample': 0,
'sample_rate': 22050,
'save_every': 10000,
'seed': 1234,
'segment_length': 16000,
'sigma': 1.0,
'test_every': 2000,
'use_fp16': True,
'use_gpu': True,
'valid_size': 16}
Random Seed: 1234
W0411 09:45:54.968588 1820 device_context.cc:237] Please NOTE: device: 0, CUDA Capability: 75, Driver API Version: 10.1, Runtime API Version: 10.0
W0411 09:45:54.972355 1820 device_context.cc:245] device: 0, cuDNN Version: 7.6.
/usr/local/lib/python3.6/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'input' in assign only support float16 in GPU now. (When the type of input in assign is Variable.)
(input_name, op_name, extra_message))
/usr/local/lib/python3.6/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'x' in cast only support float16 in GPU now.
(input_name, op_name, extra_message))
/usr/local/lib/python3.6/dist-packages/paddle/fluid/data_feeder.py:93: UserWarning: The data type of 'input' in squeeze only support float16 in GPU now.
(input_name, op_name, extra_message))
Rank 0: checkpoint loaded.
Synthesize sample 0, save as ./syn_audios/ModelName/iter-0/valid_0.wav
audio time 9.6472, synthesis time 1.2663
The result was an audio file with just noise.
I also tried your smaller model: https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_res64_ljspeech_ckpt_1.0.zip The result was the same: an audio file containing only noise.
I hope that @kuke can comment for a possible solution when he has time.
@qo4on Sorry for the delayed reply.
The bug you mentioned has already been fixed; please update the code. Then try running the synthesis this way:
python -u synthesis.py \
--root=LJSpeech-1.1 \
--name=${ModelName} --use_gpu=true \
--output=./syn_audios \
--sigma=1.0 \
--use_fp16=true \
--config=waveflow_res128_ljspeech_ckpt_1.0/waveflow_ljspeech.yaml \
--checkpoint=waveflow_res128_ljspeech_ckpt_1.0/step-2000000
There are some guidelines in the README, but I'm not sure they are clear enough.
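The command above passes the checkpoint explicitly as the directory plus the base name step-2000000. As a rough sketch of how that value could be derived from an unzipped checkpoint folder, the snippet below scans for files named step-<N>.pdparams, the naming seen in this thread, and returns the path for the largest N without the extension. latest_checkpoint_base is a hypothetical helper for illustration, not a Parakeet function, and the demo runs on a throwaway directory that only mimics the real folder.

```python
import os
import re
import tempfile

STEP_PATTERN = re.compile(r"^step-(\d+)\.pdparams$")

def latest_checkpoint_base(checkpoint_dir):
    """Return '<dir>/step-N' for the largest N found, or None if there is none."""
    best_step = None
    for file_name in os.listdir(checkpoint_dir):
        match = STEP_PATTERN.match(file_name)
        if match:
            step = int(match.group(1))
            if best_step is None or step > best_step:
                best_step = step
    if best_step is None:
        return None
    return os.path.join(checkpoint_dir, "step-{}".format(best_step))

# Demo on a throwaway directory that mimics the unzipped checkpoint folder.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("step-2000000.pdparams", "waveflow_ljspeech.yaml"):
        open(os.path.join(demo_dir, name), "w").close()
    checkpoint_arg = latest_checkpoint_base(demo_dir)
    demo_result = os.path.basename(checkpoint_arg)
```

The returned path is what would go after --checkpoint=; a None result means the folder contains no step-<N>.pdparams file at all.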
@kuke Thank you very much!
Please refer to the following doc on how to deploy an inference model: https://www.paddlepaddle.org.cn/documentation/docs/en/advanced_guide/inference_deployment/index_en.html
So, are there any simple C++/Python demos showing how to load a Paddle detection model and run inference with it? I checked the inference section, but only a model compression intro is provided...
Hi! Parakeet looks very promising, but I can't find a working example to use. I'm trying to run inference for the WaveFlow model. How can I load this model and feed a mel spectrogram to it?