Hi all, I downloaded the output from https://www.dropbox.com/s/zc6o1d8rqq28vci/data.zip?dl=1 and followed the inference instructions in steps_of_inference.md. But after running inference with te_label_phn.npy and te_feat.npy from the librispeech output I downloaded, the results look wrong. I checked the values of each tensor from u1 to u5, and even their minimum is always above 1.5, which is very high and unexpected, since the speechocean762 dataset contains bad pronunciation examples (the range check I used is sketched after the code below).
For example, the min value of u1 is 1.5474397 and the max value of u1 is 1.7969123.
I used this inference code:
```python
import os
import sys

import numpy as np
import torch

sys.path.append(os.path.abspath('../src/'))
from models import GOPT

# Build the model and load the released LibriSpeech checkpoint
gopt = GOPT(embed_dim=24, num_heads=1, depth=3, input_dim=84)
gopt = torch.nn.DataParallel(gopt)
sd = torch.load('gopt_librispeech/best_audio_model.pth', map_location='cpu')
gopt.load_state_dict(sd, strict=True)

# Load the test features and phone labels
input_feat = np.load("te_feat.npy")
input_phn = np.load("te_label_phn.npy")

gopt = gopt.float()
gopt.eval()
with torch.no_grad():
    t_input_feat = torch.from_numpy(input_feat[:, :, :])
    t_phn = torch.from_numpy(input_phn[:, :, 0])  # phone ids are in the first channel
    u1, u2, u3, u4, u5, p, w1, w2, w3 = gopt(t_input_feat.float(), t_phn.float())
```
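For context, this is roughly how I checked the ranges quoted above (a minimal sketch; I'm assuming u1 to u5 come back as per-utterance score tensors, and the names below are only for printing):

```python
# Minimal sketch of the range check (assumes u1..u5 are the five utterance-level outputs)
for name, score in zip(["u1", "u2", "u3", "u4", "u5"], [u1, u2, u3, u4, u5]):
    print(f"{name}: min={score.min().item():.7f}, "
          f"max={score.max().item():.7f}, "
          f"mean={score.mean().item():.7f}")
```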