Hi all, I downloaded the output from https://www.dropbox.com/s/zc6o1d8rqq28vci/data.zip?dl=1 and followed the inference instructions in steps_of_inference.md. But after running inference with te_label_phn.npy and te_feat.npy from the librispeech output I downloaded, the results look wrong. I checked the values of each tensor from u1 to u5, and even their minimum is always above 1.5, which is very high and unexpected, since the speechocean762 dataset contains bad pronunciation examples (the range check I used is sketched after the code below).
For example, the min value of u1 is 1.5474397 and the max value of u1 is 1.7969123.
I used this inference code:
```python
import os
import sys

import numpy as np
import torch

sys.path.append(os.path.abspath('../src/'))
from models import GOPT

# Build the model and load the released LibriSpeech checkpoint
gopt = GOPT(embed_dim=24, num_heads=1, depth=3, input_dim=84)
gopt = torch.nn.DataParallel(gopt)
sd = torch.load('gopt_librispeech/best_audio_model.pth', map_location='cpu')
gopt.load_state_dict(sd, strict=True)

# Load the test features and phone labels
input_feat = np.load("te_feat.npy")
input_phn = np.load("te_label_phn.npy")

gopt = gopt.float()
gopt.eval()
with torch.no_grad():
    t_input_feat = torch.from_numpy(input_feat[:, :, :])
    t_phn = torch.from_numpy(input_phn[:, :, 0])  # phone ids are in the first channel
    u1, u2, u3, u4, u5, p, w1, w2, w3 = gopt(t_input_feat.float(), t_phn.float())
```
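For context, this is roughly how I checked the ranges quoted above (a minimal sketch; I'm assuming u1 to u5 come back as per-utterance score tensors, and the names below are only for printing):

```python
# Minimal sketch of the range check (assumes u1..u5 are the five utterance-level outputs)
for name, score in zip(["u1", "u2", "u3", "u4", "u5"], [u1, u2, u3, u4, u5]):
    print(f"{name}: min={score.min().item():.7f}, "
          f"max={score.max().item():.7f}, "
          f"mean={score.mean().item():.7f}")
```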