YuanGongND / gopt

Code for the ICASSP 2022 paper "Transformer-Based Multi-Aspect Multi-Granularity Non-native English Speaker Pronunciation Assessment".
BSD 3-Clause "New" or "Revised" License

The result of u1,u2,u3,u4,u5 is always higher than 1.5 #25

Open JustVinh opened 1 year ago

JustVinh commented 1 year ago

Hi all, I used the output from https://www.dropbox.com/s/zc6o1d8rqq28vci/data.zip?dl=1 and followed the inference instructions in steps_of_inference.md. However, after running inference with te_label_phn.npy and te_feat.npy from the librispeech output I downloaded, the results look wrong. I checked the values of each tensor from u1 to u5, and even their minimum values are always higher than 1.5. That is very high and unexpected, because the speechocean762 dataset contains examples of bad pronunciation.

For example, the min value of u1 is 1.5474397 and the max value is 1.7969123.

I used this inference code:

```python
import torch
import sys
import os
sys.path.append(os.path.abspath('../src/'))
from models import GOPT

gopt = GOPT(embed_dim=24, num_heads=1, depth=3, input_dim=84)
gopt = torch.nn.DataParallel(gopt)
sd = torch.load('gopt_librispeech/best_audio_model.pth', map_location='cpu')
gopt.load_state_dict(sd, strict=True)

import numpy as np
input_feat = np.load("te_feat.npy")
input_phn = np.load("te_label_phn.npy")
gopt = gopt.float()
gopt.eval()
with torch.no_grad():
    t_input_feat = torch.from_numpy(input_feat[:,:,:])
    t_phn = torch.from_numpy(input_phn[:,:,0])
    u1, u2, u3, u4, u5, p, w1, w2, w3 = gopt(t_input_feat.float(),t_phn.float())
```
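
The range check described above (min, max, and mean of each of u1 to u5) can be sketched roughly as follows. This is only a minimal illustration and not part of the GOPT repository: `flatten` and `summarize` are hypothetical helper names, and the example values are just the two numbers quoted above, used as stand-ins for the real model outputs.

```python
from statistics import mean


def flatten(values):
    """Yield every scalar from an arbitrarily nested list/tuple as a float."""
    for v in values:
        if isinstance(v, (list, tuple)):
            yield from flatten(v)
        else:
            yield float(v)


def summarize(name, scores):
    """Return (name, min, max, mean) for one score tensor.

    `scores` may be a torch tensor, a NumPy array, or a nested list;
    anything exposing .tolist() is converted to plain Python lists first.
    """
    if hasattr(scores, "tolist"):
        scores = scores.tolist()
    if not isinstance(scores, (list, tuple)):
        scores = [scores]  # a 0-d tensor converts to a bare scalar
    flat = list(flatten(scores))
    return name, min(flat), max(flat), mean(flat)


# Stand-in values for illustration only (assumption): replace this dict
# with the real outputs, e.g. {"u1": u1, "u2": u2, ..., "u5": u5}.
example = {"u1": [[1.5474397], [1.7969123]]}
for name, scores in example.items():
    n, lo, hi, avg = summarize(name, scores)
    print(f"{n}: min={lo:.4f} max={hi:.4f} mean={avg:.4f}")
```

Printing all three statistics for every output makes it easier to tell whether the scores are compressed into a narrow band or merely shifted upward.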

Rtut654 commented 1 year ago

Hey! Have you solved the issue?

YangangCao commented 1 year ago

Hi, did you try some sentences with completely wrong pronunciation?