YuanGongND / gopt

Code for the ICASSP 2022 paper "Transformer-Based Multi-Aspect Multi-Granularity Non-native English Speaker Pronunciation Assessment".
BSD 3-Clause "New" or "Revised" License

my own data length limit #29

Open YangangCao opened 1 year ago

YangangCao commented 1 year ago

Hi, dear author, when I run inference on my own data, the length exceeds 50 and an error occurs. I found this line in gopt.py: self.pos_embed = nn.Parameter(torch.zeros(1, 55, self.embed_dim)) So the max length is 50? If a sentence exceeds 50, must I clip it into shorter fragments? @YuanGongND
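(Editor's note: if clipping is acceptable, splitting a sequence into fixed-size chunks could be sketched as below. This is a minimal illustration, not code from the repo; the helper name `split_into_chunks` and the 84-dim feature size are assumptions for the example.)

```python
import torch

MAX_LEN = 50  # max phone-sequence length supported by the learnable pos_embed

def split_into_chunks(feats: torch.Tensor, max_len: int = MAX_LEN):
    """Split a (T, D) GOP feature sequence into chunks of at most max_len phones."""
    return [feats[i:i + max_len] for i in range(0, feats.shape[0], max_len)]

# toy example: a 120-phone utterance with 84-dim GOP features
feats = torch.randn(120, 84)
chunks = split_into_chunks(feats)
print([c.shape[0] for c in chunks])  # chunk lengths: [50, 50, 20]
```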

YangangCao commented 1 year ago

And I tested a short sentence; negative numbers appear in w1, which is strange. What do you think about it?

w1: tensor([[[ 1.8711e-01],
         [ 1.8163e-02],
         [ 6.9934e-02],
         [-7.8350e-02],
         [-4.9876e-02],
         [-1.9600e-02],
         [-2.8484e-02],
         [ 2.5403e-01],
         [ 1.2672e-01],
         [-3.9872e-02],
         [-1.2691e-01],
         [ 3.4448e-02],
         [-1.4095e-01],
         [-8.9084e-02],
         [-1.7390e-01],
         [-3.6188e-02],
         [-1.6663e-01],
         [ 7.3178e-02],
         [-9.7041e-02],
         [-2.1955e-02],
         [-1.1918e-01],
         [ 3.8830e-02],
         [-6.7791e-02],
         [ 8.4965e-02],
         [-6.8574e-02],
         [-2.7004e-02],
         [-9.1636e-02],
         [-5.1762e-02],
         [ 2.8749e-02],
         [ 1.3503e-01],
         [ 2.5275e-03],
         [-1.7253e-01],
         [-5.4970e-02],
         [-1.0083e-01],
         [-1.3783e-01],
         [-2.8183e-01],
         [-6.7833e-02],
         [-1.1899e-01],
         [-7.5371e-02],
         [-1.7042e-01],
         [ 8.6797e-03],
         [ 5.7114e-02],
         [ 6.5258e-04],
         [ 3.3913e-02],
         [-1.0919e-01],
         [-5.4881e-02],
         [ 7.9677e-01],
         [ 7.5072e-01],
         [ 8.4159e-01],
         [ 7.8338e-01]]], device='cuda:0')

The other outputs are always greater than 0.

YuanGongND commented 1 year ago

hi there,

Hi, dear author, when I infer my own data, the length exceed 50 and error occurs. I find this line in gopt.py, self.pos_embed = nn.Parameter(torch.zeros(1, 55, self.embed_dim)) So the max length is 50? If the sentence exceed 50, I must clip it into shorter fragment?

Yes, you are correct. This is because the so762 dataset's max length is 50. Unfortunately, the default positional embedding is learnable, so it does not extrapolate easily. However, if you switch to the sinusoidal positional embedding and retrain the model (which would be easy; we have a Colab script for that), you can extrapolate the positional embedding and run inference on longer sequences.

https://github.com/YuanGongND/gopt/blob/bed909daf8eca035095871e51642525acc5b9b55/src/models/gopt.py#L144-L145
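(Editor's note: the reason the sinusoidal embedding extrapolates is that it is a fixed function of position rather than a learned table. A minimal sketch of the standard formulation, assuming an even embedding dimension; `dim=24` here is illustrative, not necessarily the model's actual embed_dim:)

```python
import math
import torch

def sinusoidal_pos_embed(max_len: int, dim: int) -> torch.Tensor:
    """Standard non-learnable sinusoidal positional embedding, shape (1, max_len, dim)."""
    pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)          # (max_len, 1)
    div = torch.exp(torch.arange(0, dim, 2, dtype=torch.float32)
                    * (-math.log(10000.0) / dim))                           # (dim/2,)
    pe = torch.zeros(max_len, dim)
    pe[:, 0::2] = torch.sin(pos * div)  # even channels
    pe[:, 1::2] = torch.cos(pos * div)  # odd channels
    return pe.unsqueeze(0)

# because it is a fixed function of position, it extends to any length:
pe_50 = sinusoidal_pos_embed(50, 24)
pe_200 = sinusoidal_pos_embed(200, 24)
assert torch.allclose(pe_200[:, :50], pe_50)  # first 50 positions are identical
```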

And I test a short sentence, negative number appear in w1, it's strange, how do you think about it ?

This is a more complex question, but the first thing is to make sure your extracted GOP features are correct. I recommend checking whether the SO762 test set produces negative values; if not, the problem is likely in your GOP feature input. Then you can compare the statistics of your GOP features (e.g., max, mean, std) with those of our SO762 input features.
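(Editor's note: the statistics check could be sketched as below. The helper name `feature_stats` and the tensor shapes are illustrative assumptions; load your own saved feature file and the released SO762 features in place of the random tensor.)

```python
import torch

def feature_stats(feats: torch.Tensor) -> dict:
    """Summary statistics for a (N, T, D) batch of GOP features."""
    return {
        "min": feats.min().item(),
        "max": feats.max().item(),
        "mean": feats.mean().item(),
        "std": feats.std().item(),
    }

# replace with your extracted features and the released SO762 features for comparison
mine = torch.randn(4, 50, 84)  # illustrative shapes only
print(feature_stats(mine))
```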

-Yuan

YangangCao commented 1 year ago

Thanks for your kind and fast reply! I will follow your suggestions.

YangangCao commented 1 year ago

Hi, I tried the speechocean762 data, and negative values also appear in w1. I have attached the features; maybe you can try: train.zip

Using the Librispeech model:

w1: tensor([[[ 0.1317], [ 0.1545], [ 0.0895], [ 0.0449], [ 0.0803], [ 0.0930], [ 0.2981], [ 0.1362], [ 0.1225], [ 0.0795], [ 0.1348], [-0.0443], [ 0.0054], [ 0.0733], [ 0.4038], [ 1.1466], [ 1.1085], [ 1.1391], [ 1.0634], [ 1.2589], [ 1.1451], [ 1.0485], [ 1.1034], [ 1.1758], [ 1.1658], [ 1.0924], [ 1.1657], [ 1.1954], [ 1.1132], [ 1.1714], [ 0.9821], [ 1.1045], [ 1.0680], [ 0.9793], [ 1.1042], [ 1.1401], [ 1.0787], [ 1.0832], [ 1.1980], [ 1.1647], [ 1.1390], [ 1.1669], [ 1.1677], [ 1.1477], [ 1.1583], [ 1.1625], [ 1.1313], [ 1.0742], [ 1.1767], [ 1.1211]]], device='cuda:0')