[Open] YangangCao opened this issue 1 year ago
Also, when I test a short sentence, negative numbers appear in w1, which seems strange. What do you think about it?
w1: tensor([[[ 1.8711e-01],
[ 1.8163e-02],
[ 6.9934e-02],
[-7.8350e-02],
[-4.9876e-02],
[-1.9600e-02],
[-2.8484e-02],
[ 2.5403e-01],
[ 1.2672e-01],
[-3.9872e-02],
[-1.2691e-01],
[ 3.4448e-02],
[-1.4095e-01],
[-8.9084e-02],
[-1.7390e-01],
[-3.6188e-02],
[-1.6663e-01],
[ 7.3178e-02],
[-9.7041e-02],
[-2.1955e-02],
[-1.1918e-01],
[ 3.8830e-02],
[-6.7791e-02],
[ 8.4965e-02],
[-6.8574e-02],
[-2.7004e-02],
[-9.1636e-02],
[-5.1762e-02],
[ 2.8749e-02],
[ 1.3503e-01],
[ 2.5275e-03],
[-1.7253e-01],
[-5.4970e-02],
[-1.0083e-01],
[-1.3783e-01],
[-2.8183e-01],
[-6.7833e-02],
[-1.1899e-01],
[-7.5371e-02],
[-1.7042e-01],
[ 8.6797e-03],
[ 5.7114e-02],
[ 6.5258e-04],
[ 3.3913e-02],
[-1.0919e-01],
[-5.4881e-02],
[ 7.9677e-01],
[ 7.5072e-01],
[ 8.4159e-01],
[ 7.8338e-01]]], device='cuda:0')
The other outputs are always greater than 0.
Hi there,
Hi, dear author, when I run inference on my own data, the length exceeds 50 and an error occurs. I found this line in gopt.py: self.pos_embed = nn.Parameter(torch.zeros(1, 55, self.embed_dim)). So the max length is 50? If a sentence exceeds 50, must I clip it into shorter fragments?
Yes, you are correct. This is because the so762 dataset's max length is 50. Unfortunately, the default positional embedding is learnable, so it does not extrapolate easily. However, if you use a sinusoidal positional embedding and retrain the model (which would be easy; we have a Colab script for that), you can extrapolate the positional embedding and run inference on longer sequences.
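For reference, here is a minimal sketch of what a fixed sinusoidal positional embedding could look like in PyTorch; the class name and max_len value are illustrative, not taken from the actual GOPT code:

```python
import math
import torch
import torch.nn as nn

class SinusoidalPosEmbed(nn.Module):
    """Fixed sinusoidal positional embedding (as in the original Transformer).

    The table is computed from a formula rather than learned, so it can be
    rebuilt with a larger max_len at inference time to cover sequences
    longer than those seen during training. Assumes embed_dim is even.
    """
    def __init__(self, embed_dim, max_len=500):  # max_len is illustrative
        super().__init__()
        position = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, embed_dim, 2).float() * (-math.log(10000.0) / embed_dim)
        )
        pe = torch.zeros(max_len, embed_dim)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe.unsqueeze(0))  # shape (1, max_len, embed_dim)

    def forward(self, x):
        # x: (batch, seq_len, embed_dim); add the first seq_len positions
        return x + self.pe[:, : x.size(1)]
```

The idea would be to replace the learnable self.pos_embed with a module like this and retrain; after that, max_len can simply be set as large as your longest sentence.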
Also, when I test a short sentence, negative numbers appear in w1, which seems strange. What do you think about it?
This is a more complex question, but the first step is to make sure your extracted GOP features are correct. I recommend checking whether the SO762 test set scores have negative values; if they don't, then the problem is likely with your GOP feature input. You can then compare the statistics of your GOP features (e.g., max, mean, std) with our SO762 input features.
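A quick sanity check along those lines might look like this; the .npy file names are placeholders for wherever your extracted features and the SO762 features are actually stored:

```python
import numpy as np

def feature_stats(name, feats):
    # Print basic statistics of a GOP feature array for side-by-side comparison.
    print(f"{name}: shape={feats.shape}, min={feats.min():.4f}, "
          f"max={feats.max():.4f}, mean={feats.mean():.4f}, std={feats.std():.4f}")

# Placeholder paths; point these at your own extracted GOP features
# and the SO762 features used to train the released model.
my_feats = np.load("my_gop_feats.npy")
so762_feats = np.load("so762_test_gop_feats.npy")

feature_stats("my data", my_feats)
feature_stats("so762 test", so762_feats)
```

If the two sets of statistics differ wildly, the feature extraction pipeline is the first place to look.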
-Yuan
Thanks for your kind and fast reply! I will follow your suggestions.
Hi, I tried the speechocean762 data, and negative values also appear in w1. I have attached them; maybe you can try: train.zip
I use the librispeech model.
w1: tensor([[[ 0.1317],
[ 0.1545],
[ 0.0895],
[ 0.0449],
[ 0.0803],
[ 0.0930],
[ 0.2981],
[ 0.1362],
[ 0.1225],
[ 0.0795],
[ 0.1348],
[-0.0443],
[ 0.0054],
[ 0.0733],
[ 0.4038],
[ 1.1466],
[ 1.1085],
[ 1.1391],
[ 1.0634],
[ 1.2589],
[ 1.1451],
[ 1.0485],
[ 1.1034],
[ 1.1758],
[ 1.1658],
[ 1.0924],
[ 1.1657],
[ 1.1954],
[ 1.1132],
[ 1.1714],
[ 0.9821],
[ 1.1045],
[ 1.0680],
[ 0.9793],
[ 1.1042],
[ 1.1401],
[ 1.0787],
[ 1.0832],
[ 1.1980],
[ 1.1647],
[ 1.1390],
[ 1.1669],
[ 1.1677],
[ 1.1477],
[ 1.1583],
[ 1.1625],
[ 1.1313],
[ 1.0742],
[ 1.1767],
[ 1.1211]]], device='cuda:0')