Closed: ysn7 closed this issue 3 years ago
Hi,
I am afraid I do not know how to address this issue.
The information provided is very limited, and I do not have any experience with Chinese datasets. I will leave this issue open; perhaps other researchers can help.
I just solved a similar problem on a Chinese dataset: once I made sure those few data fields were correct, everything went back to normal. (In my case it was the start/end fields that were wrong.)
Thanks! Were the errors in your dataset in the start/end positions? The indices should be 0-based, right? Roughly how many examples does your dataset have? I am currently testing with only about 200 examples, and I found that at the pooling step h_out is abnormal, while subj_out and obj_out look normal. Is my dataset too small? Can I remove the pooling step?
h:
tensor([[[ 6.7287e-03, -3.4431e-02, -2.4689e-02, ..., 2.0638e-02,
1.0605e-03, -3.3505e-04],
[-4.3808e-03, -6.2138e-02, -1.8356e-02, ..., 2.5058e-02,
-4.5469e-02, 7.7263e-04],
[-6.0311e-03, -5.2898e-02, 9.4507e-03, ..., -2.5679e-02,
-8.1830e-03, 1.3188e-02],
...,
[-2.6716e-03, -4.7641e-02, -9.9292e-03, ..., 3.2689e-03,
-3.1476e-02, -1.7903e-02],
[-3.8555e-03, -3.9092e-02, -1.4769e-02, ..., 3.7055e-02,
-1.8394e-02, -2.3195e-03],
[ 2.4301e-03, -3.3044e-02, -1.6335e-02, ..., -6.8358e-03,
-2.6581e-02, -6.9456e-03]],
[[-2.3578e-03, -2.9572e-02, 4.2690e-04, ..., 1.9457e-02,
-3.5350e-02, 6.6806e-03],
[-2.8738e-02, -4.0784e-02, -2.0985e-02, ..., 4.5031e-03,
-3.6514e-02, 2.6311e-02],
[-1.5663e-02, -4.9511e-02, -2.1087e-02, ..., -3.7576e-03,
-1.7783e-02, -1.4890e-02],
...,
[-6.3035e-03, -3.3592e-02, 1.5824e-03, ..., -4.5093e-03,
-2.3779e-02, 1.7091e-02],
[-3.2743e-04, -6.0516e-02, -1.4560e-02, ..., -1.3474e-02,
-3.0212e-02, 1.0995e-02],
[-1.3955e-02, -5.1576e-02, -3.2273e-02, ..., 7.2469e-03,
-1.0113e-02, -9.2596e-03]],
[[ 1.5395e-03, -3.2788e-02, -3.8540e-02, ..., 7.6097e-03,
-1.0825e-02, 2.7164e-02],
[-1.2386e-02, -5.6025e-02, 2.6489e-03, ..., 3.1476e-02,
-2.9831e-02, -3.9439e-03],
[ 5.0491e-03, -4.1129e-02, -1.3984e-02, ..., 1.6405e-02,
-9.6303e-03, -4.6909e-03],
...,
[-3.6644e-03, -5.7056e-02, -9.4472e-03, ..., -1.0197e-03,
-4.0458e-03, -4.6976e-04],
[-1.2828e-02, -6.3767e-02, -4.6125e-03, ..., -2.0876e-04,
-1.9317e-02, 2.6939e-02],
[-9.9685e-03, -4.6527e-02, -1.9236e-02, ..., -5.3059e-03, ...
h after pooling:
h_out
tensor([[-1.0000e+12, -1.0000e+12, -1.0000e+12, ..., -1.0000e+12,
-1.0000e+12, -1.0000e+12],
[-1.0000e+12, -1.0000e+12, -1.0000e+12, ..., -1.0000e+12,
-1.0000e+12, -1.0000e+12],
[-1.0000e+12, -1.0000e+12, -1.0000e+12, ..., -1.0000e+12,
-1.0000e+12, -1.0000e+12],
...,
[-3.7423e-03, -5.1638e-02, -1.3509e-02, ..., 5.8004e-03,
1.0368e-02, 1.8677e-02],
[-1.0000e+12, -1.0000e+12, -1.0000e+12, ..., -1.0000e+12,
-1.0000e+12, -1.0000e+12],
[-1.0000e+12, -1.0000e+12, -1.0000e+12, ..., -1.0000e+12,
-1.0000e+12, -1.0000e+12]], device='cuda:0', grad_fn=
subj_out
tensor([[-4.6118e-02, -8.3082e-02, -4.0409e-02, ..., 2.0847e-02,
-9.8066e-03, -4.6178e-02],
[-2.2601e-01, -1.2420e+00, -5.4379e-01, ..., 1.7724e-02,
-4.2563e-01, 2.4472e-01],
[-2.2986e-01, -1.5995e+00, -5.1769e-01, ..., -1.8251e-03,
-5.1588e-01, 3.5035e-01],
...,
[-5.0190e-01, -3.5716e+00, -1.3304e+00, ..., 1.4205e-01,
-1.1360e+00, 8.0728e-01],
[-3.3840e-01, -4.0206e+00, -1.5507e+00, ..., 3.2900e-02,
-1.1667e+00, 7.5493e-01],
[-3.7166e-01, -3.9341e+00, -1.5383e+00, ..., 9.0344e-02,
-1.1905e+00, 9.8369e-01]], device='cuda:0', grad_fn=
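For context, rows of h_out that are entirely -1.0000e+12 are a telltale sign of masked max-pooling: such implementations typically fill masked positions with a large negative constant (apparently 1e12 here, judging by the dump) before taking the max, so if the mask ends up covering every token, the pooled row is just the fill value. A minimal sketch of that pattern, assuming a 1e12 fill constant and a boolean mask where True marks positions to ignore; this is an illustration, not necessarily the exact code in this repo:

```python
import torch

INFINITY_NUMBER = 1e12  # assumed fill constant, matching the -1.0000e+12 rows above

def max_pool(h, mask):
    """Max-pool hidden states over the sequence dimension.

    h:    (batch, seq_len, hidden) token representations
    mask: (batch, seq_len, 1) boolean, True = position to IGNORE
    """
    h = h.masked_fill(mask, -INFINITY_NUMBER)  # push masked positions to -1e12
    return torch.max(h, dim=1)[0]              # (batch, hidden)

# If a mask row is all True (every token masked), the pooled row is all -1e12,
# exactly the pattern in the h_out dump above. That can happen when the
# subj/obj start/end indices in the data are wrong (e.g., 1-based or past the
# sentence length), so the derived mask covers the whole sentence.
h = torch.randn(2, 5, 4)
bad_mask = torch.ones(2, 5, 1, dtype=torch.bool)  # masks everything
print(max_pool(h, bad_mask))                      # rows of -1e12
```

So the abnormal h_out is a symptom of a bad mask rather than of a dataset with too few examples, and removing the pooling step would not fix the underlying indexing problem.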
Thank you very much, it is solved now.
How did you solve it?
Was my dataset too small?
How did you solve this? I am also getting a very large loss.
@ysn7
@timoderbeste Have you solved it?
Hello, sorry to disturb you. When I replaced the dataset with a Chinese dataset, the loss became extremely large. What could be the reason?
Finetune all embeddings.
epoch 1: train_loss = 768000105463386346618880.000000 model saved to ./saved_models/01/checkpoint_epoch_2.pt
epoch 2: train_loss = 768000105463386346618880.000000 model saved to ./saved_models/01/checkpoint_epoch_3.pt
epoch 3: train_loss = 768000105463386346618880.000000 model saved to ./saved_models/01/checkpoint_epoch_4.pt
epoch 4: train_loss = 768000105463386346618880.000000 model saved to ./saved_models/01/checkpoint_epoch_5.pt
epoch 5: train_loss = 768000105463386346618880.000000 model saved to ./saved_models/01/checkpoint_epoch_6.pt
epoch 6: train_loss = 768000105463386346618880.000000 model saved to ./saved_models/01/checkpoint_epoch_7.pt
2020-12-23 08:47:06.380100: step 20/450 (epoch 7/150), loss = 576000088104739014705152.000000 (0.161 sec/batch), lr: 0.010000
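Since both reporters traced the problem back to bad start/end fields, a quick sanity check over the data before training can catch this early. A minimal sketch, assuming TACRED-style JSON with hypothetical field names (token, subj_start, subj_end, obj_start, obj_end) and 0-based inclusive indices; adjust the names and path to your own dataset:

```python
import json

def check_spans(path):
    """Flag examples whose subj/obj spans are not valid 0-based indices."""
    with open(path) as f:
        data = json.load(f)
    for i, d in enumerate(data):
        n = len(d['token'])
        for role in ('subj', 'obj'):
            s, e = d[f'{role}_start'], d[f'{role}_end']
            # Indices must be 0-based, inclusive, and inside the sentence;
            # 1-based or out-of-range spans produce the all-masked pooling
            # failure and the runaway loss shown above.
            if not (0 <= s <= e < n):
                print(f'example {i}: bad {role} span ({s}, {e}) for length {n}')

check_spans('dataset/train.json')  # hypothetical path
```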