abduallahmohamed / Social-STGCNN

Code for "Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction" CVPR 2020
MIT License

ValueError: The parameter covariance_matrix has invalid values #54

Closed QiyuLuo closed 2 years ago

QiyuLuo commented 2 years ago
        sx = torch.exp(V_pred[:,:,2]) #sx
        sy = torch.exp(V_pred[:,:,3]) #sy
        corr = torch.tanh(V_pred[:,:,4]) #corr

        cov = torch.zeros(V_pred.shape[0],V_pred.shape[1],2,2).cuda()
        cov[:,:,0,0]= sx*sx
        cov[:,:,0,1]= corr*sx*sy
        cov[:,:,1,0]= corr*sx*sy
        cov[:,:,1,1]= sy*sy
        mean = V_pred[:,:,0:2]

        mvnormal = torchdist.MultivariateNormal(mean,cov)

What is the meaning of this code? The error occurs when I run the code on other datasets. When I print the values, I find that cov has become inf. Does exp have to be used here, or can torch.exp be replaced with another operation? I would sincerely appreciate any advice. Thank you for your help.

(Pdb) p cov[:, :, 0, 0]
tensor([[       inf,        inf,        inf,  ...,        inf,        inf,
         5.8854e-08],
        [5.2052e-12, 3.6343e+09, 5.3083e+10,  ..., 6.0116e+13, 4.4345e+26,
         1.6418e-18],
        [0.0000e+00, 1.7857e-13, 2.5362e-12,  ..., 1.8676e-09, 8.0149e+06,
         1.5469e-22],
        ...,
        [0.0000e+00, 2.1161e-21, 7.0877e-18,  ..., 6.8720e-15, 1.2533e+02,
         4.3264e-16],
        [0.0000e+00, 3.2086e-06, 4.9885e-03,  ..., 1.1735e+01, 1.8858e+19,
         4.8166e-15],
        [0.0000e+00, 2.3180e+02, 1.7170e+06,  ..., 1.6433e+09, 1.9954e+24,
         2.6417e-13]], device='cuda:0', grad_fn=<SelectBackward>)

@abduallahmohamed
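
For context on where the inf values can come from: in float32, torch.exp overflows to inf once its input exceeds roughly 88, so any unbounded network output fed through exp can produce an invalid covariance. A minimal sketch (not from the repo) illustrating this:

```python
import torch

# The network's raw sigma outputs are unbounded; exp() maps them to (0, inf).
raw = torch.tensor([0.0, 10.0, 88.0, 90.0], dtype=torch.float32)
sx = torch.exp(raw)

# float32 exp overflows a little above 88: exp(90) is inf in float32,
# which then makes sx*sx (and the whole covariance matrix) invalid.
assert torch.isinf(sx[-1])
assert torch.isfinite(sx[2])  # exp(88) ~ 1.65e38 still fits in float32
```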

abduallahmohamed commented 2 years ago

Your values are too high for a covariance matrix; I see values around e+13. I would lower the learning rate, limit the gradients, and scale the targets. Having inf means the network has exploded.
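
The gradient-limiting part of that advice can be sketched with PyTorch's built-in clip_grad_norm_ (hypothetical stand-in model and optimizer, not the repo's training loop):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the trajectory model; the point is the clipping call.
model = nn.Linear(4, 5)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)  # lowered lr

x = torch.randn(8, 4)
loss = model(x).pow(2).mean()

optimizer.zero_grad()
loss.backward()
# Rescale gradients so their global norm is at most 1.0, so a single bad
# batch cannot push exp(sigma) toward overflow on the next step.
total_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```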

daeunni commented 2 years ago

Why do you use the tanh function when calculating corr?

negRho = 1 - corr**2

I have a problem in this code when corr = 1. How can I solve this?

abduallahmohamed commented 2 years ago

Because correlation is between -1 and +1.
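
Worth noting: tanh keeps the correlation strictly inside (-1, 1) mathematically, but in float32 it saturates to exactly ±1 for large inputs, which is presumably how corr = 1 can actually occur during training. A minimal check:

```python
import torch

raw = torch.tensor([0.0, 2.0, 100.0], dtype=torch.float32)
corr = torch.tanh(raw)

# Mathematically tanh is strictly inside (-1, 1) ...
assert corr[1] < 1.0
# ... but in float32 it rounds to exactly 1.0 once the input is large,
# which makes corr == 1 (and hence negRho == 0) reachable in practice.
assert corr[2] == 1.0
```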

Abduallah Mohamed abduallahmohamed.com

daeunni commented 2 years ago

        def bivariate_loss(V_pred,V_trgt):
            normx = V_trgt[:,:,0]- V_pred[:,:,0]
            normy = V_trgt[:,:,1]- V_pred[:,:,1]
            sx = torch.exp(V_pred[:,:,2]) #sx
            sy = torch.exp(V_pred[:,:,3]) #sy
            corr = torch.tanh(V_pred[:,:,4]) #corr
            sxsy = sx*sy
            z = (normx/sx)**2 + (normy/sy)**2 - 2*((corr*normx*normy)/sxsy)
            bound_corr = torch.clamp(corr**2, max=0.99999)
            negRho = 1 - bound_corr

May I use this part: negRho = 1 - bound_corr?

I get negRho = 0 because corr = 1 in this part, and the loss then comes out NaN due to division by zero.

Therefore, I wonder whether clamping with a value such as 0.99999, so that corr never reaches 1, causes any problem with the gradient updates or learning.
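
The clamp being asked about can be sketched as follows (an illustration of the mechanism, not a claim that it is the right fix for this repo):

```python
import torch

corr = torch.tensor([0.0, 0.5, 1.0])  # 1.0 is the problematic saturated case

# Bound corr**2 away from 1 so negRho can never reach exactly zero.
bound = torch.clamp(corr ** 2, max=0.99999)
negRho = 1.0 - bound

assert torch.all(negRho > 0)   # no division by zero in the loss
assert negRho[-1].item() > 0   # even the saturated corr == 1 case
```

One caveat of this workaround: clamp has zero gradient wherever it is active, so the network receives no pressure to reduce corr in the saturated region.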

abduallahmohamed commented 2 years ago

I think a correlation of 1 is too high! You are already at the limit of tanh; you should lower your lr.


daeunni commented 2 years ago

Thank you for your advice. However, no matter how much I reduce the lr, the situation does not improve. Can you recommend another approach?

abduallahmohamed commented 2 years ago

Check the gradient flow of your model; there is probably an area where the gradients explode.
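
One way to inspect gradient flow is to scan each parameter's gradient after backward() for non-finite values and unusually large norms. A sketch with a hypothetical helper and a toy stand-in model (neither is part of the repo):

```python
import torch
import torch.nn as nn

def check_grad_flow(model):
    """Report per-parameter gradient norms and flag non-finite values."""
    report = {}
    for name, p in model.named_parameters():
        if p.grad is None:
            continue
        report[name] = {
            "norm": p.grad.norm().item(),
            "finite": bool(torch.isfinite(p.grad).all()),
        }
    return report

# Tiny stand-in model just to exercise the helper.
model = nn.Sequential(nn.Linear(3, 3), nn.ReLU(), nn.Linear(3, 1))
loss = model(torch.randn(4, 3)).sum()
loss.backward()

for name, stats in check_grad_flow(model).items():
    print(name, stats)  # layers with huge norms or finite=False are suspects
```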

Abduallah Mohamed abduallahmohamed.com
