Khrylx / AgentFormer

[ICCV 2021] Official PyTorch Implementation of "AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting".
https://www.ye-yuan.com/agentformer/
MIT License

some puzzles about the math formulas in the CVAE Future Decoder part #12

Closed ultimatedigiman closed 2 years ago

ultimatedigiman commented 2 years ago

As you described in section 3.2 of the paper:

image

I understand that the purpose of the MSE term ||Y-\hat Y||^2 is to push the real value Y and the mean of the Gaussian \hat Y as close together as possible, because the Gaussian distribution attains its maximum density at the mean. But where does the weighting factor 1/(2beta) come from? Why does dividing the variance by beta lead to this weighting factor?

Khrylx commented 2 years ago

This is a typo. It should be 0.5\beta before the MSE term to make it an actual weighting factor. Nice catch

ultimatedigiman commented 2 years ago

Hi, could you please explain why dividing the Gaussian standard deviation I by beta leads to the weighting factor 1/(2beta)?

image

How can I derive the weighting factor 1/(2beta) from the Gaussian standard deviation I/beta?

Khrylx commented 2 years ago

Hi,

I already said that it is a typo; it should be 0.5\beta ||Y-\hat{Y}||^2.
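For readers who want to check the corrected weighting numerically, here is a minimal sketch. It assumes that I/\beta is the covariance of the Gaussian (so each dimension has variance 1/\beta); the function names `gaussian_nll`, `weighted_mse`, and `nll_constant` are hypothetical and are not taken from the AgentFormer code base.

```python
import math
import random

def gaussian_nll(y, y_hat, beta):
    """Negative log-density of y under N(y_hat, I / beta), summed over dimensions."""
    var = 1.0 / beta  # assumption: I / beta is the covariance, so variance = 1 / beta
    nll = 0.0
    for yi, mi in zip(y, y_hat):
        nll += 0.5 * math.log(2.0 * math.pi * var) + (yi - mi) ** 2 / (2.0 * var)
    return nll

def weighted_mse(y, y_hat, beta):
    """The corrected term from the thread: 0.5 * beta * ||y - y_hat||^2."""
    return 0.5 * beta * sum((yi - mi) ** 2 for yi, mi in zip(y, y_hat))

def nll_constant(dim, beta):
    """Part of the negative log-density that does not depend on y or y_hat."""
    return 0.5 * dim * math.log(2.0 * math.pi / beta)

random.seed(0)
dim, beta = 6, 3.0
y = [random.gauss(0.0, 1.0) for _ in range(dim)]
y_hat = [random.gauss(0.0, 1.0) for _ in range(dim)]

# The two quantities should differ only by the constant term.
gap = gaussian_nll(y, y_hat, beta) - weighted_mse(y, y_hat, beta)
print(abs(gap - nll_constant(dim, beta)) < 1e-9)
```

Under that assumption, the negative log-likelihood and 0.5\beta ||Y-\hat{Y}||^2 differ only by a term that is independent of Y and \hat{Y}, so minimizing one minimizes the other.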

ultimatedigiman commented 2 years ago

Thanks for the quick reply! I'm afraid I mistook the \beta in your last reply for dividing by beta. But I still don't understand why dividing the Gaussian standard deviation I by beta leads to the weighting factor 0.5beta. Is there a derivation?

image

Khrylx commented 2 years ago

Hello, sorry for the late reply.

I think if you take the log probability of the GT trajectory Y under the normal distribution, you will get the MSE loss (up to sign and an additive constant).
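
The step described above can be sketched as follows. This is a derivation under stated assumptions rather than a quote from the paper: it assumes that I/\beta is the covariance matrix of the Gaussian (not its standard deviation), and it introduces d as the total dimension of Y, a symbol that does not appear in the thread.

```latex
% Assumption: p(Y | Z, X) = N(\hat{Y}, I/\beta), with I/\beta the covariance and Y \in R^d.
\begin{align*}
\log p(Y \mid Z, X)
  &= -\frac{d}{2}\log(2\pi)
     - \frac{1}{2}\log\det\!\left(\frac{I}{\beta}\right)
     - \frac{1}{2}\,(Y-\hat{Y})^{\top}\left(\frac{I}{\beta}\right)^{-1}(Y-\hat{Y}) \\
  &= -\frac{d}{2}\log\!\left(\frac{2\pi}{\beta}\right)
     - \frac{\beta}{2}\,\lVert Y-\hat{Y}\rVert^{2}, \\
-\log p(Y \mid Z, X)
  &= \frac{\beta}{2}\,\lVert Y-\hat{Y}\rVert^{2} + \text{const}.
\end{align*}
```

The inverse covariance (I/\beta)^{-1} = \beta I is what moves \beta into the numerator, giving the weighting factor 0.5\beta rather than 1/(2\beta). If I/\beta were instead read as the standard deviation, the variance would be 1/\beta^2 and the same steps would give 0.5\beta^2, which matches neither version of the formula.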