In the paper, the loss for supervised learning is stated to be the KL divergence between the trajectories and the neural network output.
However, in the code, why does it become || label - output ||^2?
Is this just for convenience of implementation?
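For concreteness, here is a minimal NumPy sketch of the two objectives being compared (function names and example values are illustrative, not taken from the repository):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete probability distributions.

    eps guards against log(0); p and q are assumed to sum to 1.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def squared_l2(label, output):
    """The loss as it appears in the code: || label - output ||^2."""
    label = np.asarray(label, dtype=float)
    output = np.asarray(output, dtype=float)
    return float(np.sum((label - output) ** 2))

# Hypothetical example: a target distribution derived from the
# trajectories vs. the network's predicted distribution.
target = [0.7, 0.2, 0.1]
pred = [0.6, 0.3, 0.1]
print(kl_divergence(target, pred))
print(squared_l2(target, pred))
```

Both losses are zero exactly when output equals label, so they share the same minimizer; they differ in the gradients they produce away from that point, and when the target is a probability distribution, minimizing KL is equivalent (up to a constant) to minimizing cross-entropy.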