The quality of the generated trajectories must also be evaluated. For every policy in the analysis, this is done with the model parameters that achieved the best success rate.
[x] Trajectory similarity to the trajectories in the original dataset, given the same observation: Mean Squared Error between the joint values, taken over every joint and every timestep of the trajectory.
[x] Waypoint variance: the sum (along the trajectory dimension) of the pairwise L2-distance variance between waypoints at corresponding timesteps \cite{carvalho_motion_2023}. It quantifies the multimodality of the generated trajectories.
[x] Inference time: the time the policy takes to generate a single action.
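
The three metrics above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the evaluation code used in the analysis. It assumes trajectories are arrays of shape `(T, J)` (timesteps by joints), that several trajectories sampled for the same observation are stacked as `(N, T, J)`, and that the policy is a plain callable. The function names `trajectory_mse`, `waypoint_variance`, and `mean_action_time` are hypothetical, and the waypoint variance follows one possible reading of the definition: at each timestep, take the variance of all pairwise L2 distances between the sampled waypoints, then sum over timesteps.

```python
import time

import numpy as np


def trajectory_mse(generated, reference):
    """Mean squared error over all joints and timesteps.

    Both inputs are assumed to have shape (T, J).
    """
    generated = np.asarray(generated, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if generated.shape != reference.shape:
        raise ValueError("trajectories must share the same (T, J) shape")
    return float(np.mean((generated - reference) ** 2))


def waypoint_variance(trajectories):
    """Sum over timesteps of the variance of pairwise L2 distances.

    `trajectories` is assumed to have shape (N, T, J): N trajectories
    sampled for the same observation.
    """
    trajs = np.asarray(trajectories, dtype=float)
    n = trajs.shape[0]
    if n < 2:
        raise ValueError("need at least two trajectories")
    # Differences between every pair of trajectories: (N, N, T, J).
    diffs = trajs[:, None] - trajs[None, :]
    # L2 distance between waypoints at the same timestep: (N, N, T).
    dists = np.linalg.norm(diffs, axis=-1)
    # Keep each unordered pair once: (P, T) with P = N * (N - 1) / 2.
    i, j = np.triu_indices(n, k=1)
    pair_dists = dists[i, j]
    # Variance across pairs per timestep, summed along the trajectory.
    return float(np.var(pair_dists, axis=0).sum())


def mean_action_time(policy, observation, repeats=100):
    """Average wall-clock seconds for one call of `policy(observation)`."""
    start = time.perf_counter()
    for _ in range(repeats):
        policy(observation)
    return (time.perf_counter() - start) / repeats
```

If the policy runs on a GPU, the timing would also need a device synchronization before reading the clock, since kernel launches are asynchronous; that step is omitted here because it depends on the framework in use.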
I wrote