StanfordASL / Trajectron-plus-plus

Code accompanying the ECCV 2020 paper "Trajectron++: Dynamically-Feasible Trajectory Forecasting With Heterogeneous Data" by Tim Salzmann*, Boris Ivanovic*, Punarjay Chakravarty, and Marco Pavone (* denotes equal contribution).

Difference between "position" and "velocity" prediction? #70

Closed · Leon0402 closed this 2 years ago

Leon0402 commented 2 years ago

I saw in the model configurations that you sometimes use "velocity" prediction (for some eth configurations):

"pred_state": {"PEDESTRIAN": {"position": ["x", "y"]}}

and sometimes "position" prediction (nuScenes, and some eth configs):

"pred_state": {"VEHICLE": {"position": ["x", "y"]}, "PEDESTRIAN": {"position": ["x", "y"]}}

As far as I understand, the model will always produce positions at the very end, because that's what we care about. Does this just mean that in the case of "position" prediction the generative model / LSTM directly produces positions, and in the case of "velocity" it produces velocities and uses an integrator to get positions?
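For concreteness, here is roughly what I picture the "integrator" step doing. This is just my own sketch of a single-integrator (Euler) update, not code from this repo, and the dt and tensor shapes are my guesses:

```python
import torch

def integrate_velocities(v_pred, x0, dt=0.5):
    """Turn predicted velocities into positions with an Euler step.

    v_pred: (T, 2) predicted velocities; x0: (2,) last observed position.
    dt is a guess; the actual timestep depends on the dataset.
    """
    # Cumulative sum of v * dt gives the displacement from x0 at each step.
    return x0 + torch.cumsum(v_pred * dt, dim=0)
```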

I think "velocity prediction" is what you explicitly mentioned in your paper then. Because there you said the model would predict velocities and use integration for physical plausibility.

Do I understand the difference correctly? Why would you use one over the other?

I tried both methods with my dataset, and "position" prediction gives me terrible results compared to "velocity" prediction.

BorisIvanovic commented 2 years ago

> Does this just mean that in the case of "position" prediction the generative model / LSTM directly produces positions, and in the case of "velocity" it produces velocities and uses an integrator to get positions?

Yes, exactly!

I think "velocity prediction" is what you explicitly mentioned in your paper then. Because there you said the model would predict velocities and use integration for physical plausibility.

Yep.

> Why would you use one over the other?

Well, you probably wouldn't want to use the position one directly (as you pointed out). For different agent dynamics models, though, the differences matter: it might be better to predict steering rate and acceleration (e.g., for cars) or just velocities (e.g., for pedestrians), depending on the dynamics model you wish to use with them.
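For intuition, here's a rough sketch of the vehicle case, where the predicted controls are steering rate and acceleration and a unicycle model turns them into positions. This is an illustration (plain Euler discretization, made-up names and dt), not our exact implementation:

```python
import torch

def unicycle_step(x, y, heading, speed, steering_rate, accel, dt=0.5):
    """One Euler step of a unicycle model; all arguments are tensors."""
    # Controls (steering rate, acceleration) update the state first...
    heading = heading + steering_rate * dt
    speed = speed + accel * dt
    # ...and positions follow from the updated heading and speed.
    x = x + speed * torch.cos(heading) * dt
    y = y + speed * torch.sin(heading) * dt
    return x, y, heading, speed
```

The point is that positions produced this way are kinematically plausible by construction, which a network emitting raw positions doesn't guarantee.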

Leon0402 commented 2 years ago

@BorisIvanovic What you say makes a lot of sense, but I think what confuses me is the model configs in the experiments folder.

In https://github.com/StanfordASL/Trajectron-plus-plus/tree/master/experiments/nuScenes/models, three of the nuScenes configs (int_ee, int_ee_me, robot) directly predict positions.

Only the vel_ee config actually predicts velocities. None of the configs predicts acceleration or steering rate.

So are these configs just out of date? What is your current best config?

BorisIvanovic commented 2 years ago

Ahh ok, I understand your question better now. Let me amend my above answer as follows:

> Does this just mean that in the case of "position" prediction the generative model / LSTM directly produces positions, and in the case of "velocity" it produces velocities and uses an integrator to get positions?

Almost, "in case of velocity it will produce velocities and use an Integrator to get positions" is correct. However, in the case of "position" then the generative model / LSTM will produce the controls of the associated dynamics model and the integration through those dynamics models will produce positions.

To expand on this: pred_state is simply the "final" quantity that is to be predicted by the model and that is compared against during evaluation. This is why all of those configs have "position" as the pred_state (except for the velocity one, which is actually the baseline, i.e., dynamics-agnostic prediction that just assumes a single integrator for all agents).

You can see this explicitly in the config files by looking at which dynamics class is chosen per agent type. Take the vel_ee config: all of the agents have SingleIntegrator as their dynamics name. In the other configs (which are agent dynamics-aware), you can see Unicycle for vehicle-like agents.
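From memory, the relevant part of a dynamics-aware config looks roughly like this (paraphrased; check the actual config files for the exact keys and limit values):

"dynamic": {"PEDESTRIAN": {"name": "SingleIntegrator", "distribution": true, "limits": {}}, "VEHICLE": {"name": "Unicycle", "distribution": true, "limits": {}}}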

If you're getting terrible results using the position pred_state, then it would be good to trace the model output after the final decoder GRU cell, i.e., look at how that output is integrated through the dynamics equations and check whether wonky values are coming out, since the dynamics should be providing a very reasonable inductive bias to the model. A quick-and-dirty check is sketched below.
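Something along these lines is what I mean; the function and argument names are illustrative, not the repo's API (grab the tensors wherever is convenient in the decoder):

```python
import torch

def inspect_decoder_outputs(raw_controls, integrated_positions):
    """Compare what the decoder GRU emits with what the dynamics produce."""
    # Huge or NaN controls point at the decoder; sane controls but bad
    # positions point at the integration (e.g., wrong dt or initial state).
    print("controls  min/max:", raw_controls.min().item(), raw_controls.max().item())
    print("positions min/max:", integrated_positions.min().item(), integrated_positions.max().item())
    print("NaNs (controls, positions):",
          torch.isnan(raw_controls).any().item(),
          torch.isnan(integrated_positions).any().item())
```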