StanfordASL / Trajectron-plus-plus

Code accompanying the ECCV 2020 paper "Trajectron++: Dynamically-Feasible Trajectory Forecasting With Heterogeneous Data" by Tim Salzmann*, Boris Ivanovic*, Punarjay Chakravarty, and Marco Pavone (* denotes equal contribution).
MIT License

Dependency of input states #26

Closed zhangpur2 closed 4 years ago

zhangpur2 commented 4 years ago

Hi! Thanks for your great work! We have tried to train your model without using the acceleration and angular-speed sequences as input (based on the int_ee_me model, with other settings fixed), since these quantities are hard to estimate accurately in our practical setting, but the error is very high (ml/ade_mean: 0.95, ml/fde_mean: 2.13), which seems strange. We do not know how to correctly train a model without acceleration and angular-speed inputs. Could you please give any guidance on how necessary/important the different input states are? Moreover, we feel Trajectron++ may be better suited to well-estimated data than to noisy sensory data. Please correct us if our thinking is mistaken.

BorisIvanovic commented 4 years ago

Hi @zhangpur2 , thanks for your questions!

I think the answer is as follows:

  1. Firstly, any deep learning-based model is more or less going to be better applied to well-estimated data rather than sensory data with noise (although it mostly depends on the actual level of noise that you're dealing with).
  2. Something that we've done before in this case is a bit of data preprocessing in order to effectively remove the majority of sensor-based noise. For example, have you tried Kalman Filtering your sensor data in order to get a cleaner estimate of its true value?
  3. The states in our work are mostly generated from positional data (you can even see this in our data preprocessing scripts). What that means is the datasets we work with mostly only provide x and y positions. We then (depending on the dataset) preprocess the positions before differentiating them to produce velocities and then again to produce accelerations. For instance, in the nuScenes dataset we first apply a Kalman Filter to the observed vehicle states before differentiating them.
  4. As for the importance of different input states, it's hard to quantify that since it will essentially change per dataset and per model version. Different datasets/model versions may find more utility out of certain states. Generally though, things like position/velocity/heading are important as these values are directly used in our output dynamics (at least, the current values initialize our predictions, then the predicted values coming from our model will be functions of the rest of the input state components).
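To make point 3 concrete, here is a minimal sketch of that position-first state construction: raw x/y positions are differentiated to get velocities, and those again to get accelerations. The helper name and exact state layout here are illustrative only, not the repo's actual preprocessing code.

```python
import numpy as np

def states_from_positions(x, y, dt):
    """Build (position, velocity, acceleration, heading) state components
    from raw x/y position sequences by repeated numerical differentiation.
    Hypothetical helper sketching the position-first preprocessing idea."""
    vx, vy = np.gradient(x, dt), np.gradient(y, dt)
    ax, ay = np.gradient(vx, dt), np.gradient(vy, dt)
    heading = np.arctan2(vy, vx)  # heading inferred from velocity direction
    return np.stack([x, y, vx, vy, ax, ay, heading], axis=-1)

# Sanity check: constant-velocity motion should give constant velocities
# and (near-)zero accelerations.
dt = 0.5
t = np.arange(10) * dt
states = states_from_positions(2.0 * t, -1.0 * t, dt)
```

In practice you would smooth or Kalman-filter the positions first (as in point 2) so that the differentiation does not amplify sensor noise.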
zhangpur2 commented 4 years ago

Thank you very much for your quick help and detailed instructions. We have read the preprocessing code in process_data.py and understood that velocity and acceleration values are generated through the positional data.

The only concern is that the function used for differentiation calls np.gradient, and np.gradient approximates the derivative differently at boundary points and interior points. Taking velocity as an example, np.gradient gives v_x[t] = (x[t+1] - x[t-1]) / (2*dt) at interior times t, and v_x[t] = (x[t] - x[t-1]) / dt at the point where the trajectory ends. In your code, it seems the whole trajectory is differentiated first and then divided into fragments, which means the velocity input at the last observed time step (T = t_o) uses positional data at T = t_o + 1. Similarly, acceleration inputs can use positional data up to T = t_o + 2. From the standpoint of data availability for trajectory prediction, it would be better not to use data at T > t_o when preparing the input sequences. Instead, it may be more appropriate to use an online differentiation for preparing velocity and acceleration inputs, or to use only the filtered positional data without the offline differentiated values.
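The boundary behavior described above is easy to verify. This short check (illustrative values, not from the repo) shows that differentiating the whole trajectory and then slicing gives a different last velocity than differentiating only the observed fragment, because the sliced version used a future position:

```python
import numpy as np

dt = 0.5
x = np.array([0.0, 1.0, 4.0, 9.0, 16.0])  # toy positions; t_o is index 3

v = np.gradient(x, dt)
# Interior points use central differences: v[t] = (x[t+1] - x[t-1]) / (2*dt)
assert np.isclose(v[2], (x[3] - x[1]) / (2 * dt))
# The final point falls back to a one-sided difference: (x[-1] - x[-2]) / dt
assert np.isclose(v[-1], (x[-1] - x[-2]) / dt)

# Differentiate-then-slice vs. slice-then-differentiate:
v_full = np.gradient(x, dt)[:4]   # whole trajectory first, then cut at t_o
v_frag = np.gradient(x[:4], dt)   # only the observed fragment
print(v_full[-1], v_frag[-1])     # 12.0 vs 10.0 -- v_full[-1] saw x[t_o + 1]
```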

We have to say we are not questioning your work, the above thoughts are just some of the details of the data processing and do not conflict with the model novelty. We highly appreciate your sharing of code and view exchanges.

BorisIvanovic commented 4 years ago

That's a good observation, I must have read the documentation wrong the first time, thank you! Feel free to change that to np.ediff1d or similar.

I'm not too worried about questioning/model novelty/anything like that. The model has already been independently verified during the nuScenes prediction challenge as well as with real streaming human data in this work and it still performed well. Plus, other researchers outside of our lab have used the model on their own data/datasets other than these (using their own preprocessing) and it's been good too :)

zhangpur2 commented 4 years ago

Thanks for your frank answer! May I ask how you prepared the input states when evaluating on the nuScenes prediction challenge? This would be very helpful guidance for preparing the model inputs in an online mode; I am not very experienced with such online processing. A relevant script would be great, but if that is not available, an outline of the main steps or some advice would also help.

BorisIvanovic commented 4 years ago

Sure, take a look at the nuscenes-devkit prediction helper! It has a lot of nice functions for getting nuScenes data as numpy arrays, including getting past agent histories and futures.

As for the actual metric evaluation: I don't know what code they used in their evaluation server, but they talk about the specific evaluation metrics on the challenge page.

zhangpur2 commented 4 years ago

Thanks!:)