MCZhi / Driving-IRL-NGSIM

[T-ITS] Driving Behavior Modeling using Naturalistic Human Driving Data with Inverse Reinforcement Learning
MIT License

I have some questions that I would like you to answer #7

Closed 123yu456 closed 2 months ago

123yu456 commented 2 months ago
  1. When sampling trajectories, why is only a quartic polynomial used in the longitudinal direction, and why is there no need to determine the vehicle's longitudinal position at the final moment?
  2. During training and testing, is the result of trajectory sampling a prediction or a plan of the trajectory over the next 5 seconds? It feels like planning to me, but the paper consistently refers to it as a predicted trajectory.
  3. In the paper, the reward function is used to explain the relationship between the demonstration trajectory and the output trajectory. Is there an intuitive way to directly compare the vehicle's original trajectory with the final trajectory output by the sampler? When I run the code, I don't see the generated trajectory in the output.
  4. How can I reproduce the results in Figure 6 and Figure 7?

Your paper focuses on modeling the driver, while I am more interested in the generated trajectory and its comparison with the original trajectory, which is why I raise the questions above. Looking forward to your reply, thank you very much!
123yu456 commented 2 months ago

Another question: you mentioned in your paper, "For a vehicle in the dataset, its original trajectory throughout the highway section, which is approximately 50 to 70 seconds in time length, is evenly partitioned into 50 short-term trajectories, each with 5 s length of time. Each trajectory represents a driving scene involving different situations and different kinds of interactions with the surrounding vehicles. 35 trajectories among them are randomly selected and serve as the training data for reward function learning. The rest 15 trajectories serve as the testing conditions, where the learned reward function is used to select the candidate trajectories." Does this mean that the 35 trajectories used for training were not selected in the chronological order of the vehicle's movement? In other words, might a trajectory used for training actually occur later in time than trajectory data used for testing?

MCZhi commented 2 months ago

Here are my answers to your questions:

  1. A quartic polynomial allows for the representation of acceleration and deceleration patterns without excessively complicating the model. It is sufficient for most highway driving scenarios where the key objectives are to maintain a safe distance from other vehicles, adhere to speed limits, and execute smooth accelerations and decelerations.
  2. Prediction refers to forecasting the future positions of surrounding vehicles, while planning involves generating a trajectory for the ego vehicle itself to follow. In the paper, the essence of this sampling process is indeed planning, as it involves deciding on a trajectory for the ego vehicle. However, when applying this process to other vehicles, it is considered as prediction.
  3. I use the human-likeness metric in the paper to directly compare the original trajectory with the final output trajectory of the sampler. You can also visualize both trajectories on the same plot, which allows a straightforward visual comparison.
  4. To get the results in Figure 6, you need to store the sampled trajectories and the original trajectory during testing and load them in MATLAB or other plotting tools to visualize the results.
  5. The tracks are randomly selected for training and testing, without regard to the chronological order of vehicle movement. Randomly selecting 35 of the 50 short-term trajectories for training and using the remaining 15 for testing ensures a diverse representation of scenarios.
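Regarding point 1, the reason no terminal longitudinal position is needed is that a quartic (degree-4) polynomial has five coefficients, which are fully determined by the initial position, velocity, and acceleration plus the terminal velocity and acceleration. A minimal sketch of such a fit (function and variable names are illustrative, not taken from this repository):

```python
import numpy as np

def quartic_longitudinal(s0, v0, acc0, vT, accT, T):
    """Fit s(t) = c0 + c1*t + c2*t^2 + c3*t^3 + c4*t^4 subject to
    s(0) = s0, s'(0) = v0, s''(0) = acc0, s'(T) = vT, s''(T) = accT.
    Five conditions fix five coefficients, so no terminal position
    constraint is required."""
    c0, c1, c2 = s0, v0, acc0 / 2.0
    # Remaining two conditions come from the terminal velocity and
    # acceleration: solve a 2x2 linear system for c3, c4.
    A = np.array([[3 * T**2, 4 * T**3],
                  [6 * T,   12 * T**2]])
    b = np.array([vT - (v0 + acc0 * T),
                  accT - acc0])
    c3, c4 = np.linalg.solve(A, b)
    return np.array([c0, c1, c2, c3, c4])
```

A quintic polynomial would additionally constrain the terminal position; dropping that constraint is what makes the quartic natural for speed-tracking maneuvers like keeping distance and adhering to a speed limit.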
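As a rough numeric counterpart to the visual comparison suggested in point 3, one could compute an average displacement error between the original and generated trajectories. This is a generic similarity measure, not the human-likeness metric from the paper, and the function name is illustrative:

```python
import numpy as np

def average_displacement_error(traj_a, traj_b):
    """Mean Euclidean distance between two (N, 2) trajectories sampled
    at the same timestamps. A simple, generic comparison measure."""
    traj_a, traj_b = np.asarray(traj_a, float), np.asarray(traj_b, float)
    return float(np.mean(np.linalg.norm(traj_a - traj_b, axis=1)))
```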
123yu456 commented 2 months ago

Thank you very much for your reply, which is very helpful to me!