Smart-Trafficlab / TransformerLight

TransformerLight: A Novel Sequence Modeling Based Traffic Signaling Mechanism via Gated Transformer (29th ACM SIGKDD Conference, KDD 2023)

sequence length and return to go #3

Open sallyqiansun opened 11 months ago

sallyqiansun commented 11 months ago

Hi! Thank you for releasing your code. After reading it, it seems that you used a sequence length of 1 for the experiments, and that the reward is used in place of the return-to-go in the Decision Transformer architecture. Is that correct? Looking forward to your reply, thank you!
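For reference, the return-to-go at step t is the (discounted) sum of all rewards from t onward, not just the immediate reward, and it is the quantity a Decision Transformer is usually conditioned on. A minimal sketch of the distinction, with illustrative names that are not taken from this repository:

```python
import numpy as np

def returns_to_go(rewards, gamma=1.0):
    """Compute R_t = sum over t' >= t of gamma**(t' - t) * r_t'.

    This cumulative future return is what a Decision Transformer is
    usually conditioned on; the raw per-step reward is only its first term.
    """
    rtg = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

rewards = [1.0, 0.0, 2.0, 1.0]
print(returns_to_go(rewards))  # [4. 3. 3. 1.]
print(rewards)                 # the one-step signal used in its place
```

If the per-step reward really is substituted for the return-to-go, the conditioning token carries no information about future outcomes, which is presumably why the question matters here.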

AlexBrians commented 10 months ago

As a newcomer to RL (Reinforcement Learning) for TSC (Traffic Signal Control), I've been working through this codebase and the related papers, and two things puzzle me:

  1. Reviewing the code for DT and similar models, it appears that these models do not actually use sequences for decision-making. Despite their names, they behave more like conventional single-step models than like the sequence models described in typical offline-RL frameworks (the sketch after this list shows the interleaved input a standard Decision Transformer expects).
  2. The papers discuss offline baselines such as BC (Behavioral Cloning) and CQL (Conservative Q-Learning), but the details of how these baselines were implemented are vague. The authors focus on the models proposed in the articles while seemingly omitting the models used as offline baselines, which is curious and somewhat confusing.
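To make point 1 concrete, here is a minimal PyTorch sketch of the interleaved (return-to-go, state, action) tokens a standard Decision Transformer consumes. With a context length of K = 1 the sequence dimension is degenerate and the model reduces to a one-step conditional policy. All class, dimension, and parameter names below are illustrative assumptions, not this repository's actual code, and the causal attention mask is omitted for brevity:

```python
import torch
import torch.nn as nn

class TinyDecisionTransformer(nn.Module):
    """Illustrative skeleton of a Decision Transformer's input layout;
    not the implementation used in this repository."""

    def __init__(self, state_dim, act_dim, d_model=64, n_heads=4):
        super().__init__()
        self.embed_rtg = nn.Linear(1, d_model)       # return-to-go token
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_act = nn.Linear(act_dim, d_model)
        self.embed_time = nn.Embedding(4096, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.predict_act = nn.Linear(d_model, act_dim)

    def forward(self, rtg, states, actions, timesteps):
        # rtg: (B, K, 1), states: (B, K, state_dim),
        # actions: (B, K, act_dim), timesteps: (B, K) long
        B, K, _ = states.shape
        t = self.embed_time(timesteps)
        # interleave tokens as R_1, s_1, a_1, R_2, s_2, a_2, ...
        tokens = torch.stack(
            [self.embed_rtg(rtg) + t,
             self.embed_state(states) + t,
             self.embed_act(actions) + t],
            dim=2,
        ).reshape(B, 3 * K, -1)
        h = self.encoder(tokens)             # causal mask omitted for brevity
        return self.predict_act(h[:, 1::3])  # predict a_t from the s_t token

# With K = 1 there is only a single (R, s, a) triple per input, so nothing
# sequential is modeled -- which matches the observation in point 1 above.
model = TinyDecisionTransformer(state_dim=16, act_dim=8)
out = model(torch.zeros(2, 1, 1), torch.zeros(2, 1, 16),
            torch.zeros(2, 1, 8), torch.zeros(2, 1, dtype=torch.long))
print(out.shape)  # torch.Size([2, 1, 8])
```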