Open sallyqiansun opened 11 months ago
Hi! Thank you for releasing your code. I'm new to reinforcement learning (RL) for traffic signal control (TSC), and after reading the code I have a question I'm hoping you can clarify.
It looks like the experiments use a sequence length of 1, and that the per-step reward is used in place of return_to_go in the decision transformer architecture. Is that correct? Looking forward to your reply, thank you!
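To make the distinction I'm asking about concrete, here is a minimal sketch of how I understand the two quantities. It is illustrative Python only, not code from this repository: `returns_to_go` is a hypothetical helper, the reward values are made up, and `gamma = 1.0` (no discounting) is an assumption.

```python
from typing import List


def returns_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Return-to-go at step t: the (optionally discounted) sum of rewards from t onward.

    Hypothetical helper for illustration; not taken from the repository.
    """
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Accumulate from the last step backwards so each entry sums its own future.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


# Made-up per-step rewards for a single short episode.
rewards = [1.0, 0.0, 2.0, 3.0]
rtg = returns_to_go(rewards)

for t, (r, g) in enumerate(zip(rewards, rtg)):
    print(f"t={t}  reward={r}  return_to_go={g}")
```

In this sketch the two values coincide only at the final step, so conditioning on the per-step reward would give the model a different signal than conditioning on return_to_go at every earlier step.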