wangshuo1994 opened 2 years ago
Hello, it's my pleasure to hear from you! You can look in "GRL_simulation/specific_environment.py" and uncomment lines 260-263, specifically:
# if len(rl_ids) != 0:
#     self.k.vehicle.apply_lane_change(rl_ids, rl_actions2)
# else:
#     pass
Then the RL controller will be activated and you can observe its reward. If lines 260-263 remain commented, the reward you observe is generated by the rule-based method, which can serve as the baseline. Hope this helps you solve the problem :grinning:!
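For reference, with the comment markers removed the block reads as follows (a sketch; the indentation is an assumption and should match the enclosing method around lines 260-263):

if len(rl_ids) != 0:
    # Apply the RL agent's lane-change actions to the RL-controlled vehicles
    self.k.vehicle.apply_lane_change(rl_ids, rl_actions2)
else:
    # No RL-controlled vehicles in the network this step; do nothing
    pass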
@Jacklinkk Thank you so much for your kind help! I will have a look~
@wangshuo1994 Hello, do I need to configure the SUMO_HOME environment variable for both PyCharm and Ubuntu at the same time? After configuring everything else, I run T_01 and get the error "please declare environment variable 'SUMO_HOME'". This is really important to me.
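(That error is raised when the Python process cannot see SUMO_HOME. On Ubuntu you can add export SUMO_HOME="/usr/share/sumo" to ~/.bashrc; PyCharm does not always inherit shell variables, so either add SUMO_HOME under Run Configuration > Environment variables or set it at the top of your script. A minimal sketch, assuming SUMO was installed via apt under /usr/share/sumo; adjust the path to your installation:)

import os
import sys

# Assumed install location for the apt package of SUMO; replace with
# your actual SUMO path if it differs.
os.environ.setdefault("SUMO_HOME", "/usr/share/sumo")

# SUMO's Python tools (traci, sumolib) live under $SUMO_HOME/tools,
# so add that directory to the import path.
tools = os.path.join(os.environ["SUMO_HOME"], "tools")
if tools not in sys.path:
    sys.path.append(tools)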
Hello authors, thank you so much for implementing a PyTorch version of GCQ; the code is clear, and the PyTorch stack is much more up to date than the original keras-rl. Could you please help me figure out why, during my training and testing periods, the reward fluctuates at a low level (it did not improve after training began, and no jump in the reward could be observed)? Thank you in advance!