OpenNetLab / gym

This gym leverages NS3 and WebRTC, which can be used by reinforcement learning or other methods to build a Bandwidth Controller for WebRTC

Bug report #21

Open matthewyuhb opened 2 years ago

matthewyuhb commented 2 years ago

I'm using this project to train my reinforcement learning agent. The agent got trapped in a local optimum during training, and I observed the following phenomenon. I use a trace with a fixed capacity of 600 kbps and a duration of 180 s:

image

I first manually fixed the bandwidth output of the RL agent at 1000 kbps, and the result made sense (the base RTT is about 200 ms):

image image
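For reference, here is a minimal sketch of what a fixed 1000 kbps sender on a 600 kbps link would be expected to produce under a simple fluid-queue model. The function names and the `max_queue_ms` cap are hypothetical and are not part of the gym; the real NS3 queue may behave differently (for example, its buffer size is not stated here).

```python
def queuing_delay_ms(send_kbps, capacity_kbps, elapsed_s, max_queue_ms=None):
    """Queuing delay (ms) after elapsed_s seconds in a fluid-queue model.

    Assumes an initially empty queue and a constant sending rate.
    max_queue_ms is a hypothetical cap standing in for a finite buffer.
    """
    excess_kbps = max(send_kbps - capacity_kbps, 0.0)
    # Backlog (kilobits) divided by the drain rate (kbps) gives seconds of delay.
    delay_ms = excess_kbps * elapsed_s / capacity_kbps * 1000.0
    if max_queue_ms is not None:
        delay_ms = min(delay_ms, max_queue_ms)
    return delay_ms


def steady_state_loss_fraction(send_kbps, capacity_kbps):
    """Fraction of traffic that must be dropped once the buffer is full."""
    if send_kbps <= capacity_kbps:
        return 0.0
    return (send_kbps - capacity_kbps) / send_kbps


if __name__ == "__main__":
    # Values from the report: 1000 kbps sender, 600 kbps trace capacity.
    print(queuing_delay_ms(1000, 600, elapsed_s=3))
    print(steady_state_loss_fraction(1000, 600))
```

Under these assumptions, sending above capacity should show up as either growing RTT or non-zero loss, which is the behaviour the fixed-rate run is being compared against.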

However, my trained RL agent got trapped in this state:

image image

The RTT stays at the minimum RTT even at a very high sending rate! What's more, the receiving rate observed on the sender side is constantly about 500 kbps and the loss rate is 0%. The fairly high receiving rate and the very low delay make the RL agent think it has learned a good model, so it stops optimizing...

image
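The suspicious combination described above can be expressed as a small sanity check on the reported statistics. This is only a sketch: the function name, argument names, and slack thresholds are hypothetical and do not correspond to any API in the gym, and the 2000 kbps sending rate in the usage example is an illustrative stand-in for "very high", not a measured value.

```python
def find_inconsistencies(send_kbps, recv_kbps, rtt_ms, min_rtt_ms,
                         loss_rate, capacity_kbps,
                         rtt_slack_ms=20.0, loss_slack=0.01,
                         rate_slack=0.05):
    """Return a list of reasons why a set of observations looks implausible.

    All thresholds are hypothetical tuning knobs, not values from the gym.
    """
    reasons = []
    overshooting = send_kbps > capacity_kbps * (1.0 + rate_slack)
    no_queue = (rtt_ms - min_rtt_ms) <= rtt_slack_ms
    no_loss = loss_rate <= loss_slack

    # Sending above capacity must build a queue or drop packets.
    if overshooting and no_queue and no_loss:
        reasons.append("sending above capacity without added delay or loss")

    # With (almost) no loss, the receiver should see roughly what was sent.
    expected_recv = send_kbps * (1.0 - loss_rate)
    if no_loss and recv_kbps < expected_recv * (1.0 - rate_slack):
        reasons.append("receiving rate far below sending rate at ~0% loss")

    return reasons


if __name__ == "__main__":
    # Reported: 600 kbps capacity, ~500 kbps received, 0% loss, RTT at minimum.
    for reason in find_inconsistencies(send_kbps=2000, recv_kbps=500,
                                       rtt_ms=200, min_rtt_ms=200,
                                       loss_rate=0.0, capacity_kbps=600):
        print(reason)
```

A check like this, run on each report interval, could help separate a genuine local optimum from statistics that the simulated link should not be able to produce.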

Is this a bug in the gym?