fvalka / atc-reinforcement-learning

Reinforcement learning for an air traffic control task. OpenAI gym based simulation.

How to eliminate "invalid actions" #19

Open epaulz-vt opened 3 years ago

epaulz-vt commented 3 years ago

Hello,

Not sure if this repo is still active, but I am interested in using your environment for a research project. I have built my own simple DeepQ network to train on the ATC environment. I have it working, except that I often get the messages "Warning invalid action: 400 for index: 0" and "Warning invalid action: 57000 for index: 1", and I can't figure out how to resolve them.

It seems as though my agent is staying near its starting point, and will only move up/down/left, never to the right. It does not seem to be learning past a certain point, and I wonder whether this is caused by these "invalid actions".

Any assistance would be much appreciated.

Eric

fvalka commented 3 years ago

Hello Eric,

The repo is not very active, but I still am.

Sounds like you're trying to perform actions which are outside of the action space.

If you are using the continuous action space with normalization (the default), every action component should be normalized to between -1 and 1.

See the action space definition here: https://github.com/fvalka/atc-reinforcement-learning/blob/c603a40f9485a28d88a802de22afdb4cf638add5/envs/atc/atc_gym.py#L81-L82
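
For illustration, here is a minimal sketch (my own example, not the repo's exact space definition) of what a normalized Box action space accepts. Raw values like 400 or 57000 fall outside the bounds and trigger the warning:

```python
import numpy as np
import gym

# Sketch only: an assumed 3-component (v, h, phi) Box, normalized to [-1, 1].
action_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)

action = np.array([400.0, 57000.0, 0.3], dtype=np.float32)  # raw, unnormalized values
print(action_space.contains(action))   # False -> this is what triggers the warning

action = np.clip(action, -1.0, 1.0)    # squash into the valid range
print(action_space.contains(action))   # True
```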

Hope that helped.

All the best, Fabian

epaulz-vt commented 3 years ago

Thank you for your response. I have managed to move past the invalid action issue. However, I am having a hard time understanding how to properly interact with the action space of this environment from my custom DeepQ network... let me explain.

When training on an environment like CartPole or LunarLander, the "action space" is a set of scalar values (say 0-4), one of which is selected and then interpreted, and perhaps translated, by the environment in some way. When I use that approach here, it seems that each "action" is a tuple of 3 separate actions (v, h, phi). When I try to choose a scalar action, I get an error because the environment expects to be able to index my action. However, my attempts to modify my model to select and store actions as tuples do not seem to be working.
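
To illustrate, one workaround I have been experimenting with is to discretize each component into bins and let my scalar-output network pick one flat index, which I then unravel into the 3-component action (a rough sketch; the bin counts are arbitrary):

```python
import numpy as np

# Sketch only: enumerate bin combinations so a scalar-action DQN can be used.
bins = (5, 5, 5)                       # assumed bins for (v, h, phi)
n_actions = int(np.prod(bins))         # network output size: 125

def flat_to_action(flat_index):
    """Map one scalar network output to a normalized (v, h, phi) action."""
    idx = np.unravel_index(flat_index, bins)
    # spread each bin linearly over the normalized range [-1, 1]
    return np.array([np.linspace(-1.0, 1.0, b)[i] for i, b in zip(idx, bins)],
                    dtype=np.float32)

# e.g. the argmax of the Q-network's 125 outputs:
action = flat_to_action(42)
# obs, reward, done, info = env.step(action)
```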

Do you perhaps have any examples of training a model other than those from 'baselines' so that I could get a better idea of how to interact with this environment? I am very interested in getting this working.

epaulz-vt commented 3 years ago

I suppose a simpler way to explain my dilemma is that I don't quite understand how to interact with the continuous action space (I am still fairly new to machine learning). I see that there seems to be a way to switch the environment to a discrete action space. However, no matter which mode it's in, when I try to determine the number of outputs with "num_outputs = env.action_space.n", it tells me that 'Box' and 'MultiDiscrete' objects don't have an 'n' attribute.
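
From reading the gym docs, it looks like only Discrete spaces have an 'n' attribute; Box and MultiDiscrete expose their sizes differently (illustrative spaces below, not the repo's exact definitions):

```python
import numpy as np
import gym

# Only Discrete has .n:
print(gym.spaces.Discrete(4).n)                  # 4

# Box exposes its shape instead:
box = gym.spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)
print(box.shape[0])                              # 3 continuous components

# MultiDiscrete exposes the bin counts per dimension:
multi = gym.spaces.MultiDiscrete([10, 10, 10])
print(multi.nvec)                                # [10 10 10]
print(int(np.prod(multi.nvec)))                  # 1000 total combinations
```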