Sohojoe / MarathonEnvsBaselines

Experimental - using OpenAI baselines with MarathonEnvs (ML-Agents)
Apache License 2.0

Did you find out why you did not get good results with stable-baseline algorithms? #3

Closed: icaro56 closed this issue 5 years ago

icaro56 commented 5 years ago

I'm trying to train agents with DQN using only vector (non-visual) observations: multiple agents with one Brain.

Do you advise using baselines or stable-baselines?

Which files should I look at first to start my training with DQN? I apologize for the newbie questions.

icaro56 commented 5 years ago

I would like to use Q-learning with an MLP, as described in this article: https://core.ac.uk/download/pdf/159516849.pdf

Sohojoe commented 5 years ago

Hi @icaro56,

It does not look like DQN supports multi-environments based on this link, nor does stable-baselines. If you really need DQN, it might be worth asking for help on baselines and/or stable-baselines, and I can help you with the Unity part. If you don't need DQN (i.e. you just want to try a few different algorithms), then I would advise starting with normal ml-agents + PPO, as that will probably be as good and as fast as anything else.
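For reference, a minimal sketch of that difference in stable-baselines (v2); the "CartPole-v1" env id, timestep counts, and environment count here are just placeholders, not anything from this repo:

```python
import gym
from stable_baselines import PPO2, DQN
from stable_baselines.common.vec_env import DummyVecEnv

# PPO2 can train on a vectorized batch of parallel environments...
vec_env = DummyVecEnv([lambda: gym.make("CartPole-v1") for _ in range(4)])
ppo = PPO2("MlpPolicy", vec_env, verbose=1)
ppo.learn(total_timesteps=10000)

# ...whereas DQN in stable-baselines expects a single, non-vectorized environment.
dqn = DQN("MlpPolicy", gym.make("CartPole-v1"), verbose=1)
dqn.learn(total_timesteps=10000)
```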

Sohojoe commented 5 years ago

Oh, and I did not get stable-baselines working:

icaro56 commented 5 years ago

Thank you.

What do you mean by continuous observations?

My observations are: position (X, Y) + a grid-representation matrix converted to ints.

Another question: I read in the stable-baselines documentation that

Vectorized Environments are a way to multiprocess training. Instead of training a RL agent on 1 environment, it allows to train it on n environments using n processes. Because of that, actions passed to the environment are now a vector (of dimension n). It is the same for observations, rewards and end of episode signals (dones).

In my case, I have 25 stages, 1 brain, and 4 agents per stage = 4 * 25 = 100 agents. All of them are in the same environment. Please see this YouTube link; I uploaded a capture of my environment.

https://www.youtube.com/watch?v=zE3hETbSSQI&list=PLcMmj3f6BhXs6ZDdt2fIWNqQCUg0CNtvo&index=14&t=0s

PS: In this video I create 2 to 4 agents per episode, at random.

Sohojoe commented 5 years ago

As long as all the agents are using the same brain, you will be OK. Vectorized Environments just mean it expects a list of observations, actions, etc. This is how ml-agents works by default, and my code handles the linking between ml-agents and a Vectorized Environment in gym.

100 agents will be fine too.
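To make the "list of observations" point concrete, here is a rough sketch of the batch shapes a Vectorized Environment exchanges each step; the agent count matches your setup, but the observation size of 8 is just a placeholder:

```python
import numpy as np

n_agents = 100   # e.g. 25 stages * 4 agents, all driven by the same brain
obs_size = 8     # placeholder length of each agent's observation vector

# Each step, the vectorized interface passes batches with one row per agent:
actions = np.zeros((n_agents,), dtype=np.int64)                  # one discrete action per agent
observations = np.zeros((n_agents, obs_size), dtype=np.float32)  # one observation vector per agent
rewards = np.zeros((n_agents,), dtype=np.float32)                # one reward per agent
dones = np.zeros((n_agents,), dtype=bool)                        # per-agent end-of-episode flags
```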

For your observations: all of your observations should be bools (a discrete value, i.e. a float that is 0 or 1) or floats between -1 and 1 (or 0 and 1). You can use larger values if the algorithm supports normalization; however, all the normalization is doing is taking a running average and squashing it to -1 to 1. The default normalization algorithm in openai.baselines does not save properly, so it is better if you normalize them yourself (i.e. divide by the max value). Don't use ints to reference states; use a bool for each state instead.
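As a rough illustration of that advice (the state count, current state, and world bounds below are made up), manually normalizing and one-hot encoding before handing the vector to the trainer could look like:

```python
import numpy as np

num_states = 5                 # assumed number of discrete states an agent can be in
state_id = 3                   # the agent's current state as an int
pos_x, pos_y = 42.0, 17.0      # raw position
max_x, max_y = 100.0, 100.0    # assumed world bounds used for manual normalization

# One-hot encode the discrete state instead of feeding the raw int.
state_one_hot = np.eye(num_states, dtype=np.float32)[state_id]

# Divide continuous values by their max so everything lands in [0, 1].
obs = np.concatenate([state_one_hot,
                      np.array([pos_x / max_x, pos_y / max_y], dtype=np.float32)])
```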

icaro56 commented 5 years ago

Thank you.

My observations are binary flags or hybrid, as I described in this link: https://github.com/Unity-Technologies/ml-agents/issues/802

So I will have to change these observations. But I think the hybrid observations will work, because they are floats of 0 or 1.

icaro56 commented 5 years ago

Can you share with me a paper that explains why to use bools?

Sohojoe commented 5 years ago

Using a float of 0 or 1 is fine, as it all goes into a vector. I can't think of a paper specific to RL; Unity states this as a best practice in their docs - https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Learning-Environment-Best-Practices.md#vector-observations

The way I think about it is in terms of basic supervised ML, where one is either predicting discrete categories (think images of cats and dogs, where each category is binary / one-hot encoded) or predicting a continuous value (think house sale price).