UFRN-URNAI / urnai-tools

A modular Deep Reinforcement Learning library that supports multiple environments, made with Python 3.6.
Apache License 2.0

Solve collectables using Stable-Baselines3 #112

Open RickFqt opened 2 weeks ago

RickFqt commented 2 weeks ago

Once complete, this PR should:

RickFqt commented 5 days ago

To resolve the conflict with v2, should I merge the v2 branch into this one? It is related to the relocation of the custom_env.py file to another folder.

RickFqt commented 5 days ago

Some issues I had with the implementation:

alvarofpp commented 5 days ago

To resolve the conflict with v2, should I merge the v2 branch into this one? It is related to the relocation of the custom_env.py file to another folder.

You can use:

git checkout v2
git pull
git checkout solve-collectables-sb3
git rebase v2

The rebase will stop the moment a conflict occurs, and you must resolve it manually. After that:

git rebase --continue
git push -f

alvarofpp commented 5 days ago
  • To run the collectables_check_env.py and solve_collectables_sb3.py files, I had to add a sys.path.append(...) line at the beginning of each of them. Is there a better way to do this?

How are you executing these files? Can you share the step by step and the commands?

  • I tried to generalize the solve_collectables_sb3.py behavior into a Trainer class that would have methods like train() and test(). However, I didn't know how to generalize the learn() method of a given model. Since Stable Baselines 3 offers different kinds of models (A2C, PPO, DQN, ...), many of them take different parameters in their learn() method.

You can use *args and **kwargs for the parameters. All these SB3 classes probably inherit from some common base class, so you can use that class as the parameter type in your function.
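
A minimal sketch of that idea, assuming the Trainer class described above (the class and method names here are illustrative, not URNAI's actual API):

from stable_baselines3.common.base_class import BaseAlgorithm


class Trainer:
    """Hypothetical generic trainer that works with any SB3 algorithm."""

    def __init__(self, model: BaseAlgorithm):
        # BaseAlgorithm is the common ancestor of A2C, PPO, DQN, ...
        self.model = model

    def train(self, total_timesteps: int, **learn_kwargs):
        # Algorithm-specific arguments (log_interval, progress_bar, ...)
        # are simply forwarded to the model's own learn().
        self.model.learn(total_timesteps=total_timesteps, **learn_kwargs)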

  • About the collectables State implementation, I could only train a "functional" model when using its non_spatial method (the state returned is just a pair of values: the X and Y distance from the player to the closest mineral shard). I still couldn't get it working for the spatial method (the state returned is a representation of the whole map).

Can you describe this problem in more detail? With the information you've provided, I can't figure out why you're not getting what you want.

RickFqt commented 5 days ago

How are you executing these files? Can you share the step by step and the commands?

From the project root folder, I use the command python .\experiments\solves\solve_collectables_sb3.py, and the same for collectables_check_env.py. Without the sys.path.append(...) line, I get an error saying that "urnai" is not a known module (since the imports begin with urnai.).
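
For reference, a sketch of what that workaround likely looks like at the top of each script (the exact path depth is an assumption based on the command above, with the scripts living two levels below the repository root):

import os
import sys

# Prepend the repository root so that imports starting with "urnai."
# resolve when the script is run directly from the repo root.
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '..')))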

You can use *args and **kwargs for the parameters. All these SB3 classes probably inherit from some common base class, so you can use that class as the parameter type in your function.

Thanks! Indeed, they all inherit from a BaseAlgorithm class, but not directly: some inherit from OnPolicyAlgorithm, others from OffPolicyAlgorithm, and these inherit from BaseAlgorithm. If I have any problems implementing it, I'll come back :)
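
The hierarchy described here can be checked directly against SB3 (a quick sketch; the module paths are those of current Stable-Baselines3 releases):

from stable_baselines3 import A2C, DQN, PPO
from stable_baselines3.common.base_class import BaseAlgorithm
from stable_baselines3.common.off_policy_algorithm import OffPolicyAlgorithm
from stable_baselines3.common.on_policy_algorithm import OnPolicyAlgorithm

# A2C and PPO are on-policy, DQN is off-policy; all meet at BaseAlgorithm.
assert issubclass(A2C, OnPolicyAlgorithm) and issubclass(PPO, OnPolicyAlgorithm)
assert issubclass(DQN, OffPolicyAlgorithm)
assert issubclass(OnPolicyAlgorithm, BaseAlgorithm)
assert issubclass(OffPolicyAlgorithm, BaseAlgorithm)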

Can you describe this problem in more detail? With the information you've provided, I can't figure out why you're not getting what you want.

Basically, to train a model, I need to initialize a CustomEnv environment, choose an algorithm from Stable Baselines 3, set its environment to the CustomEnv created, and call its learn() method with the desired number of timesteps to train. To initialize a CustomEnv, you need to choose the implementations of the Environment, State, ActionSpace and Reward classes that will be used. In this PR, the implementation of the State class accepts a method parameter, which can be:

  • STATE_MAP: the state returned is a representation of the whole map (spatial);
  • STATE_NON_SPATIAL: the state returned is just a pair of values, the X and Y distance from the player to the closest mineral shard.
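
To make the flow concrete, a hedged sketch of the steps just described (CustomEnv's constructor arguments and the concrete Environment/State/ActionSpace/Reward instances are placeholders, not the actual signatures in this PR; only the SB3 calls are standard usage):

from stable_baselines3 import PPO

# Placeholder wiring: the concrete URNAI implementations chosen for this PR.
env = CustomEnv(environment, state, action_space, reward)

model = PPO("MlpPolicy", env)          # any SB3 algorithm could be swapped in
model.learn(total_timesteps=300_000)   # train for the desired number of timesteps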

So, I tried to train two models, one with the STATE_MAP method and the other with STATE_NON_SPATIAL, both for 300,000 timesteps. When testing the NON_SPATIAL one, it worked pretty well, with the marines moving around and collecting a lot of mineral shards on almost every test. However, when testing the MAP one, the agent always moved in just one direction, getting stuck on one of the edges of the map.

What I think is happening is that, since the state of the NON_SPATIAL method is very small and simple, the agent could easily learn how it relates to the reward function. As for the MAP method, since the state returned is very large (a vector with 4096 values), the agent couldn't find a pattern relating it to the reward function before the training stopped.

I don't think the problem is really a bug. Maybe tweaking the size of the returned state or training the model for more timesteps would solve it.