RickFqt opened 2 weeks ago
To resolve the conflict with v2, should I merge the v2 branch into this one? It is related to the relocation of the `custom_env.py` file to another folder.

Some issues I had with the implementation:

- To run the `collectables_check_env.py` and `solve_collectables_sb3.py` files, I had to add the `sys.path.append(...)` command line at the beginning of each of these files. Is there a better way to do this?
- I tried to generalize the `solve_collectables_sb3.py` behavior into a `Trainer` class that would have `train()`, `test()` and similar methods. However, I didn't know how to generalize the `learn()` method of a given model, since Stable Baselines 3 offers different kinds of models (A2C, PPO, DQN, ...) and many of them take different parameters when calling their `learn()` method.
- About the collectables `State` implementation, I could only train a "functional" model when using its non_spatial method (the state returned is just a pair of values, the X and Y distances to the closest mineral shard to the player). Still couldn't get it working for the spatial method (the state returned is a representation of the whole map).

> To resolve the conflict with v2, should I merge the v2 branch into this one? It is related to the relocation of the `custom_env.py` file to another folder.
You can use:

```sh
git checkout v2
git pull
git checkout solve-collectables-sb3
git rebase v2
```

Rebasing will be interactive, so it will stop the moment a conflict occurs and you must resolve it manually. After that:

```sh
git rebase --continue
git push -f
```
> - To run the `collectables_check_env.py` and `solve_collectables_sb3.py` files, I had to add the `sys.path.append(...)` command line at the beginning of each of these files. Is there a better way to do this?
How are you executing these files? Can you share the step by step and the commands?
> - I tried to generalize the `solve_collectables_sb3.py` behavior into a `Trainer` class that would have `train()`, `test()` and similar methods. However, I didn't know how to generalize the `learn()` method of a given model, since Stable Baselines 3 offers different kinds of models (A2C, PPO, DQN, ...) and many of them take different parameters when calling their `learn()` method.
You can use `*args` and `**kwargs` for parameters. Probably all these SB3 classes inherit from some other class, so you can use the class they all inherit from as the type of that parameter in your function.
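For illustration, a minimal sketch of that idea might look like the following. The `Trainer` class here is hypothetical (it is not the code in this PR); only the `BaseAlgorithm` import and the `learn()` call are Stable Baselines 3's actual API.

```python
from stable_baselines3.common.base_class import BaseAlgorithm


class Trainer:
    """Hypothetical wrapper that works with any SB3 model (A2C, PPO, DQN, ...)."""

    def __init__(self, model: BaseAlgorithm):
        # Every SB3 algorithm ultimately derives from BaseAlgorithm, so any of them fits here.
        self.model = model

    def train(self, total_timesteps: int, **learn_kwargs):
        # Algorithm-specific options are simply forwarded to the model's learn().
        return self.model.learn(total_timesteps=total_timesteps, **learn_kwargs)
```

Usage would then be something like `Trainer(PPO("MlpPolicy", env)).train(300_000)`, with the extra keyword arguments absorbing whatever a particular algorithm's `learn()` accepts.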
> - About the collectables `State` implementation, I could only train a "functional" model when using its non_spatial method (the state returned is just a pair of values, the X and Y distances to the closest mineral shard to the player). Still couldn't get it working for the spatial method (the state returned is a representation of the whole map).
Can you describe more about this problem? With the information you've provided, I can't figure out why you're not getting what you want.
> How are you executing these files? Can you share the step by step and the commands?
From the project root folder, I use the command `python .\experiments\solves\solve_collectables_sb3.py`, and the same for `collectables_check_env.py`. Without the `sys.path.append(...)` command, an error says that "urnai" is not a known module (since the imports begin with `urnai.`).
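For reference, the workaround being described would look roughly like the header below. The exact number of `".."` components is an assumption based on the `experiments\solves\` path in the command above.

```python
# Hypothetical header at the top of experiments/solves/solve_collectables_sb3.py so that
# imports starting with "urnai." resolve when the script is run directly.
import os
import sys

# Assumes the repository root is two directories above this file.
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")))
```

Common alternatives that avoid editing `sys.path` in each script are installing the package in editable mode (`pip install -e .` from the project root, if the project has a setup file) or running the script as a module from the root (`python -m experiments.solves.solve_collectables_sb3`), assuming the folders involved are importable as packages.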
> You can use `*args` and `**kwargs` for parameters. Probably all these SB3 classes inherit from some other class, so you can use the class they all inherit from as the type of that parameter in your function.
Thanks! Indeed, they all inherit from a `BaseAlgorithm` class, but not directly. Some inherit from `OnPolicyAlgorithm`, others from `OffPolicyAlgorithm`, and these inherit from `BaseAlgorithm`. If I have any problems implementing, I'll come back :)
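For anyone following along, that hierarchy can be verified directly; this is a quick sketch using Stable Baselines 3's public modules (class locations are as of recent SB3 releases):

```python
from stable_baselines3 import A2C, DQN, PPO
from stable_baselines3.common.base_class import BaseAlgorithm
from stable_baselines3.common.off_policy_algorithm import OffPolicyAlgorithm
from stable_baselines3.common.on_policy_algorithm import OnPolicyAlgorithm

# A2C and PPO are on-policy, DQN is off-policy, and all of them derive from BaseAlgorithm.
print(issubclass(PPO, OnPolicyAlgorithm), issubclass(DQN, OffPolicyAlgorithm))
print(all(issubclass(cls, BaseAlgorithm) for cls in (A2C, PPO, DQN)))
```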
> Can you describe more about this problem? With the information you've provided, I can't figure out why you're not getting what you want.
Basically, to train a model, I need to initialize a `CustomEnv` environment, choose an algorithm from Stable Baselines, set its environment to the `CustomEnv` created, and call its `learn()` method with the desired number of steps to train.
To initialize a `CustomEnv`, you need to choose the implementations of the `Environment`, `State`, `ActionSpace` and `Reward` classes that will be used. In this PR, the implementation of the `State` class accepts a parameter `method`, which can be:

- STATE_MAP: the state returned is a representation of the whole map.
- STATE_NON_SPATIAL: the state returned is just a pair of values, the X and Y distances to the closest mineral shard to the player.
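Putting those pieces together, the setup being described would look roughly like the sketch below. Every URNAI import path, class name and constructor signature here is assumed for illustration only; the Stable Baselines 3 calls are the library's actual API.

```python
# Hypothetical sketch of the training flow described above; URNAI names are assumptions.
from stable_baselines3 import PPO

from urnai.environments.custom_env import CustomEnv              # assumed module path
from urnai.collectables import (CollectablesActionSpace,          # assumed module path
                                CollectablesEnvironment,          # and class names
                                CollectablesReward,
                                CollectablesState)

# Pick the concrete Environment, State, ActionSpace and Reward implementations.
env = CustomEnv(
    environment=CollectablesEnvironment(),
    state=CollectablesState(method="STATE_NON_SPATIAL"),  # or "STATE_MAP"; value format assumed
    action_space=CollectablesActionSpace(),
    reward=CollectablesReward(),
)

# Choose an SB3 algorithm, attach the environment, and train for the desired timesteps.
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=300_000)
```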
So, I tried to train two models, one with the method STATE_MAP and the other with STATE_NON_SPATIAL, both for 300000 timesteps. When testing the NON_SPATIAL one, it was working pretty well, with the marines moving and collecting a lot of mineral shards on almost every test. However, when testing the MAP one, it felt like it was always going in just one direction, getting stuck on one of the edges of the map.
What I think is happening is that, since the state of the NON_SPATIAL method is very small and simple, the agent could easily understand how it was related to the reward function. As for the MAP method, since the state returned is very large (a vector with 4096 values), the agent couldn't find a pattern relating it to the reward function before the training stopped.
I think the problem isn't really an error. Maybe tweaking the size of the returned state or training the model for more timesteps would solve the problem.
Once complete, this PR should: