RickFqt opened 2 weeks ago
To resolve the conflict with v2, should I merge the v2 branch into this one? It is related to the relocation of the `custom_env.py` file to another folder.

Some issues I had with the implementation:

- To run the `collectables_check_env.py` and `solve_collectables_sb3.py` files, I had to add the `sys.path.append(...)` command line at the beginning of each of these files. Is there a better way to do this?
- I tried to generalize the `solve_collectables_sb3.py` behavior into a `Trainer` class that would have `train()`, `test()` and similar methods. However, I didn't know how to generalize the `learn()` method of a given model, since Stable Baselines 3 offers different kinds of models (A2C, PPO, DQN, ...) and many of them take different parameters when calling their `learn()` method.
- About the collectables `State` implementation, I could only train a "functional" model when using its non_spatial method (the state returned is just a pair of values, the X and Y distances to the closest mineral shard to the player). Still couldn't get it working for the spatial method (the state returned is a representation of the whole map).

> To resolve the conflict with v2, should I merge the v2 branch into this one? It is related to the relocation of the `custom_env.py` file to another folder.
You can use:

```sh
git checkout v2
git pull
git checkout solve-collectables-sb3
git rebase v2
```

Rebasing will be interactive, so it will stop the moment a conflict occurs and you must resolve it manually. After that:

```sh
git rebase --continue
git push -f
```
> - To run the `collectables_check_env.py` and `solve_collectables_sb3.py` files, I had to add the `sys.path.append(...)` command line at the beginning of each of these files. Is there a better way to do this?
How are you executing these files? Can you share the step by step and the commands?
> - I tried to generalize the `solve_collectables_sb3.py` behavior into a `Trainer` class that would have `train()`, `test()` and similar methods. However, I didn't know how to generalize the `learn()` method of a given model, since Stable Baselines 3 offers different kinds of models (A2C, PPO, DQN, ...) and many of them take different parameters when calling their `learn()` method.
You can use `*args` and `**kwargs` for parameters. Probably all these SB3 classes inherit from some other class, so you can use the class they all inherit from as the type of that parameter in your function.
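For illustration, a minimal sketch of that idea might look like the following. The `Trainer` class here is hypothetical (it is not the code in this PR); only the `BaseAlgorithm` import and the `learn()` call are Stable Baselines 3's actual API.

```python
from stable_baselines3.common.base_class import BaseAlgorithm


class Trainer:
    """Hypothetical wrapper that works with any SB3 model (A2C, PPO, DQN, ...)."""

    def __init__(self, model: BaseAlgorithm):
        # Every SB3 algorithm ultimately derives from BaseAlgorithm, so any of them fits here.
        self.model = model

    def train(self, total_timesteps: int, **learn_kwargs):
        # Algorithm-specific options are simply forwarded to the model's learn().
        return self.model.learn(total_timesteps=total_timesteps, **learn_kwargs)
```

Usage would then be something like `Trainer(PPO("MlpPolicy", env)).train(300_000)`, with the extra keyword arguments absorbing whatever a particular algorithm's `learn()` accepts.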
> - About the collectables `State` implementation, I could only train a "functional" model when using its non_spatial method (the state returned is just a pair of values, the X and Y distances to the closest mineral shard to the player). Still couldn't get it working for the spatial method (the state returned is a representation of the whole map).
Can you describe more about this problem? With the information you've provided, I can't figure out why you're not getting what you want.
> How are you executing these files? Can you share the step by step and the commands?
From the project root folder, I use the command `python .\experiments\solves\solve_collectables_sb3.py`, and the same for `collectables_check_env.py`. Without the `sys.path.append(...)` command, an error says that "urnai" is not a known module (since the imports begin with `urnai.`).
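For reference, the workaround being described would look roughly like the header below. The exact number of `".."` components is an assumption based on the `experiments\solves\` path in the command above.

```python
# Hypothetical header at the top of experiments/solves/solve_collectables_sb3.py so that
# imports starting with "urnai." resolve when the script is run directly.
import os
import sys

# Assumes the repository root is two directories above this file.
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")))
```

Common alternatives that avoid editing `sys.path` in each script are installing the package in editable mode (`pip install -e .` from the project root, if the project has a setup file) or running the script as a module from the root (`python -m experiments.solves.solve_collectables_sb3`), assuming the folders involved are importable as packages.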
> You can use `*args` and `**kwargs` for parameters. Probably all these SB3 classes inherit from some other class, so you can use the class they all inherit from as the type of that parameter in your function.
Thanks! Indeed, they all inherit from a `BaseAlgorithm` class, but not directly. Some inherit from `OnPolicyAlgorithm`, others from `OffPolicyAlgorithm`, and these inherit from `BaseAlgorithm`. If I have any problems implementing, I'll come back :)
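For anyone following along, that hierarchy can be verified directly; this is a quick sketch using Stable Baselines 3's public modules (class locations are as of recent SB3 releases):

```python
from stable_baselines3 import A2C, DQN, PPO
from stable_baselines3.common.base_class import BaseAlgorithm
from stable_baselines3.common.off_policy_algorithm import OffPolicyAlgorithm
from stable_baselines3.common.on_policy_algorithm import OnPolicyAlgorithm

# A2C and PPO are on-policy, DQN is off-policy, and all of them derive from BaseAlgorithm.
print(issubclass(PPO, OnPolicyAlgorithm), issubclass(DQN, OffPolicyAlgorithm))
print(all(issubclass(cls, BaseAlgorithm) for cls in (A2C, PPO, DQN)))
```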
> Can you describe more about this problem? With the information you've provided, I can't figure out why you're not getting what you want.
Basically, to train a model, I need to initialize a `CustomEnv` environment, choose an algorithm from Stable Baselines, set its environment to the `CustomEnv` created, and call its `learn()` method with the desired number of steps to train.
To initialize a `CustomEnv`, you need to choose the implementations of the `Environment`, `State`, `ActionSpace` and `Reward` classes that will be used. In this PR, the implementation of the `State` class accepts a parameter `method`, which can be:

- STATE_MAP: the state returned is a representation of the whole map.
- STATE_NON_SPATIAL: the state returned is just a pair of values, the X and Y distances to the closest mineral shard to the player.
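Putting those pieces together, the setup being described would look roughly like the sketch below. Every URNAI import path, class name and constructor signature here is assumed for illustration only; the Stable Baselines 3 calls are the library's actual API.

```python
# Hypothetical sketch of the training flow described above; URNAI names are assumptions.
from stable_baselines3 import PPO

from urnai.environments.custom_env import CustomEnv              # assumed module path
from urnai.collectables import (CollectablesActionSpace,          # assumed module path
                                CollectablesEnvironment,          # and class names
                                CollectablesReward,
                                CollectablesState)

# Pick the concrete Environment, State, ActionSpace and Reward implementations.
env = CustomEnv(
    environment=CollectablesEnvironment(),
    state=CollectablesState(method="STATE_NON_SPATIAL"),  # or "STATE_MAP"; value format assumed
    action_space=CollectablesActionSpace(),
    reward=CollectablesReward(),
)

# Choose an SB3 algorithm, attach the environment, and train for the desired timesteps.
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=300_000)
```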
So, I tried to train two models, one with the method STATE_MAP and the other with STATE_NON_SPATIAL, both for 300000 timesteps. When testing the NON_SPATIAL one, it was working pretty well, with the marines moving and collecting a lot of mineral shards on almost every test. However, when testing the MAP one, it felt like it was always going in just one direction, getting stuck on one of the edges of the map.
What I think is happening is that, since the state of the NON_SPATIAL method is very small and simple, the agent could easily understand how it was related to the reward function. As for the MAP method, since the state returned is very large (a vector with 4096 values), the agent couldn't find a pattern relating it to the reward function before the training stopped.
I think the problem isn't really an error. Maybe tweaking the size of the returned state or training the model for more timesteps would solve the problem.
Once complete, this PR should: