~Updating the description and also uploading a trained agent within the day.~ done
I did some git juggling to remove the merge commit from dev I did earlier and did a rebase instead. I think all went well, as far as I can tell. There is some dirty history in the (310) commits from older versions I worked on, but I guess what's done is done; it will get squashed eventually.
Some more git/GitHub shenanigans happened; nothing much has changed since I requested reviews, just the renaming of a variable and an update of the main env docstring that previously had a TODO.
Thank you @KelvinYang0320! No worries, I will apply the fixes to the README, and feel free to review the rest when you can :smile:
I'll go ahead and merge this, as I need to include the regular link to the code for the relevant paper; we can revisit it later if needed. In general, I feel that the example is in a great spot and only some details might need more work. As soon as the paper is published, I will add it to the relevant README.
This PR adds a new find-and-avoid example. This one is considerably more complicated than the existing examples, so I will provide some general information to get you started with the review.
The algorithm used is Maskable PPO from sb3-contrib. The logic that creates the mask for each step is contained within the environment's `get_action_mask` method. Proper masking is crucial for the agent to train and learn useful behaviors.
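For reviewers unfamiliar with action masking, here is a minimal, self-contained toy sketch (not this PR's code; the env class, mask contents, and hyperparameters below are made up) of how sb3-contrib's `MaskablePPO` consumes a per-step mask through the `ActionMasker` wrapper, assuming a recent sb3-contrib/gymnasium setup:

```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces
from sb3_contrib import MaskablePPO
from sb3_contrib.common.wrappers import ActionMasker


class ToyMaskedEnv(gym.Env):
    """Stand-in for the find-and-avoid env: 5 discrete actions, toy observation."""

    def __init__(self):
        self.observation_space = spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Discrete(5)
        self.steps = 0

    def get_action_mask(self):
        # Toy mask: forbid action 4 on odd steps. The real env builds its mask
        # from sensor readings inside its own get_action_mask method.
        mask = np.ones(5, dtype=bool)
        mask[4] = self.steps % 2 == 0
        return mask

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.steps = 0
        return np.zeros(4, dtype=np.float32), {}

    def step(self, action):
        self.steps += 1
        obs = self.observation_space.sample()
        return obs, 0.0, self.steps >= 100, False, {}


def mask_fn(env):
    # Called by ActionMasker before each step to fetch the current valid-action mask
    return env.get_action_mask()


env = ActionMasker(ToyMaskedEnv(), mask_fn)
model = MaskablePPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=1_000)
```

The actual mask logic for this example lives in the environment's `get_action_mask`, as noted above.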
The robot used is a custom one with 13 forward- and side-facing distance sensors, which can be set as sonar or generic through the code (noise can also be added, etc.), 2 touch sensors at the front (one on the left, one on the right), and of course 2 motors for differential drive.
The observation is the distance and relative angle to the target, the current motor speeds (-1.0 to 1.0), the touch sensor values, and the distance sensor values. All values are normalized appropriately. In addition, the user can set a `steps window` as well as a `seconds window` to augment the observation with observations from the past; see a good description within the docstring here. The discrete action space consists of 5 actions to control the robot. The mapping can be found in the docstring here.
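To make the `steps window` idea above concrete, here is a hypothetical sketch (not the actual implementation; the class name, padding behavior, and ordering are assumptions) of how past observations could be stacked onto the current one:

```python
from collections import deque

import numpy as np


class StepsWindow:
    """Hypothetical helper: concatenate the current observation with the N most
    recent past observations, zero-padding until the window fills up."""

    def __init__(self, obs_size, window_steps):
        self.obs_size = obs_size
        self.window = deque(maxlen=window_steps)

    def reset(self):
        self.window.clear()

    def augment(self, obs):
        obs = np.asarray(obs, dtype=np.float32)
        # Pad with zero-observations while fewer than window_steps past obs exist
        padding = [np.zeros(self.obs_size, dtype=np.float32)] * (self.window.maxlen - len(self.window))
        stacked = np.concatenate([obs, *self.window, *padding])
        # Record the current observation so it becomes "the past" on the next call
        self.window.appendleft(obs)
        return stacked


win = StepsWindow(obs_size=3, window_steps=2)
win.reset()
print(win.augment(np.array([0.1, 0.2, 0.3], dtype=np.float32)).shape)  # (9,)
```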
The environment incorporates random map generation, randomly placing a collection of 25 obstacles within an arena. It is possible to modify the difficulty and create curricula of easy-to-hard maps, as is done in the default provided trainer. You can find a description in the README here.
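As a rough illustration only (the stage names, episode counts, and placement logic below are invented and are not the provided trainer's actual values), an easy-to-hard curriculum over randomly generated maps could look something like this:

```python
import random

# Hypothetical curriculum: each stage controls how many of the 25 obstacles
# are actually placed, so early training sees sparser, easier maps.
CURRICULUM = [
    {"name": "easy",   "episodes": 2000, "active_obstacles": 5},
    {"name": "medium", "episodes": 4000, "active_obstacles": 15},
    {"name": "hard",   "episodes": 8000, "active_obstacles": 25},
]


def sample_obstacle_positions(n_obstacles, arena_half_size=0.5, rng=random):
    """Scatter n_obstacles uniformly inside a square arena (placeholder logic)."""
    return [
        (rng.uniform(-arena_half_size, arena_half_size),
         rng.uniform(-arena_half_size, arena_half_size))
        for _ in range(n_obstacles)
    ]
```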
All the parameters, difficulty settings, etc. seen here are exactly the setup I used to train the provided agent. There is huge room for experimentation in various aspects of the problem. I tried my best to test out many things, and I think the setup I reached produces a pretty good trained agent.