Denys88 / rl_games

RL implementations
MIT License

How to implement a new RL algorithm? #239

Closed xuanyaoming closed 1 year ago

xuanyaoming commented 1 year ago

Hi, thank you for your fantastic implementation of the RL algorithms. In my situation, I need to create my own neural net structure and, if possible, code it from scratch, because I'm required to test my code in Isaac Gym, which assumes that all RL algorithms are provided by this repository. I wonder if it is possible to make my own code consistent with your conventions? That probably means wrapping my algorithm inside your Runner object, but sadly I have no idea where to start. Do you have any guidance or suggestions?

Denys88 commented 1 year ago

Hi @xuanyaoming Please take a look at the example of how AMP was done: https://github.com/NVIDIA-Omniverse/IsaacGymEnvs/tree/3d2d5ed7ce71401db3063d8c339f4a800ce8709f/isaacgymenvs/learning
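For anyone landing on this issue later: in the linked IsaacGymEnvs code, the custom AMP agent and player are registered with the Runner's factories under a string name, and the yaml config then selects that name. Below is a minimal, self-contained sketch of that registration pattern. `ObjectFactory` here is a stand-in for rl_games' own factory class, and `MyAgent`/`MyPlayer` are hypothetical placeholders for your classes; check the linked example for the exact rl_games API.

```python
# Sketch of the factory-registration pattern the rl_games Runner uses.
# ObjectFactory is a simplified stand-in; MyAgent / MyPlayer are hypothetical.

class ObjectFactory:
    """Maps a string name (taken from the yaml config) to a constructor."""
    def __init__(self):
        self._builders = {}

    def register_builder(self, name, builder):
        self._builders[name] = builder

    def create(self, name, **kwargs):
        builder = self._builders.get(name)
        if builder is None:
            raise ValueError(f'Unknown builder: {name}')
        return builder(**kwargs)


class MyAgent:
    """Hypothetical custom training algorithm."""
    def __init__(self, config=None):
        self.config = config


class MyPlayer:
    """Hypothetical inference-only counterpart of MyAgent."""
    def __init__(self, config=None):
        self.config = config


# In IsaacGymEnvs this registration happens on the real Runner, along the lines of:
#   runner.algo_factory.register_builder('my_algo', lambda **kw: MyAgent(**kw))
#   runner.player_factory.register_builder('my_algo', lambda **kw: MyPlayer(**kw))
algo_factory = ObjectFactory()
player_factory = ObjectFactory()
algo_factory.register_builder('my_algo', lambda **kwargs: MyAgent(**kwargs))
player_factory.register_builder('my_algo', lambda **kwargs: MyPlayer(**kwargs))

# The Runner then reads the algo name from the yaml config and constructs it:
agent = algo_factory.create('my_algo', config={'lr': 3e-4})
player = player_factory.create('my_algo', config={'lr': 3e-4})
```

Once registered, selecting your algorithm is just a matter of putting its name in the config, which is why no Runner code needs to be modified.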

xuanyaoming commented 1 year ago

Thanks for the response. I just read the code in your link and have some follow-up questions regarding your repo.

Denys88 commented 1 year ago

1) AMP is just an implementation of the 'Adversarial Motion Priors' paper with a custom algorithm. 2) In the algo factory you have the algo which trains; the player factory gives you very simplified code for inference only, which is easier to use. 3) Please take a look at the yaml configuration. The neural network is a separate entity from the algo. It might look overcomplicated, but I have a network class (responsible for the architecture) -> model class (responsible for what to do with the network outputs) and finally the algo.
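The network -> model -> algo layering in point 3 can be sketched as three small classes. This is an illustrative toy, not the actual rl_games classes (which are torch modules built from the yaml config); all names here are made up, and numpy stands in for the real tensor code to keep the example self-contained.

```python
# Toy sketch of the network -> model -> algo separation described above.
# All class names are illustrative placeholders, not rl_games internals.
import numpy as np


class Network:
    """Architecture only: the parameters and the forward pass."""
    def __init__(self, obs_dim, act_dim, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.1, size=(obs_dim, hidden))
        self.w_mu = rng.normal(scale=0.1, size=(hidden, act_dim))
        self.w_v = rng.normal(scale=0.1, size=(hidden, 1))

    def forward(self, obs):
        h = np.maximum(obs @ self.w1, 0.0)    # ReLU hidden layer
        return h @ self.w_mu, h @ self.w_v    # action mean, state value


class Model:
    """Decides what to do with the network outputs:
    here, treat the mean as a unit-variance Gaussian policy and sample."""
    def __init__(self, network, seed=0):
        self.network = network
        self.rng = np.random.default_rng(seed)

    def act(self, obs):
        mu, value = self.network.forward(obs)
        action = mu + self.rng.normal(size=mu.shape)
        return action, value


class Algo:
    """The training algorithm consumes the model; a real algo would
    collect rollouts and compute losses here."""
    def __init__(self, model):
        self.model = model

    def step(self, obs):
        return self.model.act(obs)


obs = np.zeros((4, 8))  # batch of 4 observations, dim 8
algo = Algo(Model(Network(obs_dim=8, act_dim=2)))
action, value = algo.step(obs)
```

The payoff of the split is that the same network architecture can be reused with a different model (e.g. deterministic vs. stochastic outputs), and the same model with a different algo, each swapped independently through the yaml config.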

xuanyaoming commented 1 year ago

Many thanks! Understanding your design principles really helps me a lot! The code in this repo makes more sense to me now. By the way, I think I just spotted a bug in the network_builder class. I'll discuss it in a new issue.