Pure python training, evaluation and rollout documentation request.

redzhepdx commented 7 months ago

Hi everyone,

As a professional who has worked with a few RL frameworks in the past, I can confidently say that this is one of the cleanest, most user-friendly, and most advanced RL libraries I've encountered. In fact, I'm planning to introduce it to my team as our future RL framework, and we're excited to contribute to its development. I especially appreciate the Dreamer implementations and the informative blog posts – amazing work!

Based on my experience with RL framework development, I have a few recommendations that could make this library even more appealing to a wider range of engineers:

Pure Python Examples:

While I understand the value of Hydra as a tool for configuration management and rapid experimentation, it can be intimidating for newcomers. To lower this barrier and encourage broader adoption, I recommend creating 3-4 pure Python documentation/tutorial examples demonstrating training, evaluation, and rollout using existing SheepRL functionalities. This approach has been successful in attracting large-scale users to other RL libraries.
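To make the request concrete, here is a minimal sketch of the shape such a pure Python rollout and evaluation example could take. It deliberately does not use the library's API: `ToyCorridorEnv`, `rollout`, and `evaluate` are hypothetical names invented for illustration, and the policy is a plain callable standing in for a trained agent.

```python
from dataclasses import dataclass


@dataclass
class ToyCorridorEnv:
    """Hypothetical 1-D corridor: start at 0, the episode ends at `goal`."""

    goal: int = 5
    max_steps: int = 50

    def reset(self):
        self._pos = 0
        self._t = 0
        return self._pos

    def step(self, action):
        # Action 1 moves right; any other action moves left (clipped at 0).
        self._pos = max(0, self._pos + (1 if action == 1 else -1))
        self._t += 1
        terminated = self._pos >= self.goal
        truncated = self._t >= self.max_steps
        reward = 1.0 if terminated else -0.01
        return self._pos, reward, terminated, truncated


def rollout(env, policy):
    """Run one episode and return (total_reward, number_of_steps)."""
    obs = env.reset()
    total, steps, done = 0.0, 0, False
    while not done:
        action = policy(obs)
        obs, reward, terminated, truncated = env.step(action)
        total += reward
        steps += 1
        done = terminated or truncated
    return total, steps


def evaluate(env, policy, episodes=10):
    """Average the episode return over several rollouts."""
    returns = [rollout(env, policy)[0] for _ in range(episodes)]
    return sum(returns) / len(returns)


if __name__ == "__main__":
    # A fixed "always move right" policy stands in for a trained agent.
    print(evaluate(ToyCorridorEnv(), lambda obs: 1, episodes=3))
```

A real tutorial would replace the toy environment and the lambda policy with the library's own environment wrappers and agents; the point is only that the whole loop stays readable without any configuration layer.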

Here are some examples that might be helpful:

Tips and Tricks:

As we all know, RL algorithms are sensitive to hyperparameters and often require specific techniques like action masking, observation normalization, and reward scaling to be successful on new environments. Given the library's advanced capabilities with World Models, sharing insights and best practices on these topics would be incredibly valuable to the community (including myself!). Here are some examples from other libraries:

https://stable-baselines.readthedocs.io/en/master/guide/rl_tips.html
https://maze-rl.readthedocs.io/en/latest/best_practices_and_tutorials/tricks_of_the_trade.html
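For illustration, two of the techniques mentioned above can be sketched in a few lines of plain Python. This is only a rough sketch under my own assumptions, not code from this library or from the linked guides: `RunningNormalizer` and `masked_softmax` are hypothetical names, and observations are treated as scalars to keep it short.

```python
import math


class RunningNormalizer:
    """Running mean/variance of scalar values (Welford's algorithm)."""

    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean
        self.eps = eps

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def var(self):
        # Population variance; falls back to 1.0 before any update.
        return self._m2 / self.count if self.count > 0 else 1.0

    def normalize(self, x):
        return (x - self.mean) / math.sqrt(self.var + self.eps)


def masked_softmax(logits, mask):
    """Softmax over valid actions only; masked actions get probability 0."""
    if not any(mask):
        raise ValueError("at least one action must be valid")
    masked = [l if m else -math.inf for l, m in zip(logits, mask)]
    top = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in masked]
    z = sum(exps)
    return [e / z for e in exps]


if __name__ == "__main__":
    norm = RunningNormalizer()
    for obs in (1.0, 2.0, 3.0):
        norm.update(obs)
    print(norm.normalize(2.0))
    print(masked_softmax([1.0, 2.0, 3.0], [True, False, True]))
```

The same running statistics could be reused for reward scaling, for example by dividing each reward by the running standard deviation instead of also subtracting the mean.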

Transitioning to Hydra:

Once users become comfortable with the library's fundamentals, they'll naturally progress towards exploring scalability and advanced experimentation, which is where Hydra shines. Consider creating a separate tutorial or example notebook showcasing how to leverage Hydra and Sheep-RL's train and evaluate functionalities to achieve this transition smoothly.

I hope you find these recommendations helpful. Best of luck to the developers!

belerico commented 7 months ago

Hi @redzhepdx! Thank you for the suggestions, we really appreciate them! We can definitely have something similar to this and this: what do you think @michele-milesi? As for contributing, we still have to add a how to contribute.md, but there is an old issue about implementing the DQN methods and their variants, if you want to start somewhere. Thank you

michele-milesi commented 7 months ago

Hi there, @belerico, yes, we can start with something similar to the two examples you mentioned. For the environment part, I think we can try to recycle this. Or are you thinking of using a more complex environment (like this)?

redzhepdx commented 7 months ago

Hi @michele-milesi, I believe the complexity of the environment matters little. You can use any environment, but I would recommend something like crawler or any of the MuJoCo or classic Gym environments, to show the capabilities of the framework on decently challenging cases that anyone can still test locally.

Thanks a lot for your prompt reaction to this topic.

verityw commented 6 months ago

Is there any update on this? Would really appreciate a pure Python example to use for research, to better integrate my existing stable-baselines code with!

michele-milesi commented 6 months ago

Hi @verityw, we are fixing a few problems we found with half-precision training. After this, we will move on to pure python examples. Thank you for your patience.

Eclectic-Sheep / sheeprl

Pure python training, evaluation and rollout documentation request. #209