DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Question] C++ Inference #836

Closed Gregwar closed 1 year ago

Gregwar commented 2 years ago

Question

Hello, I am using SB3 to train models whose inference I want to run on embedded robots using C++. I had a look at the PyTorch documentation, and doing this "by hand" is not very hard, but the process could indeed be automated.

I could maybe contribute, but I would first like some questions answered about how you imagine the feature.

In my opinion, there is more that can be done than simply documenting it: we could help automate the observation and action rescaling and normalization, so that methods like model.predict get converted seamlessly to C++.
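Concretely, for a Box action space with observation normalization, the preprocessing that a C++ port would have to reproduce looks roughly like this (a pure-Python sketch; the statistics and action bounds below are made-up placeholders, not values from any real model):

```python
import math

# Hypothetical running statistics (as VecNormalize would store them)
# and hypothetical action-space bounds.
obs_mean, obs_var, eps, clip_obs = 0.5, 4.0, 1e-8, 10.0
action_low, action_high = -2.0, 2.0

def normalize_obs(obs):
    """Normalize as (obs - mean) / sqrt(var + eps), then clip."""
    norm = (obs - obs_mean) / math.sqrt(obs_var + eps)
    return max(-clip_obs, min(clip_obs, norm))

def unscale_action(scaled):
    """Map a network output in [-1, 1] back to [low, high]."""
    return action_low + 0.5 * (scaled + 1.0) * (action_high - action_low)

print(normalize_obs(2.5))   # ~1.0
print(unscale_action(0.0))  # 0.0, the midpoint of [-2, 2]
```

These two steps are exactly the kind of glue that is easy to forget (or get subtly wrong) when re-implementing model.predict by hand in C++.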

Here is how I would see it:

What do you think?


araffin commented 2 years ago

Hello, that would be a valuable extension to SB3, but I think it should be done in the RL Zoo (or in an external repo).

Here is how I would see it:

Are those different options, or a list of features?

but yes, in the end it would be nice to have something like python sb3_export.py --algo a2c --env MyEnv -i path_to_model, or in the case of the RL Zoo: python sb3_export.py --algo a2c --env MyEnv -f exp_folder --exp-id 1 (it finds the experiment folder automatically and also loads the normalization and other wrappers if needed)
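A hypothetical front end for such an sb3_export.py could be wired up with argparse; a sketch (the flag names mirror the commands quoted above and the RL Zoo conventions, but everything beyond those is an assumption):

```python
import argparse

# Hypothetical CLI for exporting an SB3 model to C++;
# supports both a direct model path (-i) and RL Zoo mode (-f / --exp-id).
parser = argparse.ArgumentParser(description="Export an SB3 model to C++")
parser.add_argument("--algo", required=True, help="Algorithm name, e.g. a2c")
parser.add_argument("--env", required=True, help="Environment id")
parser.add_argument("-i", "--model-path", help="Path to a saved model")
parser.add_argument("-f", "--folder", help="Experiment folder (RL Zoo mode)")
parser.add_argument("--exp-id", type=int, default=0, help="Experiment id")

# Example invocation, parsed from a list for demonstration.
args = parser.parse_args(["--algo", "a2c", "--env", "MyEnv", "-i", "model.zip"])
print(args.algo, args.env, args.model_path)
```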

I would like some questions to be answered about how do you imagine the feature.

what are your questions?

Gregwar commented 2 years ago

My list is a list of features.

About the questions, I mostly wanted to know what the recommended direction for this was. Making it a contrib to the RL Zoo indeed looks like the way to go, since the export can be achieved "from the outside" of SB3.

araffin commented 2 years ago

Making it a contrib to the RL Zoo indeed looks like the way to go, since the export can be achieved "from the outside" of SB3.

Feel free to open a draft PR there if you want to discuss it in more detail ;)

Gregwar commented 2 years ago

I started working on that in the RL Zoo; I will indeed open a draft PR soon. Even if it will not support everything, it can be used as a base for discussion.

I have a very short-term goal of embedding inference in our humanoid robots, so I will also be the first user.

Gregwar commented 2 years ago

Ok, I started a draft PR

https://github.com/DLR-RM/rl-baselines3-zoo/pull/228/

Design choices

A procedure to test

To test that it indeed works, I added an option to generate a Python binding while building, so that we can directly use Python's gym to test it. Here are the steps:

  1. Be sure you have PyTorch installed and in your CMAKE_PREFIX_PATH
  2. Install pybind11, with for instance apt-get install python3-dev python3-pybind11
  3. Train, for instance DQN with CartPole-v1 and then run something like: python enjoy.py --env CartPole-v1 --algo dqn -f logs/ --export-cpp predictors/
  4. Go into predictors/ and build:
    • cd predictors
    • mkdir build/ && cd build/
    • cmake -DBASELINES3_PYBIND=ON ..
    • make -j8
  5. This should produce a libbaselines3_models.so, which contains your predictors
  6. This should also produce something like baselines3_py.cpython-36m-x86_64-linux-gnu.so, which lets you test that it works from a Python environment

From here you can test with such a script:

import gym
from baselines3_py import CartPole_v1

cp = CartPole_v1()
env = gym.make('CartPole-v1')

obs = env.reset()
while True:
    action = cp.predict(obs)
    obs, reward, done, info = env.step(int(action[0]))
    env.render("human")

    if done:
        obs = env.reset()


It should show the CartPole running, using the C++-built library for prediction. You can of course also build without the Python binding and use the library from your C++ code.

An example can be found in predict.cpp, which is built as a binary if you set BASELINES3_BIN to ON (hard-coded for CartPole-v1).

stalek71 commented 2 years ago

Maybe you are just looking for something like this... https://onnxruntime.ai/docs/

Gregwar commented 2 years ago

@stalek71 thanks for the lead; however, the problem here is not exactly to save a PyTorch model to a file and load it from C++ (which is explained in [1]), but to export SB3 models (at least model.predict() in the first place) to C++

More specifically, it implies:

This is not a very complicated matter, but it requires some specific knowledge of how the underlying RL agents work. So I guess it's good to have the whole process automated.

(The use case is: I want this running on a robot without any Python code, because it is an embedded real-time robotics application.)

[1] https://pytorch.org/tutorials/advanced/cpp_export.html
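On the Python side, the usual route (per the tutorial in [1]) is to serialize the network's forward pass with TorchScript so that C++ can load it via torch::jit::load. A minimal sketch with a stand-in network, not the actual SB3 policy class (extracting that, plus the pre/post-processing, is precisely what this feature would automate):

```python
import torch
import torch.nn as nn

# Stand-in for an SB3 policy network, e.g. a DQN Q-network for CartPole
# (4 observation dims -> 2 Q-values). Architecture is illustrative only.
net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
net.eval()

# Trace the forward pass and serialize it; the resulting file can then be
# loaded from C++ with torch::jit::load("q_net.pt").
example_obs = torch.zeros(1, 4)
traced = torch.jit.trace(net, example_obs)
traced.save("q_net.pt")

# Greedy action selection, mimicking what deterministic DQN prediction does;
# the C++ side would run the same forward pass and argmax.
loaded = torch.jit.load("q_net.pt")
action = loaded(example_obs).argmax(dim=1)
```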

araffin commented 2 years ago

Thanks for the PR =)

The export is initiated through enjoy.py, since I didn't want to duplicate or factor out the environment-loading logic; to start the export, pass --export-cpp target_directory to enjoy.py (along with all the usual flags to load your model)

sounds good.

So far only action inference is provided

I think we should keep the first version as simple as possible (for instance limiting ourselves to a subset of models or action spaces)

, they are embedded in the binary as resources using CMRC

how much more difficult is it to just give a path to torch.jit.load()?

Go in predictor and build:

this could be even automated, no?

from baselines3_py import CartPole_v1

I would rather keep the name of the algorithm (or concatenate it with the name of the env) to avoid confusion.

This should produce a libbaselines3_models.so, that is where your predictors are

Do you also have an example cpp file to show how to use that shared lib? For the name, we will discuss it too (whether it should be baselines3, sb3, or stablebaselines3; I would lean towards the last two ;))

action = cp.predict(obs)

This is not consistent with the SB3 API, but I think it's fine as it is targeted towards deployment (and only RecurrentPPO requires a hidden state).

Gregwar commented 2 years ago

how much more difficult is it to just give a path to torch.jit.load()?

It is not; I can make it this way, dropping the dependency on CMRC or making it optional

this could be even automated, no?

Enabling the Python binding is more of a test than a real use case. There is likely not much of a performance boost, because most of the computation is actually done by PyTorch; it's just about embedding it in C++.

But yes, we could automate the build and run Python tests on top of the library for unit-test purposes, CI and so on

Do you have also an example cpp file to show how to use that shared lib?

So far just the simple: https://github.com/Gregwar/rl-baselines3-zoo/blob/export_cpp/cpp/src/predict.cpp

araffin commented 2 years ago

dropping the dependency on CMRC or making it optional

Fewer dependencies are usually better ;)

Enabling the Python binding is more like a test than a real use case.

I meant automating the build of the shared lib, but I probably misunderstood what you wrote.

we could automate the build and run of Python tests on the top of library for unit test purposes

this would be nice to have at least some test on the CI (nothing too complicated)

So far just the simple:

thanks =)

araffin commented 1 year ago

Closing this one in favor of another one that is going to be opened soon (per discussion with Grégoire).