DLR-RM / rl-baselines3-zoo

A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.
https://rl-baselines3-zoo.readthedocs.io
MIT License

[Question] exported ONNX model does not result in same output as the original pytorch model #394

Open VineetTambe opened 11 months ago

VineetTambe commented 11 months ago

❓ Question

I am trying to export the trained pytorch model to onnx so that I can deploy it. But I am facing an issue where the output of the exported model is not the same as that of the pytorch model when I run an episode. I have made sure to set the model to eval mode before exporting. I heavily modified the enjoy.py script to export and run the models.

Exporting to ONNX:

    torch_model = ALGOS[algo].load(
        model_path, custom_objects=custom_objects, device=args.device, **kwargs
    )
    torch_model.policy.eval()

    obs = env.reset()

    obs_tensor = torch_model.policy.obs_to_tensor(obs)[0]
    # Export the model
    torch.onnx.export(
        torch_model.policy,  # model being run
        obs_tensor,  # model input (or a tuple for multiple inputs)
        output_model_name,  # where to save the model (can be a file or file-like object)
        export_params=True,  # store the trained parameter weights inside the model file
        opset_version=10,  # the ONNX version to export the model to
        do_constant_folding=True,  # whether to execute constant folding for optimization
        input_names=["input"],  # the model's input names
        output_names=["output"],  # the model's output names
        dynamic_axes={
            "input": {0: "batch_size"},  # variable length axes
            "output": {0: "batch_size"},
        },
    )
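Right after exporting, I also validate the file (simplified):

    import onnx

    # Structural sanity check of the exported graph
    onnx_model = onnx.load(output_model_name)
    onnx.checker.check_model(onnx_model)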

Running inference using the onnx model:

    # Run the exported model with onnxruntime
    ort_session = onnxruntime.InferenceSession(onnx_model_path)
    # Note: obs keeps whatever dtype/shape the env returns,
    # which must match the input the model was traced with
    ort_inputs = {ort_session.get_inputs()[0].name: obs}
    action = ort_session.run(None, ort_inputs)[0]
    obs, reward, done, infos = env.step(action)
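And this is roughly how I compare the two outputs (simplified):

    # Compare the ONNX action against the PyTorch greedy action
    # for the same observation
    onnx_action = ort_session.run(None, {ort_session.get_inputs()[0].name: obs})[0]
    torch_action, _ = torch_model.predict(obs, deterministic=True)
    print(onnx_action, torch_action)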

The above are the only modifications done to enjoy.py in order to export and run the model. However, the results of the trained agent are not the same. Am I missing something obvious here? Any help would be greatly appreciated!


araffin commented 11 months ago

Hello, could you be more specific about which algo/env you are using?

VineetTambe commented 11 months ago

Hey,

I am using the qrdqn algo and a custom environment based on top of the minigrid env

araffin commented 11 months ago

> I am using the qrdqn algo and a custom environment based on top of the minigrid env

Could you share the observation and action spaces?

You are probably missing pre-processing, see https://github.com/DLR-RM/stable-baselines3/issues/1349#issuecomment-1446161768 (we welcome a PR that updates our doc).
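The pattern from that comment looks roughly like this (a simplified sketch, not the exact code from the linked issue):

    import torch as th

    class OnnxablePolicy(th.nn.Module):
        """Wrapper so that greedy action selection (and the policy's
        built-in pre-processing) is part of the exported graph."""

        def __init__(self, policy: th.nn.Module):
            super().__init__()
            self.policy = policy

        def forward(self, observation: th.Tensor) -> th.Tensor:
            # deterministic=True selects the greedy action,
            # matching model.predict(..., deterministic=True)
            return self.policy(observation, deterministic=True)

    # Export the wrapper instead of the bare policy:
    # th.onnx.export(OnnxablePolicy(model.policy), dummy_input, "model.onnx", ...)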

VineetTambe commented 11 months ago

> Could you share the observation and action spaces?

Observation space: Box(0, 255, (50,), uint8)
Action space: Discrete(4)

> You are probably missing pre-processing

I tried doing what is done in the linked comment, which is to create a new pytorch model class that has the policy preprocessing step in its forward pass (please correct me if I am wrong here).

What exactly does the pre-processing entail? Is there anything more to it? Even after doing the above step, I get the same incorrect results. Is there any postprocessing step that I might be missing?

araffin commented 11 months ago

You are probably either missing the image pre-processing (dividing observations by 255 before feeding them to the network) or not comparing against the greedy policy.

The following works and was tested by comparing the returned quantiles:

    import numpy as np
    import torch as th
    from sb3_contrib import QRDQN

    model = QRDQN("MlpPolicy", "LunarLander-v2")
    model.policy.to("cpu")
    # Note: by default model.policy.quantile_net.forward() returns quantiles
    onnxable_model = model.policy
    observation_size = model.observation_space.shape[0]

    dummy_input = th.randn(1, observation_size)
    onnx_path = "qrdqn_model.onnx"
    th.onnx.export(
        onnxable_model,
        dummy_input,
        onnx_path,
        opset_version=17,
        input_names=["input"],
    )

    ##### Load and test with onnx

    import numpy as np
    import onnx
    import onnxruntime as ort

    onnx_model = onnx.load(onnx_path)
    onnx.checker.check_model(onnx_model)

    # observation = np.zeros((1, observation_size)).astype(np.float32)
    observation = dummy_input.cpu().numpy()
    ort_sess = ort.InferenceSession(onnx_path)
    action = ort_sess.run(None, {"input": observation})[0]

    print(action)
    print(model.predict(observation, deterministic=True)[0])
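For an observation space like yours (Box(0, 255, (50,), uint8)), the same pre-processing has to happen on the ONNX side if it is not baked into the exported graph. A sketch, assuming the /255 image scaling applies to your custom env:

    # Hypothetical adaptation for a uint8 observation:
    # scale to [0, 1] (as SB3 does for image observations)
    # and feed float32 to the session
    observation = env.reset().astype(np.float32) / 255.0
    action = ort_sess.run(None, {"input": observation})[0]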