VineetTambe opened 11 months ago

❓ Question

I am trying to export the trained PyTorch model to ONNX so that I can deploy it, but I am running into an issue where the output of the exported model is not the same as the PyTorch model's when I run an episode. I have made sure to set the model to eval mode before exporting. I heavily modified the enjoy.py script to export and run the models.

Exporting to ONNX:

Running inference using the ONNX model:

The above are the only modifications made to enjoy.py in order to export and run the model. However, the results of the trained agent are not the same. Am I missing something obvious here? Any help would be greatly appreciated!
Hello, could you be more specific about which algo/env you are using?
Hey,
I am using the QRDQN algo and a custom environment built on top of the MiniGrid env.
Could you share the observation and action spaces?
You are probably missing pre-processing, see https://github.com/DLR-RM/stable-baselines3/issues/1349#issuecomment-1446161768 (we welcome a PR that updates our doc).
> Could you share the observation and action spaces?
Observation space: Box(0, 255, (50,), uint8)
Action Space: Discrete(4)
> You are probably missing pre-processing

I tried doing what is done in the linked comment, which is to create a new PyTorch model class that has the policy pre-processing step in the forward pass (please correct me if I am wrong here).
What exactly does the pre-processing entail? Is there anything more to it? Even after doing the above step, I get the same incorrect results. Is there any post-processing step that I might be missing?
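For reference, a minimal sketch of that kind of wrapper is shown below. The class name, the saved-model path, and the divide-by-255 step are illustrative assumptions rather than the exact code from the linked comment; the normalization should only be kept if it matches what was actually applied to the observations during training:

```python
import torch as th

from sb3_contrib import QRDQN


class OnnxablePolicy(th.nn.Module):
    """Illustrative wrapper: apply manual pre-processing, then return the
    greedy action from the SB3 policy."""

    def __init__(self, policy: th.nn.Module):
        super().__init__()
        self.policy = policy

    def forward(self, observation: th.Tensor) -> th.Tensor:
        # Assumption: uint8 observations in [0, 255] that were normalized during
        # training; remove this line if no such normalization was used.
        observation = observation.float() / 255.0
        # deterministic=True -> greedy action (argmax over the mean of the quantiles)
        return self.policy(observation, deterministic=True)


# Hypothetical usage with a trained agent saved as "qrdqn_minigrid.zip"
model = QRDQN.load("qrdqn_minigrid.zip", device="cpu")
onnxable_model = OnnxablePolicy(model.policy)
dummy_input = th.randn(1, *model.observation_space.shape)
th.onnx.export(
    onnxable_model,
    dummy_input,
    "qrdqn_minigrid.onnx",
    opset_version=17,
    input_names=["input"],
)
```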
You are probably either missing image pre-processing (dividing by 255 before feeding the observation to the network) or not comparing to the greedy policy.
The following works and was tested by comparing the returned quantiles:
```python
import numpy as np
import torch as th

from sb3_contrib import QRDQN

model = QRDQN("MlpPolicy", "LunarLander-v2")
model.policy.to("cpu")

# Note: by default model.policy.quantile_net.forward() returns quantiles;
# model.policy itself returns the chosen (greedy) action, which is what gets exported here
onnxable_model = model.policy

observation_size = model.observation_space.shape[0]
dummy_input = th.randn(1, observation_size)
onnx_path = "qrdqn_model.onnx"

th.onnx.export(
    onnxable_model,
    dummy_input,
    onnx_path,
    opset_version=17,
    input_names=["input"],
)

##### Load and test with onnx
import onnx
import onnxruntime as ort

onnx_model = onnx.load(onnx_path)
onnx.checker.check_model(onnx_model)

# observation = np.zeros((1, observation_size)).astype(np.float32)
observation = dummy_input.cpu().numpy()
ort_sess = ort.InferenceSession(onnx_path)
action = ort_sess.run(None, {"input": observation})[0]

print(action)
print(model.predict(observation, deterministic=True)[0])
```
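As an additional sanity check (not part of the original snippet), the exported model can be compared against the greedy policy over a few random observations; this assumes the script above has just been run, and keeps batch size 1 since no dynamic axes were declared during export:

```python
# Extra sanity check: compare the ONNX output with SB3's greedy prediction
# for a few random observations. Batch size must stay 1 because the export
# above did not declare dynamic axes.
for _ in range(5):
    obs = np.random.randn(1, observation_size).astype(np.float32)
    onnx_action = ort_sess.run(None, {"input": obs})[0].item()
    sb3_action = model.predict(obs, deterministic=True)[0].item()
    print(onnx_action, sb3_action, onnx_action == sb3_action)
```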