Denys88 / rl_games

RL implementations
MIT License

Exporting Dofbot Policy to Onnx From Omni Isaac Gym #226

Closed DJT777 closed 1 year ago

DJT777 commented 1 year ago

Hi,

I'm working on exporting a policy trained on the example from this repo https://github.com/j3soon/OmniIsaacGymEnvs-DofbotReacher.

I've based my export on the example you gave from your code in the notebook here: https://github.com/Denys88/rl_games/blob/master/notebooks/train_and_export_onnx_example_continuous.ipynb

I think those lines of code should be sufficient, since the config defines the exact same kind of model: https://github.com/j3soon/OmniIsaacGymEnvs-DofbotReacher/blob/main/omniisaacgymenvs/cfg/train/DofbotReacherPPO.yaml

To export my model, I incorporated your code into rlgames_train.py from that repo as rlgames_export.py, with my alterations on lines 48-124: https://github.com/DJT777/Excolligere/blob/main/omniisaacgymenvs/scripts/rlgames_export.py. You can see I am restoring the default checkpoint as the weights.

This is exporting a runnable ONNX model, and I am trying to use it here: https://github.com/DJT777/Excolligere/blob/main/notebooks/Load%20ONNX%20and%20Predict.ipynb

However, I am not getting the same results as when the simulator is running. I am wondering if you can help me understand whether I am exporting correctly in rlgames_export.py, and whether I am performing inference with the ONNX model correctly in that notebook.

Denys88 commented 1 year ago

Hi, could you give me access to the repo?

DJT777 commented 1 year ago

I've invited you to have access to the repo.

Denys88 commented 1 year ago

At first glance your code looks good. Could you check the observation normalization? It should be inside the model, but in earlier versions I had it inside the training code. Could you confirm that you're using the latest rl_games version?

DJT777 commented 1 year ago

This project is my first journey into RL, so I'm still new to the terminology. Excuse me if my answers to your questions aren't quite right.

Here is the Netron view of the model:

(Netron screenshot: dofbotreacherDefault.onnx)

I suppose the observation normalization would be the Sub and Div operations shown in Netron, as well as the Clip and Flatten ops.
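If that's right, I assume the exported graph is computing something like this on the raw observation before the MLP (a rough sketch with placeholder statistics; the actual running mean/variance and the clip range are baked into the checkpoint, so the values below are made up):

import numpy as np

# Placeholder running statistics; the real values come from training and are baked into the graph
running_mean = np.zeros(29, dtype=np.float32)
running_var = np.ones(29, dtype=np.float32)

def normalize_obs(obs, clip=5.0, eps=1e-5):
    # Sub/Div ops: standardize the observation with the running statistics...
    normed = (obs - running_mean) / np.sqrt(running_var + eps)
    # ...and the Clip op bounds the result (the +/-5 range here is just an assumption)
    return np.clip(normed, -clip, clip)

raw_obs = np.zeros((1, 29), dtype=np.float32)
print(normalize_obs(raw_obs))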

The version of rl_games is the one used in Isaac Sim 2022.1.1, which shows as version 1.5.2 in the conda environment provided with Isaac Sim.
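(For what it's worth, this is how I checked it from inside that conda environment; I'm assuming the installed distribution name is rl-games:)

import pkg_resources
print(pkg_resources.get_distribution("rl-games").version)  # prints 1.5.2 in this environment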

DJT777 commented 1 year ago

If this is helpful for debugging:

What I'm doing is printing out the placement tensor found on lines 144-151 here: https://github.com/DJT777/Excolligere/blob/main/omniisaacgymenvs/tasks/dofbot_reacher.py

These coordinates are then plugged into the target position for inference in the notebook.

I'm also printing out my actions from the step function here on the first line after the function definition: https://github.com/DJT777/Excolligere/blob/71936c837db36a4930e40320d1575aaccebc2398/omniisaacgymenvs/envs/vec_env_rlgames.py

Then I'm comparing the outputs from Isaac Sim and the notebook I linked above.

One consideration I've had is that I may not be reading the output correctly, based on this documentation suggesting that the outputs are ordered differently: https://github.com/DJT777/Excolligere/blob/71936c837db36a4930e40320d1575aaccebc2398/docs/transfering_policies_from_isaac_gym.md
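If the ordering really is different, I imagine the joint-related parts of the observation (and the actions) would need to be permuted before/after the ONNX call, something like this purely hypothetical example (the permutation below is made up, just to illustrate what I mean):

import numpy as np

sim_to_onnx = np.array([0, 2, 1, 3, 5, 4])        # hypothetical joint permutation
jointPos_sim = np.arange(6, dtype=np.float32)     # joint positions in simulator order
jointPos_reordered = jointPos_sim[sim_to_onnx]    # same values, reordered for the model
print(jointPos_reordered)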

Denys88 commented 1 year ago

The best case is if you pass the same input to both the ONNX and PyTorch models and compare the outputs.
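For example, something along these lines (a rough sketch; the ONNX path and agent are assumed to be the exported file and the restored rl_games player from your export script, and the tolerance is arbitrary):

import numpy as np
import onnxruntime as ort
import torch

sess = ort.InferenceSession('dofbotreacherDefault.onnx')
input_name = sess.get_inputs()[0].name

obs = np.random.uniform(-1.0, 1.0, size=(1, 29)).astype(np.float32)

# ONNX side: the first output of the exported graph is mu
onnx_mu = sess.run(None, {input_name: obs})[0]

# PyTorch side: deterministic action from the restored player
torch_mu = agent.get_action(torch.from_numpy(obs).to('cuda:0'), True).detach().cpu().numpy()

# Note: get_action may clip to [-1, 1], so compare on inputs whose mu stays in range
print(np.max(np.abs(onnx_mu - torch_mu)))
print(np.allclose(onnx_mu, torch_mu, atol=1e-4))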

DJT777 commented 1 year ago

Here is a sample of the different outputs as you suggested:

From the Isaac Gym:


Observation:tensor([[-0.0059,  0.0736, -0.0185,  0.0088, -0.0200, -0.0931,  0.0462, -0.1982,
          0.1095, -0.0083,  0.7498,  0.0108,  0.1826, -0.1787,  0.2417,  0.0152,
          0.3937, -0.0355, -0.9184,  0.6569,  0.2588,  0.2059, -0.6776,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000]], device='cuda:0')
action before clip tensor([[-0.1526,  0.3710, -0.6942,  0.6228, -1.0000, -0.1611]],
       device='cuda:0')

Whereas running this code:


import numpy as np

# Create the individual arrays
jointPos = np.zeros(6)
jointVel = np.zeros(6)
goalPos = np.array([.1826, -.1787, .2417])
goalRot = np.zeros(4)
goalRotRel = np.zeros(4)
prevAct = np.full(6, 0)

# Define a function to clip the actions
def clip_actions(actions):
    return np.clip(actions, -1.0, 1.0)

# Use the observation captured from the simulator as the model input
# (sess and input_name are created in the loading cell further down)
input_data = np.array([-0.0059,  0.0736, -0.0185,  0.0088, -0.0200, -0.0931,  0.0462, -0.1982,
          0.1095, -0.0083,  0.7498,  0.0108,  0.1826, -0.1787,  0.2417,  0.0152,
          0.3937, -0.0355, -0.9184,  0.6569,  0.2588,  0.2059, -0.6776,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000], dtype=np.float32)

while True:
    # The observation could also be rebuilt from the individual arrays:
    # input_data = np.concatenate((jointPos, jointVel, goalPos, goalRot, goalRotRel, prevAct), dtype=np.float32)
    input_dict = {input_name: input_data.reshape(1, -1)}
    actions = []
    # Run the model on the input data
    output = sess.run(None, input_dict)
    mus = output[0][0]
    sigmas = output[1][0]
    for mu, sigma in zip(mus, sigmas):
        # Sample an action from the Gaussian defined by (mu, exp(log_std)) and clip it
        sigma = np.exp(sigma)
        action = np.random.normal(mu, sigma)
        action = clip_actions(action)
        actions.append(action)
    prevAct = np.array(actions, dtype=np.float32)
    jointPos = prevAct
    mus = clip_actions(mus)
    print("Sampled Actions: " + str(actions) + "\n")

produces very different results in the action output:


Sampled Actions: [1.0, -0.44891554912899617, 0.4055521331764843, 1.0, 0.044669989955658046, 0.6681064546999487]

Sampled Actions: [1.0, -0.6341698266623044, 0.42099329096228416, 1.0, -0.03949862364674661, 0.5454127401465692]

Sampled Actions: [1.0, -0.9018518161188407, 0.43396113195718267, 0.8811261401993116, -0.1814963516831179, 0.6290464824858719]

Sampled Actions: [1.0, -0.5114059775648212, 0.21491532992916937, 1.0, 0.22194688542268481, 0.8891714238151054]

Sampled Actions: [1.0, -0.555514435619405, 0.32891076544248093, 1.0, 0.2197003265248571, 0.5689791679823528]
Denys88 commented 1 year ago

Could you print only the 'mu' outputs from both models? It is expected behavior to see different results if you are sampling.

DJT777 commented 1 year ago

I haven't been able to find where in Nvidia's Isaac Gym code they are doing the actual inference and getting predictions. A lot of their RL code is based on your rl_games repo. Would you know where to look?

Denys88 commented 1 year ago

I mean you don't need the sampling step action = np.random.normal(mu, sigma). If you want a deterministic policy you can just use action = mu.
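Something like this, reusing the pieces from your notebook (sess, input_name and input_data as you already have them):

import numpy as np

output = sess.run(None, {input_name: input_data.reshape(1, -1)})
mu, log_std = output[0][0], output[1][0]

# Deterministic policy: just take the mean and clip it to the action range
deterministic_action = np.clip(mu, -1.0, 1.0)

# Stochastic policy (what you are doing now): sample around the mean
sampled_action = np.clip(np.random.normal(mu, np.exp(log_std)), -1.0, 1.0)

print(deterministic_action)
print(sampled_action)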

DJT777 commented 1 year ago

Right, but I want to verify that I'm getting the same, correct predictions from both the simulator and the ONNX model. My experiments with deploying the ONNX model led me to believe that its predictions were inaccurate, because the reaches were not matching what they were in simulation.

DJT777 commented 1 year ago

You can see here that, even with the deterministic approach, the expected value is vastly different from the actions actually being taken in the simulator:

Case 1:

Simulator:

new position tensor([[ 0.1826, -0.1787,  0.2417]], device='cuda:0')
Observation:tensor([[-0.0059,  0.0736, -0.0185,  0.0088, -0.0200, -0.0931,  0.0462, -0.1982,
          0.1095, -0.0083,  0.7498,  0.0108,  0.1826, -0.1787,  0.2417,  0.0152,
          0.3937, -0.0355, -0.9184,  0.6569,  0.2588,  0.2059, -0.6776,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000]], device='cuda:0')
action before clip tensor([[-0.1526,  0.3710, -0.6942,  0.6228, -1.0000, -0.1611]],
       device='cuda:0')

ONNX


input_data = np.array([-0.0059,  0.0736, -0.0185,  0.0088, -0.0200, -0.0931,  0.0462, -0.1982,
          0.1095, -0.0083,  0.7498,  0.0108,  0.1826, -0.1787,  0.2417,  0.0152,
          0.3937, -0.0355, -0.9184,  0.6569,  0.2588,  0.2059, -0.6776,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000], dtype=np.float32)

From Mu: [ 1.         -0.47443828  0.32300434  1.          0.15414204  0.65032417]

Case 2:

Simulator

Observation:tensor([[-0.0297,  0.1214, -0.1320,  0.1025, -0.1395, -0.0864, -0.4589,  0.4480,
         -0.9389,  0.8410, -2.5514,  0.0103,  0.1826, -0.1787,  0.2417,  0.0152,
          0.3937, -0.0355, -0.9184,  0.6169,  0.3250,  0.2644, -0.6662, -0.1526,
          0.3710, -0.6942,  0.6228, -1.0000,  0.0000]], device='cuda:0')
action before clip tensor([[ 0.5072,  0.2407,  0.3249,  1.0000, -1.0000, -0.3381]],
       device='cuda:0')

ONNX

input_data = np.array([-0.0297,  0.1214, -0.1320,  0.1025, -0.1395, -0.0864, -0.4589,  0.4480,
         -0.9389,  0.8410, -2.5514,  0.0103,  0.1826, -0.1787,  0.2417,  0.0152,
          0.3937, -0.0355, -0.9184,  0.6169,  0.3250,  0.2644, -0.6662, -0.1526,
          0.3710, -0.6942,  0.6228, -1.0000,  0.0000], dtype=np.float32)

From Mu: [ 0.56676865 -1.         -0.08646691  1.          0.14331801  0.48769447]

Case 3

From Simulation

observation:tensor([[ 0.0510,  0.1173, -0.1063,  0.2193, -0.3571, -0.0850,  1.4090,  0.1383,
          0.7315,  0.5217, -3.8810,  0.0157,  0.1826, -0.1787,  0.2417,  0.0152,
          0.3937, -0.0355, -0.9184,  0.5332,  0.3558,  0.2370, -0.7300,  0.5072,
          0.2407,  0.3249,  1.0000, -1.0000,  0.0000]], device='cuda:0')
action before clip tensor([[ 0.1546,  0.7010,  0.4620,  0.8540, -0.6394, -0.5318]],
       device='cuda:0')

From ONNX


input_data = np.array([0.0510,  0.1173, -0.1063,  0.2193, -0.3571, -0.0850,  1.4090,  0.1383,
          0.7315,  0.5217, -3.8810,  0.0157,  0.1826, -0.1787,  0.2417,  0.0152,
          0.3937, -0.0355, -0.9184,  0.5332,  0.3558,  0.2370, -0.7300,  0.5072,
          0.2407,  0.3249,  1.0000, -1.0000,  0.0000], dtype=np.float32)
From Mu: [1.         0.59151983 1.         1.         0.71608543 0.70238996]
Denys88 commented 1 year ago

Could you check that the PyTorch model returns deterministic outputs too?

DJT777 commented 1 year ago

I've no idea where in the codebase they are actually generating the model's outputs. I'm wondering if you would lend your expertise to help me find where, in Nvidia's use of your rl_games library, they make predictions and produce outputs.

Denys88 commented 1 year ago

I mean you can call my model twice with the same input.

DJT777 commented 1 year ago

Do you mean use your library to generate the model then compare outputs?

Why not identify where in their code they generate the model's outputs, then print/store the outputs and the observation buffer and compare with the ONNX notebook? It might also help to see how they generate the actions from the model's output.

Denys88 commented 1 year ago

You can call agent.get_action(obs, True). And make sure it is the same checkpoint that was exported to ONNX.

DJT777 commented 1 year ago

Okay, going to try that.

DJT777 commented 1 year ago

Here is the code I've used (it's also pushed to the Excolligere repo you have access to).

These should produce equal outputs on the same data, correct? All of the mus, log_stds, and values are different.

Omni Isaac


        agent = runner.create_player()
        agent.restore('/home/dylan/Desktop/repos/OmniIsaacGymEnvs-DofbotReacher/runs/DofbotReacher/nn/DofbotReacher.pth')
        input_data = np.array([0.0510, 0.1173, -0.1063, 0.2193, -0.3571, -0.0850, 1.4090, 0.1383,
                               0.7315, 0.5217, -3.8810, 0.0157, 0.1826, -0.1787, 0.2417, 0.0152,
                               0.3937, -0.0355, -0.9184, 0.5332, 0.3558, 0.2370, -0.7300, 0.5072,
                               0.2407, 0.3249, 1.0000, -1.0000, 0.0000], dtype=np.float32)
        obs = torch.from_numpy(input_data).unsqueeze(0).to('cuda:0')
        #input_dict = {input_name: input_data.reshape(1, -1)}
        print("Actions from agent: " + str(agent.get_action(obs, True)))

Here is the output:


Actions from agent: tensor([ 0.1547,  0.7007,  0.4620,  0.8539, -0.6395, -0.5317], device='cuda:0')
(tensor([[-0.5273,  0.1546, -0.9393,  0.7021, -0.5191, -0.9458]],
       device='cuda:0'), tensor([[-2.3403, -1.9765, -1.5348, -1.6698, -1.4579, -0.8412]],
       device='cuda:0'), tensor([[9.6810]], device='cuda:0'))

Whereas this is the output from the notebook:

Notebook With ONNX

import numpy as np

# Create the individual arrays
jointPos = np.zeros(6)
jointVel = np.zeros(6)
goalPos = np.array([.1826, -.1787, .2417])
goalRot = np.zeros(4)
goalRotRel = np.zeros(4)
prevAct = np.full(6, 0)

# Define a function to clip the actions
def clip_actions(actions):
    return np.clip(actions, -1.0, 1.0)

# Same observation that was passed to the PyTorch player above
input_data = np.array([0.0510, 0.1173, -0.1063, 0.2193, -0.3571, -0.0850, 1.4090, 0.1383,
                       0.7315, 0.5217, -3.8810, 0.0157, 0.1826, -0.1787, 0.2417, 0.0152,
                       0.3937, -0.0355, -0.9184, 0.5332, 0.3558, 0.2370, -0.7300, 0.5072,
                       0.2407, 0.3249, 1.0000, -1.0000, 0.0000], dtype=np.float32)

while True:
    input_dict = {input_name: input_data.reshape(1, -1)}
    # Print the raw (mu, log_std, value) outputs; the sampling code from the earlier cell is disabled here
    output = sess.run(None, input_dict)
    print(output)

Notebook output:

[array([[1.6248189 , 0.59151983, 2.016268  , 1.9443805 , 0.71608543,
        0.70238996]], dtype=float32), array([[-2.7289395, -2.1363962, -2.010333 , -2.3337207, -1.4635202,
        -1.023427 ]], dtype=float32), array([[0.5652896]], dtype=float32)]

You can see I'm using the same checkpoint and model.

Full run function in OmniIsaac

    def run(self):

        # create runner and set the settings
        runner = Runner(RLGPUAlgoObserver())
        runner.load(self.rlg_config_dict)
        runner.reset()

        # dump config dict
        experiment_dir = os.path.join('runs', self.cfg.train.params.config.name)
        os.makedirs(experiment_dir, exist_ok=True)
        with open(os.path.join(experiment_dir, 'config.yaml'), 'w') as f:
            f.write(OmegaConf.to_yaml(self.cfg))

        agent = runner.create_player()
        agent.restore('/home/dylan/Desktop/repos/OmniIsaacGymEnvs-DofbotReacher/runs/DofbotReacher/nn/DofbotReacher.pth')
        input_data = np.array([0.0510, 0.1173, -0.1063, 0.2193, -0.3571, -0.0850, 1.4090, 0.1383,
                               0.7315, 0.5217, -3.8810, 0.0157, 0.1826, -0.1787, 0.2417, 0.0152,
                               0.3937, -0.0355, -0.9184, 0.5332, 0.3558, 0.2370, -0.7300, 0.5072,
                               0.2407, 0.3249, 1.0000, -1.0000, 0.0000], dtype=np.float32)
        obs = torch.from_numpy(input_data).unsqueeze(0).to('cuda:0')
        #input_dict = {input_name: input_data.reshape(1, -1)}
        print("Actions from agent" + str(agent.get_action(obs, True)))

        # Dummy observation used only to trace the model's graph
        inputs = {
            'obs': torch.zeros((1,) + agent.obs_shape).to(agent.device),
        }

        with torch.no_grad():
            # Wrap the restored model and trace it so it can be exported
            adapter = flatten.TracingAdapter(ModelWrapper(agent.model), inputs, allow_non_tensor=True)
            traced = torch.jit.trace(adapter, adapter.flattened_inputs, check_trace=False)
            flattened_outputs = traced(*adapter.flattened_inputs)
            print(flattened_outputs)

        # Export the traced model; the outputs are (mu, log_std, value)
        torch.onnx.export(traced, *adapter.flattened_inputs, "dofbotreacherDefault.onnx", verbose=True, input_names=['obs'],
                          output_names=['mu', 'log_std', 'value'])

        runner.run({
            'train': not self.cfg.test,
            'play': self.cfg.test,
            'checkpoint': self.cfg.checkpoint,
            'sigma': None
        })

Notebook loading the model


import onnxruntime as ort
import numpy as np

# Load the ONNX model
sess = ort.InferenceSession('/home/dylan/Desktop/repos/OmniIsaacGymEnvs-DofbotReacher/dofbotreacherDefault.onnx')

# Get the input and output names of the model
input_name = sess.get_inputs()[0].name
output_name = sess.get_outputs()[0].name

# Get the input names and shapes
input_info = sess.get_inputs()
output_info = sess.get_outputs()

for i in input_info:
    print("Input name:", i.name)
    print("Input shape:", i.shape)

for i in output_info:
    print("Output name:", i.name)
    print("Output shape:", i.shape)
DJT777 commented 1 year ago

Actually, reloading my model has now generated similar results:

Isaac Gym

Actions from agent: tensor([ 0.1547,  0.7007,  0.4620,  0.8539, -0.6395, -0.5317], device='cuda:0')
(tensor([[-0.5273,  0.1546, -0.9393,  0.7021, -0.5191, -0.9458]],
       device='cuda:0'), tensor([[-2.3403, -1.9765, -1.5348, -1.6698, -1.4579, -0.8412]],
       device='cuda:0'), tensor([[9.6810]], device='cuda:0'))

ONNX


[array([[ 0.15400116,  0.7014214 ,  0.4610022 ,  0.85426027, -0.63877946,
        -0.53179383]], dtype=float32), array([[-2.3402638, -1.9764799, -1.5347915, -1.6698207, -1.4579096,
        -0.8412483]], dtype=float32), array([[-0.01199287]], dtype=float32)]
DJT777 commented 1 year ago

More data confirms the models were exported successfully:

(Note that the Isaac Sim results are clipped to [-1, 1], whereas the ONNX model's outputs shown in this post have not been clipped yet.)

From Isaac Sim:

**Input**

input_data = np.array([0.1639,  0.1001,  0.1125, -0.1421, -0.0237, -0.0309, -0.5712, -0.2141,
         -0.2014,  0.3321,  0.8778,  0.0034,  0.1712, -0.1732,  0.2197,  0.1061,
          0.3119, -0.3040, -0.8939,  0.3324,  0.6025, -0.5092, -0.5170,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000], dtype=np.float32)
        obs = torch.from_numpy(input_data).unsqueeze(0).to('cuda:0')
        #input_dict = {input_name: input_data.reshape(1, -1)}
        print("Actions from agent: " + str(agent.get_action(obs, True)))
**Output**

Actions from agent: tensor([-0.1589,  0.3981, -1.0000,  0.7192, -1.0000, -0.0660], device='cuda:0')
(tensor([[-0.5273,  0.1546, -0.9393,  0.7021, -0.5191, -0.9458]],
       device='cuda:0'), tensor([[-2.3403, -1.9765, -1.5348, -1.6698, -1.4579, -0.8412]],
       device='cuda:0'), tensor([[9.6810]], device='cuda:0'))

From Notebook

**Input**
input_data = np.array([0.1639,  0.1001,  0.1125, -0.1421, -0.0237, -0.0309, -0.5712, -0.2141,
         -0.2014,  0.3321,  0.8778,  0.0034,  0.1712, -0.1732,  0.2197,  0.1061,
          0.3119, -0.3040, -0.8939,  0.3324,  0.6025, -0.5092, -0.5170,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000], dtype=np.float32)
input_dict = {input_name: input_data.reshape(1, -1)}

output = sess.run(None, input_dict)
print(output)
**Output**

[array([[-0.15905663,  0.3985302 , -1.0090353 ,  0.7184627 , -1.4696665 ,
        -0.06618351]], dtype=float32), array([[-2.3402638, -1.9764799, -1.5347915, -1.6698207, -1.4579096,
        -0.8412483]], dtype=float32), array([[1.3237184]], dtype=float32)]
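Applying the same clipping to the ONNX mus lines the two outputs up (a small check continuing from the notebook output above):

import numpy as np

mus = np.array([-0.15905663,  0.3985302, -1.0090353,  0.7184627, -1.4696665, -0.06618351], dtype=np.float32)
print(np.clip(mus, -1.0, 1.0))
# -> approximately [-0.159, 0.399, -1.0, 0.718, -1.0, -0.066], matching the clipped simulator actions above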
DJT777 commented 1 year ago

Closing the issue.

DannyChen1994 commented 7 months ago

I followed the example in https://github.com/Denys88/rl_games/blob/master/notebooks/train_and_export_onnx_example_lstm_continuous.ipynb to run training and export the ONNX model. But when I tried to modify the config file to change the network structure, I found that the observation size going into the LSTM layer was always 3 and could not be changed, and the number of actions output by the network could not be changed either. I guess this is related to 'env_config': {'env_name': 'Pendulum-v1', 'seed': 5} and 'env_name': 'envpool'. How do I set up my own environment and modify the number of observations and actions?
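For reference, this is roughly what I think I need to do, but I'm not sure it is the intended approach (the environment class and the name 'my_env' below are placeholders I made up; the point is that the observation and action sizes would come from my env's spaces instead of Pendulum-v1):

import gym
import numpy as np
from rl_games.common import env_configurations

class MyEnv(gym.Env):
    # Placeholder env with the observation/action sizes I actually want
    def __init__(self, **kwargs):
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(12,), dtype=np.float32)
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)

    def reset(self):
        return np.zeros(12, dtype=np.float32)

    def step(self, action):
        obs = np.zeros(12, dtype=np.float32)
        reward, done, info = 0.0, False, {}
        return obs, reward, done, info

# Register the env under a name, then point env_name in the training config at it
env_configurations.register('my_env', {
    'vecenv_type': 'RAY',
    'env_creator': lambda **kwargs: MyEnv(**kwargs),
})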

Looking forward to your reply. Thank you!