Adds onnx export to cleanrl example

Adds support for exporting the trained model to ONNX via the command-line argument --onnx_export_path.

The export works with a single observation space. Here is how the exported ONNX file looks in Netron, compared to a file exported using the SB3 example script:

I assume the difference in flattening is due to the CleanRL export using a single observation space.

Quick test using: python clean_rl_example.py --total-timesteps=50_000 --onnx_export_path=model_clean_rl.onnx
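For reviewers, here is a minimal sketch of how the flag in the command above could be parsed with argparse. The `parse_args` helper and the default values are assumptions for illustration and may not match what clean_rl_example.py actually does; only the two flag names come from the command.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical helper: only the two flags from the test command are shown.
    parser = argparse.ArgumentParser()
    # The default here is an assumption, not the value used in the example script.
    parser.add_argument("--total-timesteps", type=int, default=1_000_000)
    # None means "do not export"; a path means "export after training".
    parser.add_argument(
        "--onnx_export_path",
        type=str,
        default=None,
        help="If set, export the trained model to this ONNX file",
    )
    return parser.parse_args(argv)


# Same arguments as in the quick test command.
args = parse_args(
    ["--total-timesteps=50_000", "--onnx_export_path=model_clean_rl.onnx"]
)

if args.onnx_export_path is not None:
    print(f"Would export to {args.onnx_export_path} after {args.total_timesteps} steps")
```

Leaving the default as None keeps the export opt-in, so existing invocations of the script behave as before.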

https://github.com/edbeeching/godot_rl_agents/assets/61947090/dd306de3-5b9d-4745-b2bd-fa19a4086009

The export code lives in the same file as the example, but it uses a separate class, similar to the SB3 export, so the original agent class is not modified. The new class exists only to implement the forward method, which in this case returns the mean action together with state_ins, to stay compatible with our ONNX inference code.
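A rough sketch of the wrapper idea described above, not the exact code in this PR. The class name `OnnxExportPolicy`, the helper `export_policy`, the `actor_mean` attribute, and the input/output names are assumptions chosen for illustration; check them against the diff and the inference code.

```python
import torch
import torch.nn as nn


class OnnxExportPolicy(nn.Module):
    """Hypothetical wrapper exposing only what inference needs."""

    def __init__(self, actor_mean: nn.Module):
        super().__init__()
        # Assumes the agent keeps its mean-action network in `actor_mean`.
        self.actor_mean = actor_mean

    def forward(self, obs: torch.Tensor, state_ins: torch.Tensor):
        # Deterministic action: the mean of the policy distribution.
        # state_ins is passed through unchanged, since there is no
        # recurrent state; it is kept only for interface compatibility.
        return self.actor_mean(obs), state_ins


def export_policy(actor_mean: nn.Module, obs_size: int, path: str) -> None:
    """Export the wrapped policy; tensor names here are assumptions."""
    policy = OnnxExportPolicy(actor_mean).eval()
    dummy_obs = torch.zeros(1, obs_size)
    dummy_state = torch.zeros(1)
    torch.onnx.export(
        policy,
        (dummy_obs, dummy_state),
        path,
        input_names=["obs", "state_ins"],
        output_names=["output", "state_outs"],
        # Allow a variable batch dimension at inference time.
        dynamic_axes={"obs": {0: "batch_size"}, "output": {0: "batch_size"}},
    )
```

Wrapping only the mean network means sampling, log-probabilities, and the critic stay out of the exported graph, and the training-time agent class is left untouched.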

Note: The implementation seems to work in a quick test, but I'm not very familiar with Torch/NumPy/neural nets, so please review it in case I overlooked something.
