Option to export trained models as ONNX

yaelatletl commented 1 year ago

So far this code works but shows some warnings. I'm not sure whether these can be ignored or should be resolved.

Resolves #23

edbeeching commented 1 year ago

We can definitely merge the PR when it is finished. I tried it out in Godot 3.5.1 mono and it was not clear how to load the model file. Is this still WIP?

yaelatletl commented 1 year ago

Yes, right now it just loads the model but does nothing with it; I have yet to implement that part. There is a scene named "Test.tscn". If you run that one, you'll load the model and create an inference session.

yaelatletl commented 1 year ago

Doing some clean up, got the model to run and produce an output

edbeeching commented 1 year ago

This is really cool, I tried out the PR on my machine, but I am on Linux and I don't think I have the right ONNX runtime. I am not so familiar with using C# in Godot. Do you have any advice about installing the dependencies?

yaelatletl commented 1 year ago

@edbeeching I just pushed some instructions in the "Installation" docs

yaelatletl commented 1 year ago

Most recent change, now it detects and uses any GPU it finds for the Inference Session. If it's not compatible, it just uses the CPU.

edbeeching commented 1 year ago

Nice, there seems to be some duplication of files (ONNXInference and ONNXWrapper) inside JumperHard and the addon directory. I deleted the ones not in the addon dir to get the Test scene to build. I now have an exception related to SessionConfigurator.SystemCheck();, I will try and look into this more tonight.

On another note, would it be interesting to discuss at some point what features you think are missing from this lib? Perhaps I / we can implement them next.

yaelatletl commented 1 year ago

I found out that running the code on my laptop with an NVIDIA GPU throws the exception you mention, but I have no problems using AMD. I'm investigating this issue. I think we could gamify the examples a bit. Right now they feel too monolithic and not reusable, so we could establish some basic scenes and scripts and remake the examples using them. I made something like that with the JumperHard example in another repo, and was waiting to get the runtime implemented before suggesting it.

edbeeching commented 1 year ago

It took me a moment to get it working on Linux (Ubuntu 20.04), and in the end I only got inference working on CPU. I had to manually add the lib to the .csproj file with:

  <ItemGroup>
    <None Include="libonnxruntime.so" />
    <ContentWithTargetPath Include="libonnxruntime.so">
      <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
      <TargetPath>libonnxruntime.so</TargetPath>
    </ContentWithTargetPath>
  </ItemGroup>

In SessionConfigurator.cs I had to comment out line 51:

//options.AppendExecutionProvider_CUDA(0);

Otherwise, I have the following error:

E 0:00:00.274   System.EntryPointNotFoundException: OrtSessionOptionsAppendExecutionProvider_CUDA assembly:<unknown assembly> type:<unknown type> member:(null)
  <C++ Error>   Unhandled exception
  <C++ Source>  :0
  <Stack Trace> :0 @ ()
                :0 @ void Microsoft.ML.OnnxRuntime.SessionOptions.AppendExecutionProvider_CUDA(Int32 )()
                SessionConfigurator.cs:51 @ Int32 GodotONNX.SessionConfigurator.SystemCheck()()
                ONNXInference.cs:22 @ void GodotONNX.ONNXInference.Initialize(System.String , Int32 )()

Could this be because I have an RTX 4090 and the OnnxRuntime does not support it yet?

Regarding the model, is this a trained policy? I will try and get it running the agent over the holiday.

yaelatletl commented 1 year ago

Did you try the fix I pushed? I changed the csproj to include a target platform, and that got the program to run. It's not the GPU. It seems that, because I'm trying to let SessionConfigurator.cs pick which execution provider to use, there's a bit of ambiguity that prevents it from loading the correct library. I'm currently working on this.

For the exported models, these are the trained checkpoints available as examples, restored and then converted to ONNX, so you can find a model.onnx inside each checkpoint folder.
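A minimal sketch of how the exported files could be located, assuming the checkpoint folders sit somewhere under a single root directory. The function name `find_exported_models` and the root path are hypothetical and not part of the repo; only the file name `model.onnx` comes from the comment above.

```python
from pathlib import Path
from typing import List


def find_exported_models(checkpoints_root: Path, filename: str = "model.onnx") -> List[Path]:
    """Return every file called `filename` anywhere below `checkpoints_root`.

    Assumption: each checkpoint folder holds at most one exported model,
    so one path per folder is expected in the result.
    """
    return sorted(p for p in checkpoints_root.rglob(filename) if p.is_file())


if __name__ == "__main__":
    # "checkpoints" is a placeholder; point this at wherever the examples live.
    root = Path("checkpoints")
    if root.is_dir():
        for model_path in find_exported_models(root):
            print(model_path)
```

Searching recursively keeps the helper independent of how deeply the checkpoint folders are nested, which is not specified here.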

yaelatletl commented 1 year ago

I've been searching for an answer to this particular issue, from getting all the dependencies (it seems you also need to install CUDA 11.6 with cuDNN 8.5.0 [Windows] or 8.2.4 [Linux]) to trying to manually assign the dll. So far the best I could do was to comment out the CUDA line as well.

DirectML works fine, either on AMD or NVIDIA.
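The fallback idea discussed in this thread (try a GPU execution provider first, and drop to the next option if enabling it throws) can be sketched roughly as below. This is an illustration in Python, not the addon's C# code: `select_provider`, `try_enable` and `fake_enable` are hypothetical names, and the provider labels are plain strings, not real runtime identifiers.

```python
from typing import Callable, Dict, Sequence, Tuple


def select_provider(
    candidates: Sequence[str],
    try_enable: Callable[[str], None],
) -> Tuple[str, Dict[str, Exception]]:
    """Return the first provider that can be enabled, plus the errors seen.

    `try_enable` is a hypothetical callback that raises if a provider cannot
    be used (for example, because the native library lacks the entry point).
    """
    errors: Dict[str, Exception] = {}
    for name in candidates:
        try:
            try_enable(name)
        except Exception as exc:  # remember why this provider was skipped
            errors[name] = exc
            continue
        return name, errors
    raise RuntimeError(f"no usable execution provider among {list(candidates)}")


def fake_enable(name: str) -> None:
    """Stand-in for a real runtime call; pretends CUDA is unavailable."""
    if name == "CUDA":
        raise LookupError("CUDA entry point not found")


if __name__ == "__main__":
    chosen, skipped = select_provider(["CUDA", "DirectML", "CPU"], fake_enable)
    print(chosen, sorted(skipped))
```

Keeping the skipped providers and their errors makes it possible to log why a GPU path was rejected without hiding the failure.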

yaelatletl commented 1 year ago

Since the repo has moved to 4.x support, I think this should be merged into a 3.x branch. From there, I'd backport features from 4.x to 3.x and move parts of this PR into a new PR for 4.x.

edbeeching commented 1 year ago

Hey, sorry for the slow reply. I was off for the holidays. I have created a branch called godot3.5, if you can update the PR to be on that branch we can look to merge this week. Going forward, would you like to have collaborator access to this repo, so you can make your changes directly on branches here?

yaelatletl commented 1 year ago

No worries! I hope you had happy holidays. I made the necessary changes to the PR, and I'd really appreciate the access! I would still create issues or PRs for discussion if needed.

yaelatletl commented 1 year ago

@edbeeching is it OK if I merge this? The next step for the 3.5 branch is backporting the Python code, so some changes here might get overridden, but that's fine. I could then proceed to set up the export process for the current Python code.

edbeeching commented 1 year ago

Hey, sure. Just fix the merge conflict in .gitignore and it is good to go.

yaelatletl commented 1 year ago

Merged! 🎉

visuallization commented 1 year ago

@yaelatletl this looks cool! Will it work in the end like in Unity ML-Agents, so that you can just plug the ONNX file into the agent?

visuallization commented 1 year ago

@edbeeching I just scrolled through this PR and read the discussion. Regarding the question of what could be improved, a few things come to mind that have the potential to improve the whole thing a lot.

Provide detailed and clean documentation on how to use things. I think that's one of the big features that decides whether people will use a tool or not. Make it as easy as possible for new people to get started, especially since RL is already hard enough. Currently the framework still involves a lot of trial and error, and some things only work on certain platforms without this being documented.
Provide a step-by-step video (+ text) tutorial on how to set up and train an agent with Godot RL Agents from scratch. I think this will help newcomers immensely to get started.
Make sure the provided releases actually work. It took me quite some time until I realized that the sensor3d is not working in Release 0.2.2. I guess it makes sense to create a new release and remove the broken ones.
It would be cool if GDScript provided interfaces, so it is really clear which methods and members the agent has to implement. I know GDScript does not support this, but maybe there is an alternative?

Cheers, hope this helps!

edbeeching commented 1 year ago

@visuallization, thanks for highlighting this. I am aware of the poor quality of the docs. I have been focused on features and my time is split across a number of projects. For the Hugging Face Deep RL course, I am making a more detailed tutorial, which will be released in a couple of weeks. I will aim to add that to the docs as well.

I have created an issue #59 to track these improvements for the next version.

yaelatletl commented 1 year ago

Hey, thanks! Yes, that's the plan: just load the exported file and you're good to go.

GeorgeS2019 commented 1 year ago

@yaelatletl

Do you have an exported ONNX model that works within Godot RL Agents?

edbeeching / godot_rl_agents

Option to export trained models as ONNX #25