By doing a little more research I found issues #71 and #374, which pretty much describe what I want to do. I'd like to use my trained model for inference in a production environment. Is there a way to use the model directly for inference, without going through the Coach framework? I'm using a custom Gazebo environment for training and need to deploy on the real robot.
Computing power is limited, so I'd like to avoid loading unnecessary parts of the GraphManager (especially the environment). TF Serving also seems too heavy for my application. So maybe I can use the model as is for inference, or define a GraphManager with no environment?
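For reference, here is a minimal sketch of what I mean by "using the model as is": restoring the checkpoint with plain TensorFlow 1.x APIs and evaluating the policy tensor directly, with no GraphManager involved. The checkpoint path, input shape, and tensor names below are placeholders that depend on the preset:

```python
import numpy as np
import tensorflow as tf

CKPT = 'path/to/model.ckpt'  # placeholder checkpoint prefix

tf.reset_default_graph()
saver = tf.train.import_meta_graph(CKPT + '.meta')

with tf.Session() as sess:
    saver.restore(sess, CKPT)
    graph = tf.get_default_graph()
    # Both tensor names are guesses and depend on the preset/network layout.
    obs_t = graph.get_tensor_by_name(
        'main_level/agent/main/online/network_0/observation/observation:0')
    policy_t = graph.get_tensor_by_name(
        'main_level/agent/main/online/network_1/ppo_head_0/policy_mean:0')
    # Dummy observation just to exercise the forward pass.
    action = sess.run(policy_t, feed_dict={obs_t: np.zeros((1, 10), np.float32)})
    print(action)
```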
Hi, I want to switch from Clipped PPO to a new algorithm, so I've actually been digging into the Coach source code a bit. Have you tried `main_level/agent/main/online/network_1/ppo_head_0/policy`?
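If that exact name doesn't exist in your graph, you could enumerate the operations of the restored meta graph and filter for the head, along these lines (the checkpoint path is a placeholder):

```python
import tensorflow as tf

# Importing the meta graph populates the default graph without needing a session.
tf.train.import_meta_graph('path/to/model.ckpt.meta')  # placeholder path

for op in tf.get_default_graph().get_operations():
    if 'ppo_head' in op.name:
        print(op.name)
```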
@ReHoss Yes, I tried, but it tells me that it's not a graph... I should probably mention that I've been using Coach version 0.11.1 for training.
I've been able to use the solution from issue #374 successfully with version 1.0.0. So my best guess is that I'll use this approach with a dummy environment; I just need to make my RoboMaker code compatible with 1.0.0 for training.
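In case it helps someone else: the dummy environment only has to expose the same observation and action spaces as the training environment, so the GraphManager can rebuild the networks. A minimal sketch (the shapes below are stand-ins for my real ones):

```python
import gym
import numpy as np
from gym import spaces

class DummyEnv(gym.Env):
    """Stub environment: only the space definitions matter for restoring the model."""

    def __init__(self):
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(10,), dtype=np.float32)
        self.action_space = spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)

    def reset(self):
        return np.zeros(self.observation_space.shape, dtype=np.float32)

    def step(self, action):
        obs = np.zeros(self.observation_space.shape, dtype=np.float32)
        return obs, 0.0, True, {}  # obs, reward, done, info
```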
I'm closing this issue, as the new one I've posted (#450) is more relevant to how my problem has evolved.
Hi,
I used the Coach library to train a model for obstacle avoidance using distributed reinforcement learning in AWS RoboMaker. I now want to use the model on the real robot as part of the obstacle-avoidance ROS node. My issue is that I can't figure out how to convert the checkpoint files for use with TensorFlow 2.
To be more specific, I found some scripts that convert checkpoint files to frozen graphs for serving, but they require the name of the output node. I can't find this name, and more generally I'm not sure I understand how policies are structured inside Coach and what I should do. I tried 'main_level/agent/main/online/network_1/ppo_head_0/policy_std' as the output node name, but when I use the resulting graph for prediction, I always get the same output no matter what input I choose.
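While digging, I also printed the variables stored in the checkpoint to see which names exist at all (the path below is a placeholder). One hypothesis I have: if the std head is a state-independent variable, as it is in many continuous-control PPO implementations, that alone would explain why 'policy_std' gives the same output for every input, and the state-dependent output would be the policy mean instead.

```python
from tensorflow.python.tools.inspect_checkpoint import print_tensors_in_checkpoint_file

# With an empty tensor_name, this prints the names and shapes of all
# variables stored in the checkpoint.
print_tensors_in_checkpoint_file('path/to/model.ckpt',  # placeholder prefix
                                 tensor_name='', all_tensors=False)
```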
I'm using the Clipped PPO algorithm and a custom environment. Do you have any guidance on how I can retrieve the policy as a TensorFlow or Keras model to use for online prediction on the real robot?
Thanks a lot for your help. Feel free to ask for more details if necessary.
Here are some code snippets. First, my preset file:
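The real file is specific to my setup, so here is a trimmed sketch of its structure; the environment level string and schedule values are stand-ins:

```python
from rl_coach.agents.clipped_ppo_agent import ClippedPPOAgentParameters
from rl_coach.base_parameters import VisualizationParameters
from rl_coach.core_types import EnvironmentEpisodes, EnvironmentSteps, TrainingSteps
from rl_coach.environments.gym_environment import GymVectorEnvironment
from rl_coach.graph_managers.basic_rl_graph_manager import BasicRLGraphManager
from rl_coach.graph_managers.graph_manager import ScheduleParameters

# Agent: Clipped PPO with default parameters.
agent_params = ClippedPPOAgentParameters()

# Environment: a custom Gym-registered Gazebo wrapper (stand-in name).
env_params = GymVectorEnvironment(level='my_robot_envs:ObstacleAvoidanceEnv')

# Training schedule (values are stand-ins).
schedule_params = ScheduleParameters()
schedule_params.improve_steps = TrainingSteps(1000000)
schedule_params.steps_between_evaluation_periods = EnvironmentEpisodes(40)
schedule_params.evaluation_steps = EnvironmentEpisodes(5)
schedule_params.heatup_steps = EnvironmentSteps(0)

graph_manager = BasicRLGraphManager(agent_params=agent_params,
                                    env_params=env_params,
                                    schedule_params=schedule_params,
                                    vis_params=VisualizationParameters())
```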
Here is the conversion code I used:
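In outline (the checkpoint path is a stand-in, and the output node name is the part I'm unsure about):

```python
import tensorflow as tf
from tensorflow.python.framework import graph_util

CKPT = 'path/to/model.ckpt'  # stand-in for my real checkpoint prefix
# The output node name I tried; possibly the wrong one.
OUTPUT_NODES = ['main_level/agent/main/online/network_1/ppo_head_0/policy_std']

saver = tf.train.import_meta_graph(CKPT + '.meta', clear_devices=True)
with tf.Session() as sess:
    saver.restore(sess, CKPT)
    # Bake the variables into constants so the graph is self-contained.
    frozen = graph_util.convert_variables_to_constants(
        sess, sess.graph_def, OUTPUT_NODES)
    with tf.gfile.GFile('frozen_model.pb', 'wb') as f:
        f.write(frozen.SerializeToString())
```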
And the loading and prediction code:
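Also in outline; the tensor names are my best guesses, and under TensorFlow 2 these calls would be the `tf.compat.v1` equivalents:

```python
import numpy as np
import tensorflow as tf

# Load the frozen graph definition from disk.
with tf.gfile.GFile('frozen_model.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name='')

# Tensor names are guesses and may be wrong for my preset.
obs_t = graph.get_tensor_by_name(
    'main_level/agent/main/online/network_0/observation/observation:0')
out_t = graph.get_tensor_by_name(
    'main_level/agent/main/online/network_1/ppo_head_0/policy_std:0')

with tf.Session(graph=graph) as sess:
    pred = sess.run(out_t, {obs_t: np.zeros((1, 10), np.float32)})
    print(pred)  # this is where I always get the same output, whatever the input
```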