IntelLabs / coach

Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms
https://intellabs.github.io/coach/
Apache License 2.0

Saved frozen_graph for serving the trained model #449

Closed. Boubside closed this issue 4 years ago.

Boubside commented 4 years ago

Hi,

I used the Coach library to train a model for obstacle avoidance using distributed reinforcement learning in AWS RoboMaker. I now want to use the model on the real robot as part of the obstacle avoidance ROS node. My issue is that I can't figure out how to convert the checkpoint files for use with TensorFlow 2.

To be more specific, I found some scripts that convert checkpoint files into frozen graphs for serving models, but this requires the name of the output node. I'm unable to find this name, and more generally I'm not sure I understand how policies are structured inside Coach or what I should do. I tried using 'main_level/agent/main/online/network_1/ppo_head_0/policy_std' as the output node name, but when I use the resulting graph for prediction I always get the same output, no matter what input I choose.
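
For what it's worth, this is the kind of minimal sketch I've been using just to list candidate node names from the checkpoint's meta graph (plain TF 1.x calls, same pattern as my freeze script below; the checkpoint path is only a placeholder for my setup):

import tensorflow as tf

# Placeholder path: point this at the directory holding the .meta/.index/.data files
checkpoint = tf.train.get_checkpoint_state('./checkpoint')
input_checkpoint = checkpoint.model_checkpoint_path

with tf.Session(graph=tf.Graph()) as sess:
    # Importing the meta graph is enough to inspect names; restoring weights is not needed here
    tf.train.import_meta_graph(input_checkpoint + '.meta', clear_devices=True)
    for op in sess.graph.get_operations():
        # Print every op so the PPO head outputs can be spotted by eye
        print(op.name, op.type)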

I'm using the Clipped PPO algorithm and a custom environment. Do you have any guidance on how I can retrieve the policy as a TensorFlow or Keras model to use for online prediction on the real robot?

Thanks a lot for your help. Feel free to ask for more details if necessary.

Here are some code snippets. First, my preset file:

from rl_coach.agents.clipped_ppo_agent import ClippedPPOAgentParameters
from rl_coach.base_parameters import VisualizationParameters, PresetValidationParameters, DistributedCoachSynchronizationType
from rl_coach.core_types import TrainingSteps, EnvironmentEpisodes, EnvironmentSteps, RunPhase
from rl_coach.environments.gym_environment import GymVectorEnvironment
from rl_coach.graph_managers.basic_rl_graph_manager import BasicRLGraphManager
from rl_coach.graph_managers.graph_manager import ScheduleParameters
from rl_coach.schedules import LinearSchedule

from rl_coach.exploration_policies.categorical import CategoricalParameters
from rl_coach.filters.filter import NoInputFilter, NoOutputFilter, InputFilter
from rl_coach.filters.observation.observation_stacking_filter import ObservationStackingFilter
from rl_coach.filters.observation.observation_rgb_to_y_filter import ObservationRGBToYFilter
from rl_coach.filters.observation.observation_to_uint8_filter import ObservationToUInt8Filter
from rl_coach.memories.memory import MemoryGranularity

from markov.environments.base_env import BaseEnvironmentParameters

####################
# Graph Scheduling #
####################

schedule_params = ScheduleParameters()
schedule_params.improve_steps = TrainingSteps(1000000000)
schedule_params.steps_between_evaluation_periods = EnvironmentEpisodes(40)
schedule_params.evaluation_steps = EnvironmentEpisodes(5)
schedule_params.heatup_steps = EnvironmentEpisodes(10)

#########
# Agent #
#########
agent_params = ClippedPPOAgentParameters()

agent_params.network_wrappers['main'].learning_rate = 0.0003
agent_params.network_wrappers['main'].input_embedders_parameters['observation'].activation_function = 'relu'
agent_params.network_wrappers['main'].batch_size = 64
agent_params.network_wrappers['main'].optimizer_epsilon = 1e-5
agent_params.network_wrappers['main'].adam_optimizer_beta2 = 0.999

agent_params.algorithm.clip_likelihood_ratio_using_epsilon = 0.2
agent_params.algorithm.clipping_decay_schedule = LinearSchedule(1.0, 0, 100000)
agent_params.algorithm.beta_entropy = 0.01  # also try 0.001
agent_params.algorithm.gae_lambda = 0.95
agent_params.algorithm.discount = 0.999
agent_params.algorithm.optimization_epochs = 10
agent_params.algorithm.estimate_state_value_using_gae = True
agent_params.algorithm.num_steps_between_copying_online_weights_to_target = EnvironmentEpisodes(30)
agent_params.algorithm.num_consecutive_playing_steps = EnvironmentEpisodes(30)
agent_params.memory.max_size = (MemoryGranularity.Transitions, 10**5)
agent_params.algorithm.distributed_coach_synchronization_type = DistributedCoachSynchronizationType.SYNC

###############
# Environment #
###############

env_params = BaseEnvironmentParameters()
env_params.level = 'RoboMaker-oarl-v0'

vis_params = VisualizationParameters(render=False) 

########
# Test #
########
preset_validation_params = PresetValidationParameters()
preset_validation_params.test = True
preset_validation_params.min_reward_threshold = 10000
preset_validation_params.max_episodes_to_achieve_reward = 100000

graph_manager = BasicRLGraphManager(agent_params=agent_params, env_params=env_params,
                                    schedule_params=schedule_params, vis_params=vis_params,
                                    preset_validation_params=preset_validation_params)

Here is the conversion code I used:

import tensorflow as tf
import os, argparse

dir = os.path.dirname(os.path.realpath(__file__))

def freeze_graph(model_dir, output_node_names):
    """Extract the sub graph defined by the output nodes and convert
    all its variables into constant
    Args:
        model_dir: the root folder containing the checkpoint state file
        output_node_names: a string, containing all the output node's names,
                            comma separated
    """
    if not tf.gfile.Exists(model_dir):
        raise AssertionError(
            "Export directory doesn't exists. Please specify an export "
            "directory: %s" % model_dir)

    if not output_node_names:
        print("You need to supply the name of a node to --output_node_names.")
        return -1

    # We retrieve our checkpoint fullpath
    checkpoint = tf.train.get_checkpoint_state(model_dir)
    input_checkpoint = checkpoint.model_checkpoint_path

    # We define the full filename of our frozen graph
    absolute_model_dir = "/".join(input_checkpoint.split('/')[:-1])
    output_graph = absolute_model_dir + "/frozen_model.pb"

    # We clear devices to allow TensorFlow to control on which device it will load operations
    clear_devices = True

    # We start a session using a temporary fresh Graph
    with tf.Session(graph=tf.Graph()) as sess:
        # We import the meta graph in the current default Graph
        saver = tf.train.import_meta_graph(input_checkpoint + '.meta', clear_devices=clear_devices)

        # We restore the weights
        saver.restore(sess, input_checkpoint)

        # We use a built-in TF helper to export variables to constants
        output_graph_def = tf.graph_util.convert_variables_to_constants(
            sess, # The session is used to retrieve the weights
            tf.get_default_graph().as_graph_def(), # The graph_def is used to retrieve the nodes
            output_node_names.split(",") # The output node names are used to select the useful nodes
        )

        # Finally we serialize and dump the output graph to the filesystem
        with tf.gfile.GFile(output_graph, "wb") as f:
            f.write(output_graph_def.SerializeToString())
        print("%d ops in the final graph." % len(output_graph_def.node))

    return output_graph_def

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_dir", type=str, default="./", help="Model folder to export")
    parser.add_argument("--output_node_names", type=str, default="main_level/agent/main/online/network_1/ppo_head_0/policy_log_std", help="The name of the output nodes, comma separated.")
    args = parser.parse_args()

    freeze_graph(args.model_dir, args.output_node_names)
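
For reference, I call it roughly like this from Python; the module name freeze.py and the checkpoint directory are just placeholders for my setup, and the node name is the script's default:

# Placeholder module name: assumes the script above is saved as freeze.py
from freeze import freeze_graph

# Example path and node name only
freeze_graph(model_dir='./checkpoint',
             output_node_names='main_level/agent/main/online/network_1/ppo_head_0/policy_log_std')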

The loading and prediction code:

import tensorflow as tf
import argparse
import numpy as np
from tensorflow.python.framework import tensor_util

def load_graph(model_filepath):
    '''
    Load the trained model.
    '''
    print('Loading model...')
    graph = tf.Graph()
    sess = tf.InteractiveSession(graph = graph)

    with tf.gfile.GFile(model_filepath, 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())

    print('Check out the input placeholders:')
    nodes = [n.name + ' => ' + n.op for n in graph_def.node if n.op == 'Placeholder']
    for node in nodes:
        print(node)

    # Define input tensor
    input = tf.placeholder(np.float32, shape = [1,12], name='import/main_level/agent/main/online/network_0/observation/observation')

    tf.import_graph_def(graph_def, {'main_level/agent/main/online/network_0/observation/observation': input})

    print('Model loading complete!')
    # Get layer names
    layers = [op.name for op in graph.get_operations()]
    for layer in layers:
        print(layer)

    # Check out the weights of the nodes
    weight_nodes = [n for n in graph_def.node if n.op == 'Const']
    for n in weight_nodes:
        print("Name of the node - %s" % n.name)
        print("Value - " )
        print(tensor_util.MakeNdarray(n.attr['value'].tensor))

    output_tensor = graph.get_tensor_by_name("import/main_level/agent/main/online/network_1/ppo_head_0/policy_std:0")
    output = sess.run(output_tensor, feed_dict = {input: [[-1,0,1,0,1,1,1,1,1,1,1,1]]})

    print("output : " + str(output))

if __name__ == '__main__':
    # Let's allow the user to pass the filename as an argument
    parser = argparse.ArgumentParser()
    parser.add_argument("--frozen_model_filename", default="./frozen_model.pb", type=str, help="Frozen model file to import")
    args = parser.parse_args()

    # We use our "load_graph" function
    graph = load_graph(args.frozen_model_filename)
Boubside commented 4 years ago

By doing a little more research, I found issues #71 and #374, which pretty much cover what I want to do. I'd like to use my trained model for inference in a production environment. Is there a way to use the model directly for inference without the Coach framework? I'm using a custom Gazebo environment for training and need to deploy on the real robot.

Computing power is limited and I'd like to avoid loading unnecessary parts of the GraphManager (especially the environment). Using TF Serving also seems too heavy for my application. So maybe I can use the model as is for inference, or define a GraphManager with no environment? The sketch below shows roughly what I have in mind.
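
To make it concrete, this is roughly the lightweight wrapper I'd like to end up with inside the ROS node. It is a minimal sketch only, reusing the tensor names from my scripts above (the output node is exactly the part I'm still unsure about):

import numpy as np
import tensorflow as tf

class FrozenPolicy:
    """Minimal wrapper around the frozen graph for on-robot inference."""

    # Tensor names taken from my scripts above; the output one may well be wrong
    INPUT_TENSOR = 'main_level/agent/main/online/network_0/observation/observation:0'
    OUTPUT_TENSOR = 'main_level/agent/main/online/network_1/ppo_head_0/policy:0'

    def __init__(self, frozen_graph_path):
        graph_def = tf.GraphDef()
        with tf.gfile.GFile(frozen_graph_path, 'rb') as f:
            graph_def.ParseFromString(f.read())
        self.graph = tf.Graph()
        with self.graph.as_default():
            # name='' keeps the original node names, without the 'import/' prefix
            tf.import_graph_def(graph_def, name='')
        self.sess = tf.Session(graph=self.graph)
        self.input_tensor = self.graph.get_tensor_by_name(self.INPUT_TENSOR)
        self.output_tensor = self.graph.get_tensor_by_name(self.OUTPUT_TENSOR)

    def predict(self, observation):
        # observation: flat array matching the observation space (12 values in my case)
        obs = np.asarray(observation, dtype=np.float32).reshape(1, -1)
        return self.sess.run(self.output_tensor, feed_dict={self.input_tensor: obs})

# Example call with a dummy observation:
# policy = FrozenPolicy('./frozen_model.pb')
# print(policy.predict([-1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1]))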

ReHoss commented 4 years ago

Hi, I want to switch from Clipped PPO to a new algorithm, so I'm currently working through a bit of the Coach source code. Have you tried 'main_level/agent/main/online/network_1/ppo_head_0/policy'?

Boubside commented 4 years ago

@ReHoss Yes, I tried, but it tells me that it's not a graph... I should probably mention that I've been using Coach version 0.11.1 for training.

I've been able to use the solution from issue #374 successfully in version 1.0.0, so my best guess is that I'll use this approach with a dummy environment. I just need to make my RoboMaker code compatible with 1.0.0 for training.

Boubside commented 4 years ago

I'm closing this issue, as the new one I've posted (#450) is more relevant to how my problem has evolved.