Closed: @rishanshah closed this issue 3 years ago.
@rishanshah Hey, can you share your train_coach.py? I would suggest importing tensorflow as tf at the top of the file rather than importing it inside a specific class.
@haohanchen-yagao here is my train_coach.py:
```python
from sagemaker_rl.coach_launcher import SageMakerCoachPresetLauncher


class MyLauncher(SageMakerCoachPresetLauncher):

    def default_preset_name(self):
        """This points to a .py file that configures everything about the RL job.
        It can be overridden at runtime by specifying the RLCOACH_PRESET hyperparameter.
        """
        return 'presetsDDPG'

    def map_hyperparameter(self, name, value):
        """Here we configure some shortcut names for hyperparameters that we expect to use frequently.
        Essentially anything in the preset file can be overridden through a hyperparameter with a name
        like "rl.agent_params.algorithm.etc".
        """
        # maps from alias (key) to fully qualified coach parameter (value)
        mapping = {
            "discount": "rl.agent_params.algorithm.discount",
            "evaluation_episodes": "rl.evaluation_steps:EnvironmentEpisodes",
            "improve_steps": "rl.improve_steps:TrainingSteps",
        }
        if name in mapping:
            self.apply_hyperparameter(mapping[name], value)
        else:
            super().map_hyperparameter(name, value)


if __name__ == '__main__':
    MyLauncher.train_main()
```
@rishanshah Thanks. Do you mind sharing your source code, your model ckpt, and the MetaGraphDef with us so that we can know what the input and output tensors are?
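For anyone needing to pin down those tensors themselves: the names can usually be read straight off the checkpoint's MetaGraphDef. A minimal sketch, assuming a Coach/TF 1.x checkpoint; the path below is a placeholder:

```python
import tensorflow as tf

# Placeholder path -- point it at the .meta file Coach wrote next to the checkpoint.
META_PATH = "checkpoint/0_Step-0.ckpt.meta"

tf.compat.v1.reset_default_graph()
tf.compat.v1.train.import_meta_graph(META_PATH)
graph = tf.compat.v1.get_default_graph()

# Placeholder ops are the candidate input tensors; the last few ops are usually the outputs.
for op in graph.get_operations():
    if op.type == "Placeholder":
        print("input candidate:", op.name, op.outputs[0].shape)
print("last ops:", [op.name for op in graph.get_operations()][-10:])
```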
@haohanchen-yagao, I am not sure whether I will be able to share the whole code with you, but here are some things that will help.
Here are my DDPG presets and graph definitions:
```python
from rl_coach.agents.ddpg_agent import DDPGAgentParameters
from rl_coach.base_parameters import VisualizationParameters
from rl_coach.core_types import TrainingSteps, EnvironmentEpisodes, EnvironmentSteps
from rl_coach.graph_managers.basic_rl_graph_manager import BasicRLGraphManager
from rl_coach.graph_managers.graph_manager import ScheduleParameters
from rl_coach.environments.gym_environment import GymVectorEnvironment
from rl_coach.architectures.layers import Dense
from rl_coach.base_parameters import EmbedderScheme
from rl_coach.filters.filter import InputFilter
from rl_coach.filters.reward.reward_rescale_filter import RewardRescaleFilter

####################
# Graph Scheduling #
####################
schedule_params = ScheduleParameters()
schedule_params.improve_steps = TrainingSteps(10)
schedule_params.steps_between_evaluation_periods = EnvironmentEpisodes(20)
schedule_params.evaluation_steps = EnvironmentEpisodes(1)
schedule_params.heatup_steps = EnvironmentSteps(1000)

#########
# Agent #
#########
agent_params = DDPGAgentParameters()
agent_params.network_wrappers['actor'].input_embedders_parameters['observation'].scheme = [Dense(400)]
agent_params.network_wrappers['actor'].middleware_parameters.scheme = [Dense(300)]
agent_params.network_wrappers['critic'].input_embedders_parameters['observation'].scheme = [Dense(400)]
agent_params.network_wrappers['critic'].middleware_parameters.scheme = [Dense(300)]
agent_params.network_wrappers['critic'].input_embedders_parameters['action'].scheme = EmbedderScheme.Empty

# agent_params = DDPGAgentParameters()
# agent_params.algorithm.num_consecutive_playing_steps = EnvironmentSteps(20)
# agent_params = DDPGAgentParameters()
# agent_params.network_wrappers['actor'].input_embedders_parameters['measurements'] = \
#     agent_params.network_wrappers['actor'].input_embedders_parameters.pop('observation')
# agent_params.network_wrappers['critic'].input_embedders_parameters['measurements'] = \
#     agent_params.network_wrappers['critic'].input_embedders_parameters.pop('observation')
# agent_params.network_wrappers['actor'].input_embedders_parameters['measurements'].scheme = [Dense(300)]
# agent_params.network_wrappers['actor'].middleware_parameters.scheme = [Dense(200)]
# agent_params.network_wrappers['critic'].input_embedders_parameters['measurements'].scheme = [Dense(400)]
# agent_params.network_wrappers['critic'].middleware_parameters.scheme = [Dense(300)]
# agent_params.network_wrappers['critic'].input_embedders_parameters['action'].scheme = EmbedderScheme.Empty
# agent_params.input_filter = InputFilter()
# agent_params.input_filter.add_reward_filter("rescale", RewardRescaleFilter(1/100.))

agent_params.algorithm.action_penalty = 10.5
# agent_params.algorithm.use_non_zero_discount_for_terminal_states = True
# agent_params.algorithm.clip_critic_targets = (-10., 10.)
agent_params.network_wrappers['actor'].learning_rate = 0.01
agent_params.network_wrappers['critic'].learning_rate = 0.01
agent_params.algorithm.discount = 0.99

###############
# Environment #
###############
# env_params = GymVectorEnvironment(level='envV2Continous:ArrivalSim')
# env_params.additional_simulator_parameters = {'price': 30.0}
env_params = GymVectorEnvironment(level='continousEnv:Environment')

graph_manager = BasicRLGraphManager(agent_params=agent_params, env_params=env_params,
                                    schedule_params=schedule_params, vis_params=VisualizationParameters())
```
The invocation:
```python
estimator = RLEstimator(entry_point="train_Coach.py",
                        source_dir='.',
                        dependencies=["sagemaker_rl"],
                        toolkit=RLToolkit.COACH,
                        toolkit_version='0.11.0',
                        framework=RLFramework.TENSORFLOW,
                        role=role,
                        # train_instance_type="ml.m4.4xlarge",
                        train_instance_type='local',
                        train_instance_count=1,
                        output_path=s3_output_path,
                        base_job_name="ddpgtest",
                        hyperparameters={
                            "RLCOACH_PRESET": "presetsDDPG",
                            # "discount": 0.9,
                            # "evaluation_episodes": 8,
                            # "improve_steps": 1000,
                            "save_model": 1
                        })

estimator.fit(wait=True)
```
With regards to the ckpt: I was using the SageMaker default bucket to test whether I could get the model to behave the way I wanted, and I am not sure how to get the ckpt file from there.
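If it helps, the artifact can be pulled down with the SDK once training has finished. A rough sketch, assuming `estimator` is the fitted RLEstimator shown above:

```python
import sagemaker

sess = sagemaker.Session()
print(sess.default_bucket())     # the bucket SageMaker uses when no output_path is given
print(estimator.model_data)      # s3:// URI of this job's model.tar.gz

# Download the archive locally and inspect it for checkpoint files.
bucket, _, key = estimator.model_data.replace("s3://", "").partition("/")
sess.boto_session.client("s3").download_file(bucket, key, "model.tar.gz")
```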
This is the environment on which I am training the model, continousEnv.py:
```python
from scipy.stats import poisson, bernoulli
import math
import numpy as np
import gym
from gym.spaces import Discrete, Box
import sys

DAYS_OVERALL = 30
INITIAL_PRICE = 1000
ALLOTMENT = 20


def pr_of_booking(current_price):
    asymptote = 1
    slope = 0.01
    shift = INITIAL_PRICE - 100
    pr_booking = asymptote / (1 + math.exp(slope * (current_price - shift)))
    if 0 <= pr_booking <= 1:
        return pr_booking
    else:
        raise ValueError


def random_booking(current_price):
    target_avg_nr_bookings_per_day = ALLOTMENT / DAYS_OVERALL
    target_pr_at_pseudo_market_price = pr_of_booking(INITIAL_PRICE + 100)
    prospective_bookings = poisson.rvs(mu=target_avg_nr_bookings_per_day / target_pr_at_pseudo_market_price,
                                       size=1)[0]
    _sum = 0
    for i in range(prospective_bookings):
        prob = pr_of_booking(current_price=current_price)
        if bernoulli.rvs(prob, size=1)[0] == 1:
            _sum += 1
    return _sum


class Environment(gym.Env):
    """
    Observation:
        Type: Box(4)
        Num    Observation
        0      Deviation from the target curve
        1      Price
        2      Ratio bookings/capacity
        3      Days before departure

    Actions:
        Type: Box(1)
        Continuous price adjustment in [-100, 100]
        (the earlier Discrete(5) version used -100, -50, 0, +50, +100)
    """

    def __init__(self):
        self.action_space = Box(-np.array([100]), np.array([100]), dtype=np.float32)
        high = np.array([1, np.inf, 1, DAYS_OVERALL])
        low = np.array([-1, 0, 0, 0])
        self.observation_space = Box(low, high, dtype=np.float32)
        self.state = None
        self.bookings = [np.NaN for _ in range(DAYS_OVERALL + 1)]
        self.price_adjustments = [np.NaN for _ in range(DAYS_OVERALL + 1)]
        self.price = [np.NaN for _ in range(DAYS_OVERALL + 1)]

    def step(self, action):
        (deviation,
         price,
         ratio,
         dbd) = self.state
        today = int(DAYS_OVERALL - dbd)
        # price_adjustment = {0: -100,
        #                     1: -50,
        #                     2: 0,
        #                     3: 50,
        #                     4: 100}[action]
        price_adjustment = action
        self.price_adjustments[today] = action
        if dbd == DAYS_OVERALL:
            new_price = INITIAL_PRICE + price_adjustment
        else:
            new_price = self.price[today - 1] + price_adjustment
        self.price[today] = new_price
        self.bookings[today] = random_booking(new_price)
        booked_so_far = sum(self.bookings[:(today + 1)])
        new_ratio = booked_so_far / ALLOTMENT
        target_ratio = today / DAYS_OVERALL
        new_deviation = new_ratio - target_ratio
        new_dbd = dbd - 1
        self.state = np.array([new_deviation,
                               new_price,
                               new_ratio,
                               new_dbd])
        reward = -10 * np.log(abs(new_deviation) + sys.float_info.epsilon)
        # reward = -abs(new_deviation)
        # if ALLOTMENT <= booked_so_far:
        #     reward -= (dbd * new_deviation) / 2
        done = (dbd == 0) or (ALLOTMENT <= booked_so_far)
        info = dict()
        print(action, self.state, reward, done, info)
        return self.state, reward, done, info

    def reset(self):
        deviation = 0
        price = INITIAL_PRICE
        ratio = 0
        dbd = DAYS_OVERALL
        self.state = np.array([deviation,
                               price,
                               ratio,
                               dbd])
        return self.state

    def render(self, mode=None):
        pass
```
So the input tensor should be 4 x batch and the output 1 x batch, I think.
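For what it's worth, those shapes can be confirmed directly against the environment. A quick local check, assuming the continousEnv.py above is importable:

```python
from continousEnv import Environment

env = Environment()
print(env.observation_space.shape)   # (4,) -> observation input is batch x 4
print(env.action_space.shape)        # (1,) -> action output is batch x 1

obs = env.reset()
# Pass a scalar adjustment so the state array stays a homogeneous float vector.
obs, reward, done, info = env.step(env.action_space.sample()[0])
```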
model.tar.gz output (1).tar.gz
@haohanchen-yagao I have figured out where all the data gets saved, and I don't think the model is checkpointing either. I have included what I found in the files.
Hi @rishanshah, did you find a solution? I also want to try different RL methods.
Dear SageMaker Community,

In our attempts to provide this repository with a better level of support going forward, we're closing issues that were opened prior to the v2 release of the SDK. This is because we believe that over time many of the issues posted were solved with the latest release or other recent changes to the repo. This will help us reallocate resources towards issues that are more likely to still be relevant today.

Some of the issues experienced now can be resolved by referencing the v2 guide: https://sagemaker.readthedocs.io/en/stable/v2.html In this guide, you can find simple solutions to common notebook errors, like the renaming of parameters and classes.

If you believe your issue is still ongoing and you have updated error messaging or other info, please re-open it and we will investigate.

Best Regards,
AWS SageMaker Team
Could you please share what your solution is to this issue? I'm having the same exact issue.
In case anyone else is still stuck on this, I solved it by switching from Coach to Ray, and subclassing and overriding the SageMaker ray_launcher.py with this:
```python
import os
import json
import time

from ray.tune.registry import register_env

import hivenv
import hivparameters as params
from sagemaker_rl.ray_launcher import SageMakerRayLauncher
from sagemaker_rl.configuration_list import ConfigurationList
import sagemaker_rl.tf_serving_utils as utils

ENV_NAME = "HIVEnv"
INTERMEDIATE_DIR = "/opt/ml/output/intermediate"
FINAL_MODEL_DIR = os.getenv("SM_MODEL_DIR", "/opt/ml/model")
METADATA_DIR = "/opt/ml/output/data"
TERMINATION_SIGNAL = "JOB_TERMINATED"
CHECKPOINT_DIR = "/opt/ml/input/data/checkpoint"
MODEL_OUTPUT_DIR = "/opt/ml/model"

env_config = {}


class SageMakerHIVLauncher(SageMakerRayLauncher):

    def register_env_creator(self):
        """Register the (custom) env to make it available to the ray nodes"""
        # Choose a name and register your custom environment
        register_env("HIVEnv-v0", lambda env_config: hivenv.HIVEnv(env_config))

    def get_experiment_config(self):
        return params.experiment_config

    def copy_checkpoints_to_model_output(self):
        checkpoints = []
        count = 0
        while not checkpoints:
            count += 1
            for root, directories, filenames in os.walk(INTERMEDIATE_DIR):
                for dir in directories:
                    if dir.startswith("checkpoint"):
                        checkpoints.append(os.path.join(root, dir))
            time.sleep(5)
            if count >= 6:
                raise RuntimeError("Failed to find checkpoint files")
        checkpoints.sort(key=utils.natural_keys)
        print(checkpoints[-1])

        # ##### Save has to be done here, while we're grabbing checkpoints
        from ray.rllib.algorithms.algorithm import Algorithm
        my_new_dqn = Algorithm.from_checkpoint(checkpoints[-1])
        utils.export_tf_serving(my_new_dqn, MODEL_OUTPUT_DIR)

    def create_tf_serving_model(self, algorithm=None, env_string=None):
        self.register_env_creator()
        # No-op otherwise -- in the new ray/tensorflow code, it has
        # to be done earlier in the process

    def customize_experiment_config(self, config):
        """Applies command-line hyperparameters to the config."""
        hyperparams_dict = json.loads(os.environ.get("SM_HPS", "{}"))
        hyperparams_dict["rl.training.local_dir"] = INTERMEDIATE_DIR
        self.hyperparameters = ConfigurationList()
        for name, value in hyperparams_dict.items():
            if name.startswith("rl."):
                self.hyperparameters.store(name, value)
        self.hyperparameters.apply_subset(config, "rl.")
        print(self.hyperparameters)
        return config


if __name__ == "__main__":
    SageMakerHIVLauncher().train_main()
```
Hope this helps!
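In case it is useful to anyone adapting this: `hivparameters.py` only needs to expose an `experiment_config` dict in the shape the Ray launcher hands to `ray.tune`. A minimal sketch; the algorithm, stopping criteria, and config values below are placeholders, not the settings used above:

```python
# hivparameters.py -- placeholder values for illustration only
experiment_config = {
    "training": {
        "env": "HIVEnv-v0",                  # must match the name passed to register_env()
        "run": "PPO",                        # any RLlib algorithm name
        "stop": {"training_iteration": 20},
        "config": {
            "framework": "tf",
            "num_workers": 1,
            "gamma": 0.99,
        },
        "checkpoint_freq": 1,                # so copy_checkpoints_to_model_output() finds something
    }
}
```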
I have been trying to train a DDPG model on a custom environment, but I cannot get the model to save because of some trouble with the coach_launcher.py. I get the following Traceback:

EDIT 11/10/2019

I imported tensorflow into a subclassed version of the SageMakerCoachPresetLauncher class in my train_coach.py training script, however a myriad of other problems arose. It turns out that the coach_launcher.py function _save_tf_model has been designed specifically to work with the examples (PPO and DQN). There are a couple of comments in the code saying that customers are expected to subclass this class and redefine functions where appropriate. The trouble is I have no idea how to define a function to save a DDPG model. I feel that I have narrowed the problem down to the assignment maps that are specific to the PPO model. There is no guidance on how to change this for DDPG. The function in question is defined below:

How does one save and deploy a DDPG model?
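One possible direction (not the launcher's own _save_tf_model, just a sketch of the underlying idea): restore the Coach checkpoint yourself and export a TF SavedModel around the actor's input and output tensors. The checkpoint path and tensor names below are placeholders you would need to look up in the graph first:

```python
import tensorflow as tf

CKPT_PREFIX = "checkpoint/0_Step-0.ckpt"   # placeholder: the checkpoint Coach wrote during training
EXPORT_DIR = "/opt/ml/model/1"             # where a TF Serving-style SavedModel would go

tf.compat.v1.reset_default_graph()
with tf.compat.v1.Session() as sess:
    # Rebuild the graph from the .meta file and restore the trained weights.
    saver = tf.compat.v1.train.import_meta_graph(CKPT_PREFIX + ".meta")
    saver.restore(sess, CKPT_PREFIX)
    graph = tf.compat.v1.get_default_graph()

    # Placeholder tensor names -- list graph.get_operations() to find the real
    # observation input and actor output for the DDPG networks.
    obs_in = graph.get_tensor_by_name("OBSERVATION_PLACEHOLDER:0")
    action_out = graph.get_tensor_by_name("ACTOR_OUTPUT:0")

    # Write a SavedModel with an explicit observation->action signature.
    tf.compat.v1.saved_model.simple_save(
        sess, EXPORT_DIR,
        inputs={"observation": obs_in},
        outputs={"action": action_out},
    )
```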
ALSO
I tried changing the framework to MXNet, as there is also a _save_onnx_model function which I felt would work better than TensorFlow. However, even without saving, MXNet doesn't work when I try to train a DDPG algorithm. I get the following error:
What is going on?