IntelLabs / coach

Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms
https://intellabs.github.io/coach/
Apache License 2.0

Custom Gym Env with CSV data as observations #367

Closed Eriz11 closed 5 years ago

Eriz11 commented 5 years ago

Hi all,

Firstly, the library looks awesome in terms of its structural design, and the learning curve is less steep than I expected. Congrats on all the work done.

NOTE: As I will be using a custom env, I tried the installation on an Ubuntu 18.04 machine with Python 3.7, and the CartPole experiments I ran worked perfectly. Just to add my two cents here.

I have a custom Gym-like environment which loads its data from a .csv file; each step corresponds to one row of that .csv. The custom Gym env is installed at the following path:

gym-tradeZM/gym_tradeZM/envs/tradeEnvScv0.py:ZMTradeEnvv0

I'm using the GymEnvironment class to get ahead with this, but the issue is the following:

"AttributeError: 'GymEnvironment' object has no attribute 'path'" when running the GraphManager. I understand the error because it is true that I don't find any path attribute int he GymEnvironment object.

The idea behind using the GymEnvironment class is that I can access my custom environment directly, so that I can save some summary data after training. If I use the GymVectorEnvironment or GymEnvironmentParameters classes, I cannot then access my custom env object (can I in some other way?).

Is there something I'm missing related to loading a custom gym environment that reads its data internally from a .csv? The observation space is just (27,) and the action space is Discrete(3). Any help is much appreciated to further debug my error.
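For context, a minimal sketch of the kind of environment described above: each step returns one .csv row as a (27,)-shaped observation and the action space is Discrete(3). The class name, file layout, reward logic and the DATASET_SAMPLES attribute (used later by the preset's improve_steps) are illustrative assumptions, not the author's actual code.

import gym
import numpy as np
import pandas as pd
from gym import spaces

class ZMTradeEnvSketch(gym.Env):
    ### Hypothetical CSV-backed env: one dataset row per step.
    def __init__(self, csv_path='trade_data.csv'):
        super().__init__()
        self.data = pd.read_csv(csv_path).values.astype(np.float32)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(27,), dtype=np.float32)
        self.action_space = spaces.Discrete(3)   # e.g. buy / hold / sell
        self.DATASET_SAMPLES = len(self.data)    # consumed by improve_steps in the preset
        self.cursor = 0

    def reset(self):
        self.cursor = 0
        return self.data[self.cursor]

    def step(self, action):
        self.cursor += 1
        done = self.cursor >= len(self.data) - 1
        reward = 0.0   # placeholder; the real env would compute trading P&L here
        return self.data[self.cursor], reward, done, {}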

Full preset code:

### Imports used by this preset:
import os
from rl_coach.agents.ddqn_agent import DDQNAgentParameters
from rl_coach.base_parameters import VisualizationParameters, TaskParameters
from rl_coach.core_types import EnvironmentSteps, EnvironmentEpisodes, TrainingSteps
from rl_coach.environments.gym_environment import GymEnvironment
from rl_coach.filters.filter import NoInputFilter, NoOutputFilter
from rl_coach.graph_managers.basic_rl_graph_manager import BasicRLGraphManager
from rl_coach.graph_managers.graph_manager import ScheduleParameters
from rl_coach.memories.memory import MemoryGranularity
from rl_coach.schedules import LinearSchedule

### Create the path for the env.

ENV_NAME = 'tradeRL-v0'
#envPath = os.path.expandvars('${HOME}/Desktop/RLproject/gym-tradeZM/gym_tradeZM/envs/tradeEnvScv0.py:ZMTradeEnvv0')
envPath = 'gym_tradeZM.envs.tradeEnvScv0:ZMTradeEnvv0'

### Define the visualization parameters:
visParameters = VisualizationParameters(print_networks_summary=True, render=True, native_rendering=True, tensorboard=True)

### Define the environment parameters:
envParameters = GymEnvironment(level=envPath, frame_skip=0, visualization_parameters=visParameters)
### The environment is held inside the GymEnvironment class as self.env.
realEnv = envParameters.env

### Define the task parameters:
tskParameters = TaskParameters(experiment_path=LOG_PATH, checkpoint_save_secs=10, checkpoint_save_dir=LOG_PATH)

### Define the agents parameters:
memoryMaxSizeVar = 50000
agentParameters = DDQNAgentParameters()
### Add the filters explicitly so that Coach doesn't complain that they are None.
agentParameters.input_filter = NoInputFilter()
agentParameters.output_filter = NoOutputFilter()
#agentParameters.batch_size = 1
agentParameters.memory.max_size = (MemoryGranularity.Transitions, memoryMaxSizeVar)
agentParameters.algorithm.num_steps_between_copying_online_weights_to_target = EnvironmentSteps(100)
agentParameters.algorithm.discount = 0.99
agentParameters.algorithm.num_consecutive_playing_steps = EnvironmentSteps(1)
agentParameters.exploration.epsilon_schedule = LinearSchedule(1.0, 0.01, 10000)

### Define the schedule parameters:
scheduleParameters = ScheduleParameters()
### Simple one without evaluation:
#scheduleParameters = SimpleScheduleWithoutEvaluation(improve_steps=TrainingSteps(realEnv.DATASET_SAMPLES))

scheduleParameters.heatup_steps = EnvironmentSteps(200)
### Most examples seem to use TrainingSteps rather than EnvironmentSteps here.
scheduleParameters.improve_steps = TrainingSteps(realEnv.DATASET_SAMPLES)
#scheduleParameters.improve_steps = EnvironmentSteps(realEnv.DATASET_SAMPLES)
scheduleParameters.evaluation_steps = EnvironmentEpisodes(0)

### Create the graph, that will gather the agent and the environment.
graphManager = BasicRLGraphManager(agent_params=agentParameters,
                                   vis_params=visParameters,
                                   env_params=envParameters,
                                   schedule_params=scheduleParameters,
                                   name='ZMTrade_Exp')

### Train it:
graphManager.improve()

Full error traceback:

Traceback (most recent call last):
  File "/home/zmlaptop/Desktop/RLFrameworks/coachRLProject/runCoachModel.py", line 107, in <module>
    graphManager.improve()
  File "/home/zmlaptop/miniconda3/envs/coach_rl/lib/python3.7/site-packages/rl_coach/graph_managers/graph_manager.py", line 531, in improve
    self.verify_graph_was_created()
  File "/home/zmlaptop/miniconda3/envs/coach_rl/lib/python3.7/site-packages/rl_coach/graph_managers/graph_manager.py", line 658, in verify_graph_was_created
    self.create_graph()
  File "/home/zmlaptop/miniconda3/envs/coach_rl/lib/python3.7/site-packages/rl_coach/graph_managers/graph_manager.py", line 146, in create_graph
    self.level_managers, self.environments = self._create_graph(task_parameters)
  File "/home/zmlaptop/miniconda3/envs/coach_rl/lib/python3.7/site-packages/rl_coach/graph_managers/basic_rl_graph_manager.py", line 62, in _create_graph
    env = short_dynamic_import(self.env_params.path)(**self.env_params.__dict__,
AttributeError: 'GymEnvironment' object has no attribute 'path'

galnov commented 5 years ago

The env_params input to the graph manager should be an EnvironmentParameters object. In this case, it should be a GymEnvironmentParameters object and not a GymEnvironment object. If you would like to access the environment internals, you may want to try using CoachInterface as demonstrated in the "Agent Functionality" subsection of the Getting Started Tutorial. Does this answer your question?
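A minimal sketch of that change, reusing the level string and graph setup from the preset above (GymVectorEnvironment, which derives from GymEnvironmentParameters, is assumed to be a suitable choice here):

from rl_coach.environments.gym_environment import GymVectorEnvironment

### GymVectorEnvironment is an EnvironmentParameters subclass, so the graph manager
### can instantiate the actual environment itself from the level string.
envParameters = GymVectorEnvironment(level='gym_tradeZM.envs.tradeEnvScv0:ZMTradeEnvv0')

graphManager = BasicRLGraphManager(agent_params=agentParameters,
                                   vis_params=visParameters,
                                   env_params=envParameters,
                                   schedule_params=scheduleParameters,
                                   name='ZMTrade_Exp')
graphManager.improve()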

Eriz11 commented 5 years ago

Hey galnov,

Many thanks for taking the time to answer.

I finally resolved the problem this morning using GymVectorEnvironment, which inherits from GymEnvironmentParameters. So, yes, answered. However, I think the use of the different classes in the gym_environment.py module should be made clearer in the docs (I was using GymEnvironment because it is well documented in the source code and seems to be the one that allows the most tweaking, only to find that it is not the right one to use directly). I'm open to helping make that happen on my side and adding my two cents.

Also, accessing the environment internals is a bit convoluted. I finally got it to work with this workaround: graphManager.environments[0].env. This way, you get access to the env object, and from there anything can be done. I don't know whether making it more accessible within the preset-like structure would make sense. Anyway, I'm also open to giving it some thought if it helps.
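A short sketch of that workaround, assuming the graph has already been built (for example via create_graph() or improve()):

### graphManager.environments is populated when the graph is created; each entry is
### Coach's GymEnvironment wrapper, whose .env attribute is the underlying gym env.
graphManager.create_graph(tskParameters)
realEnv = graphManager.environments[0].env
print(realEnv.observation_space, realEnv.action_space)   # e.g. Box(27,) and Discrete(3)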

Best,

Eriz11 commented 5 years ago

Will close this issue as it is resolved.