exalearn / EXARL

Scalable Framework for Reinforcement Learning

Problem in running ExaCH with async workflow #232

Open fc524079318 opened 2 years ago

fc524079318 commented 2 years ago

I tried to run ExaCH with the async workflow and the parameter "n_steps" set to 10, and I got an error.

Traceback (most recent call last):
  File "start.py", line 24, in <module>
    exa_learner.run()
  File "/home/ai/fc/EXARL/exarl/base/learner_base.py", line 113, in run
    self.workflow.run(self)
  File "/home/ai/fc/EXARL/exarl/workflows/workflow_vault/async_learner.py", line 147, in run
    keep_running = self.actor(workflow, nepisodes)
  File "/home/ai/fc/EXARL/exarl/workflows/workflow_vault/sync_learner.py", line 559, in actor
    next_state, reward, self.done, _ = exalearner.env.step(action)
  File "/home/ai/anaconda3/envs/fc/lib/python3.8/site-packages/gym/core.py", line 229, in step
    return self.env.step(action)
  File "/home/ai/fc/EXARL/exarl/envs/env_vault/ExaCH.py", line 269, in step
    self.currStructVec = self.getNextState(action_idx, self.time_step)
  File "/home/ai/fc/EXARL/exarl/envs/env_vault/ExaCH.py", line 364, in getNextState
    self.info.tf = self.t[i + 1]
IndexError: index 11 is out of bounds for axis 0 with size 11

I found that sync_learner does not end the episode correctly. The check at sync_learner.py line 568 is if self.steps == exalearner.nsteps: but self.steps begins at 0. When it reaches 9, ten steps have already been taken, yet 9 is not equal to nsteps (10), so self.done is still false and an 11th step is run.
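The IndexError in the traceback can be reproduced in isolation. The following is a minimal sketch, not the ExaCH code: it assumes the time grid has n_steps + 1 entries (matching the size-11 axis reported above) and that i is the 0-based step index; the names t and next_time are hypothetical. A plain list is used here, while the NumPy array in the traceback raises the same exception type.

```python
n_steps = 10

# Assumption: a time grid with n_steps + 1 points, i.e. 11 entries,
# matching "axis 0 with size 11" in the traceback.
t = [k / n_steps for k in range(n_steps + 1)]


def next_time(i):
    """Hypothetical stand-in for the lookup self.t[i + 1]."""
    return t[i + 1]


# Steps 0..9 read t[1]..t[10], which are all valid.
for i in range(n_steps):
    next_time(i)

# An 11th step (i = 10) reads t[11], one past the last valid index.
try:
    next_time(n_steps)
    error_raised = False
except IndexError:
    error_raised = True

print(len(t), error_raised)
```

Under these assumptions the lookup only fails when an extra step beyond n_steps is taken, which points at the episode termination check rather than at the environment itself.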

rvinaybharadwaj commented 2 years ago

ExaCH env hasn't been maintained in a while, but is this specific to ExaCH or a general problem? If it can be fixed by using >= instead of ==, do you mind issuing a PR?

Jodasue commented 2 years ago

@fc524079318 can you add the configuration files you ran with and the command line? Thanks!

fc524079318 commented 2 years ago

@Jodasue I use a learner_cfg.json like this:

{
    "agent": "DQN-v0",
    "env": "ExaCH-v0",
    "workflow": "sync",
    "n_episodes": 10,
    "n_steps": 10,
    "model": "MLP",
    "output_dir": "./results_dir/",
    "process_per_env": 1,
    "log_level": [3, 3],
    "log_frequency": 1,
    "profile": "None"
}

The command line is mpiexec -n 4 python start.py --workflow async, where start.py is similar to EXARL/exarl/driver/main.py.

fc524079318 commented 2 years ago

I think it may be a general problem, since the error is in sync_learner.py. If I set n_steps to 10, I expect 10 steps per episode, but it runs 11 steps before done is set. The done check happens before self.steps is updated. I also ran ExaCartPoleStatic with the async workflow and saw self.steps increase to 11 before the episode ended. I changed == to >=, but that did not help. Maybe I should change self.steps == exalearner.nsteps: to self.steps == exalearner.nsteps - 1: ?
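The three variants of the check can be compared with a toy loop. This is only a sketch under the assumption stated in the comment above, namely that the done check runs after the environment step but before the counter is incremented; count_env_steps and its arguments are hypothetical names, and the real loop in sync_learner.py may be structured differently.

```python
import operator


def count_env_steps(nsteps, done_when):
    """Toy model of one episode; not the actual EXARL code.

    Assumption: the done check runs after env.step() and before the
    0-based counter is incremented.
    """
    steps = 0       # plays the role of self.steps
    env_steps = 0   # number of times env.step() would be called
    done = False
    while not done:
        env_steps += 1                  # stands in for env.step(action)
        if done_when(steps, nsteps):    # the termination check
            done = True
        steps += 1                      # counter updated after the check
    return env_steps


n = 10
original = count_env_steps(n, operator.eq)                       # steps == nsteps
with_ge = count_env_steps(n, operator.ge)                        # steps >= nsteps
proposed = count_env_steps(n, lambda s, total: s == total - 1)   # steps == nsteps - 1
print(original, with_ge, proposed)  # prints: 11 11 10
```

In this model == and >= first become true at the same counter value, which would explain why switching to >= changes nothing, while comparing against nsteps - 1 stops after exactly nsteps steps.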

jmohdyusof commented 2 years ago

Yes, the issue is basically C-style, 0-based counting, and your fix logic is correct. My only concern is that this is probably present in multiple learners, and may have been partially remedied in various ways in different places. I would say that you can implement your fix and test it locally, and we will try to get a consistent remedy into the code soon.