Support on inverse algorithm

mingzhe37 commented 10 months ago

The current framework is based on tianshou 0.4.5, which does not yet support the inverse algorithm. The latest version that supports it is 0.4.11, but the API changed between 0.4.5 and 0.4.11, so some effort is needed to update the run script.

[x] Conversion of expert data
[x] Upgrade tianshou to 0.4.11 in a separate branch
[x] Update run script based on latest API
[x] Update single room environment file's get_state function

mingzhe37 commented 10 months ago

See https://github.com/thu-ml/tianshou/issues/730. It seems that the current implementation of the get_state function in the single room environment (see below) is no longer supported in newer tianshou versions.

To be compatible with the newer tianshou version, the customized environment needs to be updated; arrays and dictionaries both work.

mingzhe37 commented 10 months ago

Hi @YangyangFu, I'm trying to run an example script (test_ddqn_tianshou.py) under tianshou 0.4.11, but I'm encountering the error below. It seems that the reset() function doesn't accept a seed. If I remove the seed argument from train_envs.seed()/test_envs.seed(), the script runs, but passing args.seed to these two functions gives the following error. Interestingly, the script runs fine under the original tianshou 0.4.5. Do you have any ideas on this? Cheers,

Process Process-1:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.8/site-packages/tianshou/env/worker/subproc.py", line 125, in _worker
    env.reset(seed=data)
  File "/opt/conda/lib/python3.8/site-packages/gym/wrappers/order_enforcing.py", line 42, in reset
    return self.env.reset(**kwargs)
  File "/opt/conda/lib/python3.8/site-packages/gym/wrappers/env_checker.py", line 45, in reset
    return env_reset_passive_checker(self.env, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/gym/utils/passive_env_checker.py", line 192, in env_reset_passive_checker
    result = env.reset(**kwargs)
TypeError: reset() got an unexpected keyword argument 'seed'
Traceback (most recent call last):
  File "/workspaces/mpc-drl-tl/testcases/gym-environments/single-zone/test_action_v1/test_ddqn_tianshou.py", line 286, in <module>
    test_dqn(args)
  File "/workspaces/mpc-drl-tl/testcases/gym-environments/single-zone/test_action_v1/test_ddqn_tianshou.py", line 111, in test_dqn
    train_envs.seed(args.seed)
  File "/opt/conda/lib/python3.8/site-packages/tianshou/env/venvs.py", line 345, in seed
    return [w.seed(s) for w, s in zip(self.workers, seed_list)]
  File "/opt/conda/lib/python3.8/site-packages/tianshou/env/venvs.py", line 345, in <listcomp>
    return [w.seed(s) for w, s in zip(self.workers, seed_list)]
  File "/opt/conda/lib/python3.8/site-packages/tianshou/env/worker/subproc.py", line 256, in seed
    return self.parent_remote.recv()
  File "/opt/conda/lib/python3.8/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/opt/conda/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/opt/conda/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

A follow-up question: I don't see a seed() function implemented in the single room environment files, such as the one below. I'm wondering whether train_envs.seed(args.seed) and test_envs.seed(args.seed) actually had any effect, and how they worked in the original 0.4.5 version.
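For illustration, the TypeError in the traceback can be reproduced without tianshou or gym. This is only a minimal sketch: LegacyRoomEnv and seed_worker are hypothetical names, not code from this repository or from tianshou. It assumes nothing beyond what the traceback shows, namely that the worker forwards the seed as env.reset(seed=data) to an environment whose reset() takes no seed keyword.

```python
class LegacyRoomEnv:
    """Hypothetical stand-in for an environment whose reset() takes no seed."""

    def reset(self):
        # Fixed initial observation; the value itself is arbitrary.
        return [20.0]


def seed_worker(env, seed):
    """Mimics the call in the traceback: the seed is forwarded to reset()."""
    return env.reset(seed=seed)


try:
    seed_worker(LegacyRoomEnv(), 0)
    error_message = None
except TypeError as exc:
    # reset() rejects the unknown keyword, as in the worker process above.
    error_message = str(exc)

print(error_message)
```

In the real run this exception is raised inside the worker subprocess, which then exits; the EOFError in the second traceback is the parent process finding the pipe closed while it waits for the worker's reply.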

mingzhe37 commented 10 months ago

Update: adding a seed argument to the reset() function in the single room environment file removes the error (see below). Note that, based on my understanding, although the run script makes four seeding calls, only the first two (NumPy and Torch) have any effect. The train_envs/test_envs calls do nothing, because the seed is never used in the single room environment: it implements neither seed() nor a seed argument in reset(), on the assumption that the environment is initialized in a deterministic state. Interestingly, keeping these two calls causes errors in tianshou 0.4.11 but not in 0.4.5; maybe the newer version makes the API stricter?
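A minimal sketch of the kind of change described, assuming a hypothetical SeededRoomEnv class rather than the actual single room environment file. To stay self-contained it does not subclass gym.Env, and the options parameter and last_seed attribute are illustrative additions. The point is only that reset() accepts a seed keyword while the initial state stays deterministic.

```python
import random


class SeededRoomEnv:
    """Hypothetical stand-in for the single room environment."""

    def __init__(self):
        self._rng = random.Random()
        self.last_seed = None

    def reset(self, seed=None, options=None):
        # Accept the seed keyword so callers that forward a seed no longer fail.
        if seed is not None:
            self.last_seed = seed
            self._rng.seed(seed)
        # The initial state is fixed, so the seed does not change it.
        return [20.0]


env = SeededRoomEnv()
first = env.reset(seed=0)
second = env.reset(seed=1)
third = env.reset()  # still works without a seed
```

Here reset() returns a bare observation for simplicity; the return format gym expects from reset() depends on the gym version, so the real environment may need to return something different.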

Check gym's release notes on seed() and reset(): https://github.com/openai/gym/releases

YangyangFu / mpc-drl-tl

Support on inverse algorithm #126