trac_ddpg_pendulum failed

witwolf commented 4 years ago

When testing with trac_ddpg_pendulum:

python -m alf.bin.train --root_dir=tdp --gin_file=trac_ddpg_pendulum

I get the error message below. I am still investigating it.

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/hongyingxiang/FLA/alf/bin/train.py", line 88, in <module>
    app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/home/hongyingxiang/FLA/alf/bin/train.py", line 79, in main
    train_eval(FLAGS.root_dir)
  File "/usr/local/lib/python3.6/dist-packages/gin/config.py", line 1032, in wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/usr/local/lib/python3.6/dist-packages/gin/utils.py", line 49, in augment_exception_message_and_reraise
    six.raise_from(proxy.with_traceback(exception.__traceback__), None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.6/dist-packages/gin/config.py", line 1009, in wrapper
    return fn(*new_args, **new_kwargs)
  File "/home/hongyingxiang/FLA/alf/bin/train.py", line 73, in train_eval
    trainer.train()
  File "/home/hongyingxiang/FLA/alf/trainers/policy_trainer.py", line 315, in train
    summary_max_queue=self._summary_max_queue)
  File "/home/hongyingxiang/FLA/alf/utils/common.py", line 265, in run_under_record_context
    func()
  File "/home/hongyingxiang/FLA/alf/trainers/policy_trainer.py", line 345, in _train
    time_step=time_step)
  File "/home/hongyingxiang/FLA/alf/trainers/off_policy_trainer.py", line 74, in _train_iter
    update_counter_every_mini_batch=self._config.
  File "/home/hongyingxiang/FLA/alf/algorithms/off_policy_algorithm.py", line 105, in train
    mini_batch_length, update_counter_every_mini_batch)
  File "/home/hongyingxiang/FLA/alf/utils/common.py", line 985, in __call__
    return tf_func_instance(get_current_scope(), *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 568, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 696, in _call
    return function_lib.defun(fn_with_cond)(*canon_args, **canon_kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 2363, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1611, in _filtered_call
    self.captured_inputs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1692, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 545, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError:  assertion failed: [100]
   [[{{node cond/else/_1/StatefulPartitionedCall/while/body/_694/while/body/_1961/StatefulPartitionedCall/Assert/AssertGuard/else/_7599/Assert}}]] [Op:__inference_fn_with_cond_12063]

Function call stack:
fn_with_cond

  In call to configurable 'train_eval' (<function train_eval at 0x7f35b018a840>)

emailweixu commented 4 years ago

It's not just that it fails. Trac does not seem to work any more. You can assign this issue to me if you are not working on it.

emailweixu commented 4 years ago

In TracAlgorithm.after_train(), the action before training is the original action from rollout, which is stochastic: https://github.com/HorizonRobotics/alf/blob/dccefccc40fe82256f554a3b633c2ec93e6aa57f/alf/algorithms/trac_algorithm.py#L140 The action after training is also stochastic because of epsilon_greedy=1.0: https://github.com/HorizonRobotics/alf/blob/dccefccc40fe82256f554a3b633c2ec93e6aa57f/alf/algorithms/trac_algorithm.py#L220-L221 These two sources of stochasticity make it impossible to reduce the measured change, so TrustedUpdater.adjust_step() fails after trying 100 steps.
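
The effect can be reproduced with a small NumPy sketch. This is not ALF code: `backtrack`, the linear "policy", the noise level and the threshold are all hypothetical stand-ins, chosen only to show why a step-halving loop cannot get below a threshold when the compared actions carry sampling noise.

```python
import numpy as np

def backtrack(change_fn, threshold, max_iters=100):
    """Halve the step until change_fn(step) <= threshold.

    Returns (step, iterations). iterations == max_iters means the loop
    gave up, which is loosely analogous to the failed assertion above.
    """
    step = 1.0
    for i in range(max_iters):
        if change_fn(step) <= threshold:
            return step, i
        step *= 0.5
    return step, max_iters

rng = np.random.default_rng(0)
states = rng.normal(size=(256, 3))
w_old = rng.normal(size=(3, 1))    # hypothetical policy weights before training
delta = rng.normal(size=(3, 1))    # hypothetical parameter update
noise_std = 0.5                    # hypothetical exploration noise

# Action stored at rollout time: sampled once, so it already contains noise.
rollout_action = states @ w_old + noise_std * rng.normal(size=(256, 1))

def deterministic_change(step):
    # Compare noiseless actions of the old and the scaled-back new policy.
    a_old = states @ w_old
    a_new = states @ (w_old + step * delta)
    return float(np.sqrt(np.mean((a_new - a_old) ** 2)))

def stochastic_change(step):
    # Compare the noisy rollout action with a freshly sampled new action.
    a_new = states @ (w_old + step * delta) + noise_std * rng.normal(size=(256, 1))
    return float(np.sqrt(np.mean((a_new - rollout_action) ** 2)))

threshold = 0.1
_, det_iters = backtrack(deterministic_change, threshold)
_, sto_iters = backtrack(stochastic_change, threshold)
print("deterministic iterations:", det_iters)
print("stochastic iterations:", sto_iters)
```

In the deterministic case the change shrinks in proportion to the step, so the loop stops after a few halvings. In the stochastic case the change has a floor of roughly `sqrt(2) * noise_std` regardless of the step, so the loop runs out of iterations.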

witwolf commented 4 years ago

> It's not just that it fails. Trac does not seem to work any more. You can assign this issue to me if you are not working on it.

Yes, it does not work. I have made a few changes in https://github.com/HorizonRobotics/alf/commit/b521c6aa0d2f09ff905355c1926b38eab94bd321:

Use the predicted action (noiseless action) when computing the distance between the old and new policies.
Compute the total distance as \frac{\sum^T\sum^B\sqrt{D}}{B*N} instead of \sqrt{\frac{\sum^T\sum^BD}{B*N}}.

It still does not work with DDPG.
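
For reference, the two aggregations compared above can be checked numerically. The sketch below is an illustration only: `D` is filled with made-up squared distances of shape (T, B), and both forms are normalized by the number of summed elements, which is an assumption about what the denominator in the formulas denotes.

```python
import numpy as np

rng = np.random.default_rng(0)

# D[t, b]: hypothetical squared distance between old and new actions
# at time step t for batch element b (T=5, B=8).
D = rng.uniform(0.0, 1.0, size=(5, 8))

# Mean of per-element distances: sum of sqrt(D) over T and B, then normalize.
mean_of_sqrt = float(np.sqrt(D).mean())

# Root of the mean squared distance: normalize the sum of D, then take sqrt.
sqrt_of_mean = float(np.sqrt(D.mean()))

print("mean of sqrt:", mean_of_sqrt)
print("sqrt of mean:", sqrt_of_mean)
```

Because the square root is concave, the first form is never larger than the second (Jensen's inequality), so for the same update it reports a smaller or equal total distance and is less dominated by a few large entries of `D`.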

emailweixu commented 4 years ago

> > It's not just that it fails. Trac does not seem to work any more. You can assign this issue to me if you are not working on it.
>
> Yes, it does not work. I have made a few changes in b521c6a:
>
> Use the predicted action (noiseless action) when computing the distance between the old and new policies.
>
> Compute the total distance as \frac{\sum^T\sum^B\sqrt{D}}{B*N} instead of \sqrt{\frac{\sum^T\sum^BD}{B*N}}.
>
> It still does not work with DDPG.

It seems that you only fixed the first stochasticity.

emailweixu commented 4 years ago

Fixed by #321

HorizonRobotics / alf

trac_ddpg_pendulum failed #315