Open dtch1997 opened 1 year ago
Concrete tasks
Baseline PPO agent:
CBF PPO agent:
Remarks
If lambda is set large enough, then the combined reward function defines an SVF, so the optimal actor is guaranteed to be safe.
Experiments
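To make the remark on lambda concrete, here is a toy one-step sketch. It assumes the combined reward has the form `task_reward - lam * violation`, where `violation` is a non-negative measure of how badly the barrier condition is broken. That form, the function names, and the numbers are all hypothetical and not taken from this issue. The example only illustrates how a larger lambda flips the preferred action, and it is not a proof of the SVF property.

```python
def combined_reward(task_reward: float, violation: float, lam: float) -> float:
    """Hypothetical combined reward: task reward minus lam times the violation.

    Assumption (not from the issue): violation >= 0 and is zero when the
    barrier condition holds.
    """
    if violation < 0 or lam < 0:
        raise ValueError("violation and lam must be non-negative")
    return task_reward - lam * violation


def preferred_action(actions: dict, lam: float) -> str:
    """Return the action name with the highest combined reward."""
    return max(actions, key=lambda name: combined_reward(*actions[name], lam))


# Toy one-step problem: (task_reward, violation) per action. Values are made up.
ACTIONS = {
    "unsafe_shortcut": (1.0, 0.5),  # higher task reward, violates the barrier
    "safe_detour": (0.2, 0.0),      # lower task reward, no violation
}

if __name__ == "__main__":
    for lam in (0.1, 10.0):
        print(f"lam={lam}: prefers {preferred_action(ACTIONS, lam)}")
```

In this toy setup the safe action wins once `1.0 - 0.5 * lam < 0.2`, that is, for `lam > 1.6`. In the sequential setting the threshold would depend on the discount factor and the reward bounds, which this sketch does not model.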