lkruse closed 2 years ago
• We can use a pessimistic lower bound of the trust region policy optimization objective to obtain a clamped surrogate objective that performs similarly without the need for line search.
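For reference, here is a minimal sketch of what a clamped surrogate objective of this kind usually looks like. The function names, the `epsilon=0.2` default, and the choice of plain Python are assumptions made for illustration, not taken from the quoted text. `ratio` stands for the probability ratio between the new and old policy for a sampled action, and `advantage` for an advantage estimate.

```python
def clamped_surrogate(ratio, advantage, epsilon=0.2):
    """Pessimistic (lower-bound) surrogate for one sample.

    ratio     : pi_new(a|s) / pi_old(a|s), assumed to be precomputed
    advantage : advantage estimate for that (s, a) pair
    epsilon   : clamp width (0.2 is an assumed, commonly used value)
    """
    # Clamp the ratio to the interval [1 - epsilon, 1 + epsilon].
    clamped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Taking the minimum keeps the more pessimistic of the two terms,
    # so the policy gains nothing from moving the ratio far from 1.
    return min(ratio * advantage, clamped_ratio * advantage)


def mean_clamped_surrogate(ratios, advantages, epsilon=0.2):
    """Average the per-sample clamped surrogate over a batch."""
    values = [clamped_surrogate(r, a, epsilon)
              for r, a in zip(ratios, advantages)]
    return sum(values) / len(values)
```

Because the clamp removes the incentive for large policy changes, this objective can be maximized with ordinary gradient ascent steps, which is why no line search is needed.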
Thanks!