Closed · juhannc closed this 9 months ago
Hi @Toni-SM, for my work I have to implement MBMPO. But I wanted your opinion on something.
First, some words about MBMPO in case you are not familiar with it. As a model-based algorithm, MBMPO uses a learned dynamics model to train the policy and, as far as I understand, uses TRPO to train the (meta-)policy. However, the idea could probably be generalized to other policy-optimization algorithms. So I could either hard-code TRPO into the agent or pass another agent to MBMPO.
The latter would increase flexibility but would also require a somewhat different `__init__` function, something like:
```diff
class MBMPO(Agent):
    def __init__(self,
                 models: Dict[str, Model],
+                agent: Agent,
                 memory: Optional[Union[Memory, Tuple[Memory]]] = None,
                 observation_space: Optional[Union[int, Tuple[int], gym.Space]] = None,
                 action_space: Optional[Union[int, Tuple[int], gym.Space]] = None,
                 device: Union[str, torch.device] = "cuda:0",
                 cfg: Optional[dict] = None) -> None:
```
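To make the idea concrete, here is a minimal, self-contained sketch of how such an agent-wrapping constructor could look. The stub classes (`Model`, `Memory`, `Agent`) are simplified stand-ins for illustration only, not skrl's actual base classes:

```python
from typing import Dict, Optional, Tuple, Union

# Hypothetical minimal stand-ins for the real base classes (illustration only)
class Model: ...
class Memory: ...

class Agent:
    def __init__(self, models: Dict[str, Model], cfg: Optional[dict] = None) -> None:
        self.models = models
        self.cfg = cfg or {}

class MBMPO(Agent):
    """Model-based meta-policy optimization that delegates policy updates
    to an inner, user-supplied agent (e.g. a TRPO instance)."""
    def __init__(self,
                 models: Dict[str, Model],
                 agent: Agent,
                 memory: Optional[Union[Memory, Tuple[Memory]]] = None,
                 cfg: Optional[dict] = None) -> None:
        super().__init__(models=models, cfg=cfg)
        self.agent = agent    # inner agent used to train the (meta-)policy
        self.memory = memory

# Usage: any Agent subclass can be plugged in as the policy optimizer
inner = Agent(models={"policy": Model()})
mbmpo = MBMPO(models={"dynamics": Model()}, agent=inner)
```

The main design benefit is that MBMPO only depends on the `Agent` interface, so swapping TRPO for another algorithm requires no change to MBMPO itself.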
What's your take on that?
The idea of generalizing to other agents looks good... It is preferable to avoid modifying the arguments of the agent constructors whenever possible, but there are cases where it is necessary, as in AMP, or in the solution you propose for this agent.
Thanks for the quick feedback, I will go that route then.
By the way, the current TRPO implementation iterates through the learning epochs for both the policy and the value function, when it should do so only for the value function (the optimization of the policy should be excluded from this loop).
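To illustrate the point: in TRPO the policy is updated once per rollout via the trust-region step, while only the value function is fitted over multiple epochs. The sketch below uses toy placeholder functions (not skrl's actual API) just to show the intended loop structure:

```python
# Illustrative sketch of the corrected update structure (names are hypothetical).
learning_epochs = 8
log = []

def update_policy():
    # Single trust-region policy update (conjugate gradient + line search in TRPO)
    log.append("policy")

def update_value():
    # One regression epoch fitting the value function to the estimated returns
    log.append("value")

update_policy()                   # the policy update happens exactly once
for _ in range(learning_epochs):  # only the value function loops over epochs
    update_value()
```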
Add Model-Based Meta-Policy-Optimization (MBMPO)
Introduction and description
Coming soon
Improvements in this PR
Coming soon
Proof of Work
Coming soon
Cheers,
Johann