
XuanCe: A Comprehensive and Unified Deep Reinforcement Learning Library
https://xuance.readthedocs.io/

How to export or load multi-agent policies? #49

Open ardian-selmonaj opened 3 months ago

ardian-selmonaj commented 3 months ago

In a multi-agent setting, after training e.g. MAPPO_Agents(), saving the model with MAPPO_Agents.save_model(model_name='model.pth'), and finally loading it back with MAPPO_Agents.load_model(path), how can I extract specific policies from the model? For example, when 3 agents were trained ("agent_1", "agent_2", and "agent_3"), I would like to get the prediction of only one specific agent.

I couldn't figure out how exactly to do that and would be very thankful for any help.

wenzhangliu commented 3 months ago

Hello, we are very sorry, but we have not yet considered saving and loading the model of a specific agent in MARL. At present, saving and loading models in XuanCe operate on the holistic MARL model. We will consider addressing this in a future version of XuanCe. Thanks very much for your question.
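
In the meantime, a possible manual workaround is to open the saved checkpoint directly and filter out one agent's parameters. Below is a minimal sketch, not a XuanCe API: it assumes the file written by save_model() is a plain PyTorch state_dict (possibly wrapped in a dict) and that parameter names contain an agent tag such as "agent_1"; both assumptions should be verified by printing the keys first.

```python
import torch

# Load the holistic checkpoint saved by save_model().
ckpt = torch.load("model.pth", map_location="cpu")

# Some trainers wrap the weights, e.g. {"model": state_dict, ...};
# fall back to the loaded object itself if there is no such wrapper.
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt

# First, inspect how parameters are keyed per agent.
for name in state_dict:
    print(name)

# Then keep only the parameters belonging to one agent
# ("agent_1" is a hypothetical key pattern).
agent_1_params = {k: v for k, v in state_dict.items() if "agent_1" in k}
torch.save(agent_1_params, "agent_1_only.pth")
```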

ardian-selmonaj commented 3 months ago

This would be very helpful! I like your overall framework for training and evaluating RL agents and its variety of algorithms, but access to specific MARL policies for inference is indeed an important feature.

wenzhangliu commented 3 months ago

Thank you for your advice; we will consider supporting this function very soon.

ardian-selmonaj commented 3 months ago

Additionally, could you also consider adding a feature to train selected agents with a shared policy? For example, ag1 and ag2 use the same policy pi1, while ag3 uses pi2, and so on. This would make it possible to train agents of the same type with a shared policy in an otherwise heterogeneous setting.
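
To make the request concrete, here is a minimal sketch of the mapping I have in mind, independent of XuanCe's internals (PolicyNet, the policy names, and the agent IDs are all hypothetical):

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Toy stand-in for an actor network."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

# Two distinct parameter sets ...
policies = nn.ModuleDict({"pi1": PolicyNet(8, 2), "pi2": PolicyNet(8, 2)})

# ... shared across three agents via an explicit mapping.
policy_mapping = {"ag1": "pi1", "ag2": "pi1", "ag3": "pi2"}

def act(agent_id: str, obs: torch.Tensor) -> torch.Tensor:
    """Route an agent's observation through its (possibly shared) policy."""
    return policies[policy_mapping[agent_id]](obs)

# ag1 and ag2 produce outputs from the same weights; ag3 uses different ones.
obs = torch.randn(1, 8)
print(act("ag1", obs), act("ag2", obs), act("ag3", obs))
```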

wenzhangliu commented 3 months ago

It's a good suggestion for more general scenarios, but implementing this feature in the current version of XuanCe is not easy. We may need to rebuild the policy module of XuanCe so that it can determine whether or not certain agents share a policy.

ardian-selmonaj commented 3 months ago

Ray RLlib supports both of these features; however, its variety of algorithms is limited. Anyhow, I would be very thankful if you could provide these features soon. I assume many others would benefit from them.
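
For reference, this is roughly what the configuration looks like in RLlib (Ray 2.x; the environment name is a placeholder, and details of the multi-agent API vary between Ray versions):

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("my_multi_agent_env")  # placeholder for a registered env
    .multi_agent(
        # Two parameter sets shared across three agents.
        policies={"pi1", "pi2"},
        policy_mapping_fn=lambda agent_id, episode, **kwargs: (
            "pi1" if agent_id in ("ag1", "ag2") else "pi2"
        ),
    )
)

algo = config.build()
algo.train()

# Individual policies can then be accessed (and exported) by ID.
pi1 = algo.get_policy("pi1")
```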