Ability to use previously trained models with GAIL

Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.

https://unity.com/products/machine-learning-agents


Ability to use previously trained models with GAIL #3419

Closed AcelisWeaven closed 2 years ago

AcelisWeaven commented 4 years ago

Is your feature request related to a problem? Please describe.

(Not a real story.) Let's say I've got a great model that took weeks to train. It has 10 hidden layers of 1024 units. Inference will probably be really slow.

Describe the solution you'd like

Let's say I want to reduce my model size to a single hidden layer of 512 units. I would have a gail_config.yaml that looks like this:

MyEnv:
    num_layers: 1
    hidden_units: 512
    reward_signals:
        gail:
            strength: 1.0
            gamma: 0.99
            encoding_size: 128
            model_path: MyEnv.nn

Would it be possible to use GAIL to create a lighter (and faster) model for inference, instead of retraining from scratch?

harperj commented 4 years ago

Hi @AcelisWeaven -- thanks for the well-motivated feature request. At this time, the best way to try this would be to capture demonstration data from your old model during inference. I'm not sure how effective this approach would be for capturing similar behavior in smaller models, but it seems like a reasonable thing to explore.

I'll share the request with the team to discuss whether it's something we'd like to add.

ervteng commented 4 years ago

To add to what @harperj suggested - the most direct way would be to collect lots of demonstrations from your big model and use behavioral_cloning to clone the behavior over to the new model. This is very similar to the idea of Model Distillation (https://arxiv.org/abs/1503.02531, https://towardsdatascience.com/knowledge-distillation-simplified-dd4973dbc764) in supervised learning.

AcelisWeaven commented 4 years ago

Thanks for the insight! Isn't BC being deprecated? (I may be misremembering this part.)

ervteng commented 4 years ago

The standalone BC trainer was - but you can now use BC as part of PPO or SAC (and combine it with GAIL or RL if you want!).
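For illustration, a combined setup might look roughly like the sketch below. It assumes the trainer config layout used by ML-Agents releases from around the time of this thread (a `behavioral_cloning` section next to `reward_signals`); key names and defaults may differ in other versions, so check the docs for your release. The demo path `demos/BigModel.demo` is hypothetical and stands for demonstrations recorded while the large model runs in inference, and the strength and step values are placeholders rather than tuned recommendations.

```yaml
MyEnv:
    trainer: ppo
    # Smaller network for the new, lighter model
    num_layers: 1
    hidden_units: 512
    # BC applied alongside PPO updates (hypothetical demo path)
    behavioral_cloning:
        demo_path: demos/BigModel.demo
        strength: 0.5
        steps: 150000
    reward_signals:
        extrinsic:
            strength: 1.0
            gamma: 0.99
        # GAIL reward computed from the same recorded demonstrations
        gail:
            strength: 0.01
            gamma: 0.99
            encoding_size: 128
            demo_path: demos/BigModel.demo
```

Note that this uses `demo_path` pointing at recorded demonstrations; the `model_path` key from the original request is a proposed option, not an existing one.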

AcelisWeaven commented 4 years ago

Oh that makes sense. Thanks!
