DeepX-inc / machina

Control section: Deep Reinforcement Learning framework
MIT License

Add entropy regularised policy distillation #174

Closed pwuethri closed 5 years ago

pwuethri commented 5 years ago

@rarilurelo Should I put all the policy distillation approaches into the same example file? We could let the user choose which approach they want to use and then calculate the loss accordingly in the algos/policy_distillation.py file.

I worry that otherwise the example and algos directories will become cluttered with policy distillation files.

But if you prefer, we can keep each distillation approach in a separate script. Which way do you prefer?
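One way to keep everything in a single algos/policy_distillation.py file, as discussed above, is to dispatch on the chosen approach name. This is a minimal framework-free sketch; the function and loss names here are illustrative, not machina's actual API.

```python
import math

def kl_distillation_loss(teacher_probs, student_probs):
    """Forward KL between teacher and student action distributions."""
    return sum(t * math.log(t / s) for t, s in zip(teacher_probs, student_probs))

def llh_distillation_loss(teacher_probs, student_probs):
    """Negative student log-likelihood weighted by teacher probabilities."""
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))

# Registry of available distillation approaches (hypothetical names).
LOSSES = {
    "kl": kl_distillation_loss,
    "llh": llh_distillation_loss,
}

def distillation_loss(approach, teacher_probs, student_probs):
    """Select the loss for the requested approach; unknown names fail loudly."""
    try:
        return LOSSES[approach](teacher_probs, student_probs)
    except KeyError:
        raise ValueError("unknown distillation approach: %r" % approach)
```

Adding a new approach then only means adding one entry to the registry, which keeps the example directory from filling up with near-duplicate scripts.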

rarilurelo commented 5 years ago

I don't mind cluttered files for now. Once the distillation implementation is finished, you can try putting them together.

pwuethri commented 5 years ago

Understood

pwuethri commented 5 years ago

Grads were calculated incorrectly -> needs to be fixed

pwuethri commented 5 years ago

The calculated gradients are too large (10e3 - 10e4), leading to huge parameter updates.
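A common mitigation for gradients of this magnitude is global-norm clipping (PyTorch ships this as torch.nn.utils.clip_grad_norm_). A minimal framework-free sketch of the idea, assuming gradients are given as a flat list of floats:

```python
import math

def clip_grad_norm(grads, max_norm, eps=1e-6):
    """Rescale gradients so their global L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + eps)
        grads = [g * scale for g in grads]
    return grads
```

Clipping bounds the size of a single parameter update but does not address why the gradients are large; scaling down the loss term itself may also be needed.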

pwuethri commented 5 years ago

total reward term is approx. 400,000...
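A summed reward term around 4e5 will dominate the loss; one standard remedy is to standardize the rewards (or returns) over the batch before they enter the objective. A hypothetical sketch, not machina's actual preprocessing:

```python
def standardize(rewards):
    """Return rewards shifted to zero mean and scaled to unit variance."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5 or 1.0  # guard against a constant-reward batch
    return [(r - mean) / std for r in rewards]
```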

pwuethri commented 5 years ago

student llh (log-likelihood) seems reasonable
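For reference, the PR title's entropy-regularised objective can be read as the negative student log-likelihood of the teacher's action minus an entropy bonus on the student policy. The sketch below assumes a discrete action distribution; the coefficient name is illustrative.

```python
import math

def entropy_reg_distillation_loss(student_probs, teacher_action, ent_coef=0.01):
    """-log pi_student(a_teacher) - ent_coef * H(pi_student)."""
    llh = math.log(student_probs[teacher_action])
    entropy = -sum(p * math.log(p) for p in student_probs if p > 0)
    return -llh - ent_coef * entropy
```

The entropy term discourages the student from collapsing to a deterministic copy of the teacher early in training; with ent_coef = 0 this reduces to plain log-likelihood distillation.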