Closed pwuethri closed 5 years ago
Could you apply autopep8 to your script?
And please keep the spacing around "=" consistent in keyword arguments of your functions. The example below is a bad one:
s_pol_loss = update_pol(student_pol = student_pol, teacher_pol=teacher_pol, optim_pol=student_optim, batch)
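For reference, a PEP 8-consistent version of that call might look like the sketch below. `update_pol` here is a stand-in with an assumed signature, not machina's real function; the point is the spacing (no blanks around "=" in keyword arguments) and that positional arguments must come before keyword arguments.

```python
# Stand-in for the real update function, just to make the call runnable.
def update_pol(batch, student_pol=None, teacher_pol=None, optim_pol=None):
    return 0.0  # dummy loss value

batch = {}
student_pol, teacher_pol, student_optim = object(), object(), object()

# Bad (quoted above): inconsistent spacing around "=", and a positional
# argument after keyword arguments is a SyntaxError in Python:
#   update_pol(student_pol = student_pol, teacher_pol=teacher_pol,
#              optim_pol=student_optim, batch)
# PEP 8-consistent:
s_pol_loss = update_pol(batch, student_pol=student_pol,
                        teacher_pol=teacher_pol, optim_pol=student_optim)
```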
shannon_cross_entropy should be implemented in loss_functional.py.
Understood.
I fixed the script to conform with autopep8.
I apologize for forgetting to apply it again after I made commits.
Yes, I think so.
I tested Pendulum-v0 and reached a reward of around -130.
I think a reward of -130 is sufficient, because TRPO's score was also around -130 when I tested it.
I see
Please add test for distillation.
I am not sure what you mean exactly by test. run_teacher_distill.py already evaluates the student policy at the end of the script. Could you please explain to me in more detail what you mean by a test?
@pierrewuethrich Please write test code in the following file.
You can check whether your test code is right by running nosetests -x tests
https://github.com/DeepX-inc/machina/blob/master/tests/test_algos.py
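A toy sketch of what such a test could look like, in the spirit of "run a few update steps and assert the loss goes down". The real test in tests/test_algos.py would exercise the actual distillation algorithm on a small env; everything here is a self-contained stand-in (hand-rolled softmax and cross entropy, no machina imports).

```python
import math
import unittest


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(p, q, eps=1e-12):
    return -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q))


class TestDistillation(unittest.TestCase):
    """Toy distillation smoke test: a student distribution is pulled
    toward a fixed teacher by gradient steps on the cross entropy,
    and the test asserts the loss decreases."""

    def test_loss_decreases(self):
        teacher = [0.7, 0.2, 0.1]   # fixed teacher distribution
        logits = [0.0, 0.0, 0.0]    # student starts uniform
        lr = 0.5
        first = cross_entropy(teacher, softmax(logits))
        for _ in range(50):
            q = softmax(logits)
            # gradient of H(p, softmax(logits)) w.r.t. logits is q - p
            logits = [l - lr * (qi - pi)
                      for l, qi, pi in zip(logits, q, teacher)]
        last = cross_entropy(teacher, softmax(logits))
        self.assertLess(last, first)
```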
Got it
Could you write all options in argparser like #135 ?
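A minimal sketch of exposing the script's options through argparse; the option names and defaults below are illustrative assumptions, not the actual options of #135 or of run_teacher_distill.py.

```python
import argparse

# Illustrative option set; replace with the script's real options.
parser = argparse.ArgumentParser(description='Policy distillation script')
parser.add_argument('--env_name', type=str, default='Pendulum-v0')
parser.add_argument('--max_epis', type=int, default=1000000)
parser.add_argument('--batch_size', type=int, default=256)
parser.add_argument('--pol_lr', type=float, default=1e-4)
parser.add_argument('--seed', type=int, default=256)

# Passing [] parses the defaults; in the real script, call parse_args()
# with no argument to read sys.argv.
args = parser.parse_args([])
```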
My pleasure
I am not sure if the script calculates the Shannon entropy as expected.
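One way to sanity-check an entropy computation is against a known closed form. For a Gaussian (as in a continuous-action policy on Pendulum-v0), the differential entropy is 0.5 · log(2πeσ²), so a Monte-Carlo estimate of -E[log p(x)] should match it. The sketch below is a standalone check, not machina code.

```python
import math
import random

# Monte-Carlo estimate of a Gaussian's entropy vs. the closed form.
random.seed(0)
sigma = 1.0
n = 200000
samples = [random.gauss(0.0, sigma) for _ in range(n)]

# log-density of N(0, sigma^2) at each sample
log_p = [-0.5 * math.log(2 * math.pi * sigma ** 2) - x ** 2 / (2 * sigma ** 2)
         for x in samples]
mc_entropy = -sum(log_p) / n

closed_form = 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)
# mc_entropy and closed_form should agree to within sampling noise.
```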