Closed pwuethri closed 5 years ago
Could you apply autopep8 to your script?
And please keep the spacing around "=" consistent in keyword arguments of your functions. The example below is a bad one:
s_pol_loss = update_pol(student_pol = student_pol, teacher_pol=teacher_pol, optim_pol=student_optim, batch)
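For reference, a PEP 8-consistent version of that call might look like the sketch below. `update_pol` here is a stand-in with an assumed signature, not machina's real function; the point is the spacing (no blanks around "=" in keyword arguments) and that positional arguments must come before keyword arguments.

```python
# Stand-in for the real update function, just to make the call runnable.
def update_pol(batch, student_pol=None, teacher_pol=None, optim_pol=None):
    return 0.0  # dummy loss value

batch = {}
student_pol, teacher_pol, student_optim = object(), object(), object()

# Bad (quoted above): inconsistent spacing around "=", and a positional
# argument after keyword arguments is a SyntaxError in Python:
#   update_pol(student_pol = student_pol, teacher_pol=teacher_pol,
#              optim_pol=student_optim, batch)
# PEP 8-consistent:
s_pol_loss = update_pol(batch, student_pol=student_pol,
                        teacher_pol=teacher_pol, optim_pol=student_optim)
```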
shannon_cross_entropy should be implemented in loss_functional.py.
Understood.
I fixed the script to conform with autopep8.
I apologize for forgetting to apply it again after I made commits.
Yes, I think so.
I tested Pendulum-v0 and reached a reward of around -130.
I think a reward of -130 is sufficient, because TRPO's score was also around -130 when I tested it.
I see
Please add test for distillation.
I am not sure what you mean exactly by test. run_teacher_distill.py already evaluates the student policy at the end of the script. Could you please explain to me in more detail what you mean by a test?
@pierrewuethrich Please write test code in the following file.
You can check whether your test code is right by running nosetests -x tests
https://github.com/DeepX-inc/machina/blob/master/tests/test_algos.py
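A toy sketch of what such a test could look like, in the spirit of "run a few update steps and assert the loss goes down". The real test in tests/test_algos.py would exercise the actual distillation algorithm on a small env; everything here is a self-contained stand-in (hand-rolled softmax and cross entropy, no machina imports).

```python
import math
import unittest


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(p, q, eps=1e-12):
    return -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q))


class TestDistillation(unittest.TestCase):
    """Toy distillation smoke test: a student distribution is pulled
    toward a fixed teacher by gradient steps on the cross entropy,
    and the test asserts the loss decreases."""

    def test_loss_decreases(self):
        teacher = [0.7, 0.2, 0.1]   # fixed teacher distribution
        logits = [0.0, 0.0, 0.0]    # student starts uniform
        lr = 0.5
        first = cross_entropy(teacher, softmax(logits))
        for _ in range(50):
            q = softmax(logits)
            # gradient of H(p, softmax(logits)) w.r.t. logits is q - p
            logits = [l - lr * (qi - pi)
                      for l, qi, pi in zip(logits, q, teacher)]
        last = cross_entropy(teacher, softmax(logits))
        self.assertLess(last, first)
```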
Got it
Could you write all options in argparser like #135 ?
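A minimal sketch of exposing the script's options through argparse; the option names and defaults below are illustrative assumptions, not the actual options of #135 or of run_teacher_distill.py.

```python
import argparse

# Illustrative option set; replace with the script's real options.
parser = argparse.ArgumentParser(description='Policy distillation script')
parser.add_argument('--env_name', type=str, default='Pendulum-v0')
parser.add_argument('--max_epis', type=int, default=1000000)
parser.add_argument('--batch_size', type=int, default=256)
parser.add_argument('--pol_lr', type=float, default=1e-4)
parser.add_argument('--seed', type=int, default=256)

# Passing [] parses the defaults; in the real script, call parse_args()
# with no argument to read sys.argv.
args = parser.parse_args([])
```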
My pleasure
I am not sure if the script calculates the Shannon entropy as expected.
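One way to sanity-check an entropy computation is against a known closed form. For a Gaussian (as in a continuous-action policy on Pendulum-v0), the differential entropy is 0.5 · log(2πeσ²), so a Monte-Carlo estimate of -E[log p(x)] should match it. The sketch below is a standalone check, not machina code.

```python
import math
import random

# Monte-Carlo estimate of a Gaussian's entropy vs. the closed form.
random.seed(0)
sigma = 1.0
n = 200000
samples = [random.gauss(0.0, sigma) for _ in range(n)]

# log-density of N(0, sigma^2) at each sample
log_p = [-0.5 * math.log(2 * math.pi * sigma ** 2) - x ** 2 / (2 * sigma ** 2)
         for x in samples]
mc_entropy = -sum(log_p) / n

closed_form = 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)
# mc_entropy and closed_form should agree to within sampling noise.
```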