Hi, thank you so much for your valuable questions. The following are the answers based on my understanding.
In your paper (page 4), you said that "In practice, we anneal s progressively in training such that it tends to zero".
Also, in Appendix B, you said that "the expectation is computed as an average of 10 random samples".
Do they match what you say here?
Hi, sorry for the inaccurate response. I had forgotten some details about the code, so I rechecked our implementation today; here is a new answer to your question. First, we did explore annealing the value of "temp", as shown on line 304 of "supervisor.py", but it is disabled in the current version of the code. You can enable it when you run training; it does not have a significant influence on performance. Second, we did explore the "10 random samples" earlier: the code has a variable "self.num_sample" on line 26 of "supervisor.py", but it seems I forgot to add the line "for sample in range(self.num_sample):" in the evaluation function. Thanks for pointing that out. It might influence the performance a little.
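For anyone else landing on this issue, here is a minimal sketch of what the two fixes above might look like. It is only an illustration under assumptions: the class layout, the annealing schedule, and names such as `temp_decay` and `anneal_temperature` are hypothetical and are not taken from the actual supervisor.py; only `temp` and `self.num_sample` come from the discussion above.

```python
import numpy as np

class Supervisor:
    """Sketch only: illustrates the two points discussed above, not the real supervisor.py."""

    def __init__(self, num_sample=10, temp_init=1.0, temp_min=0.01, temp_decay=0.999):
        self.num_sample = num_sample  # number of random samples used to estimate the expectation
        self.temp = temp_init         # temperature, annealed toward zero during training
        self.temp_min = temp_min
        self.temp_decay = temp_decay

    def anneal_temperature(self):
        # Progressively anneal the temperature so it tends to zero
        # (an exponential schedule is assumed here for illustration).
        self.temp = max(self.temp * self.temp_decay, self.temp_min)

    def evaluate(self, model, batch):
        # Average the model output over self.num_sample random draws,
        # i.e. the "for sample in range(self.num_sample):" loop the author
        # notes was missing from the evaluation function.
        outputs = []
        for _ in range(self.num_sample):
            outputs.append(model(batch, temp=self.temp))
        return np.mean(outputs, axis=0)
```

In such a setup, `anneal_temperature()` would be called once per training step or epoch; how often, and with what decay rate, depends on the schedule actually used in the released code.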
Thanks for your explanation. I will try these.
Thanks again for the code implementation you provided; it is very helpful.
Dear Chao,
Many thanks for your code! I have some questions I would like to ask you.
Nonetheless, I appreciate your efforts, as this is a simple but effective piece of work. Looking forward to your reply.