Open · ghost opened this issue 1 year ago
Hello! Thank you for your great work!

I have a question about the loss function in the paper:

L = log p(x|z_q(x)) + ||sg[z_e(x)] − e||² + β||z_e(x) − sg[e]||²

The authors mention that the third term exists because e can grow arbitrarily if it doesn't train as fast as the encoder parameters, but as far as I can tell that term only helps the encoder train faster. Will it help e train faster too? I assume sg[e] means that e won't be trained by that term. I hope this isn't a silly question ;) Thanks in advance!
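To make sure I'm reading it right, here is the loss as I'd sketch it in PyTorch-style code (my own variable names, not code from this repo; I'm writing sg[·] as .detach()):

```python
import torch.nn.functional as F

# Rough sketch of the three terms as I understand them (not the authors' code).
# z_e:   encoder output z_e(x), shape [batch, d]
# e:     nearest codebook embedding chosen for each z_e(x), shape [batch, d]
# x_hat: decoder reconstruction from the quantized latent z_q(x)

def vqvae_loss(x, x_hat, z_e, e, beta=0.25):
    # Term 1: reconstruction, i.e. -log p(x | z_q(x)) under a Gaussian.
    recon = F.mse_loss(x_hat, x)

    # Term 2 (embedding/codebook loss): sg[z_e(x)] -> z_e.detach(), so this
    # term moves the embeddings e toward the frozen encoder outputs.
    codebook = F.mse_loss(e, z_e.detach())

    # Term 3 (commitment loss): sg[e] -> e.detach(), so this term moves the
    # encoder outputs toward the frozen embeddings and never updates e.
    commit = F.mse_loss(z_e, e.detach())

    return recon + codebook + beta * commit
```

If I remember correctly the paper uses β = 0.25, and β only scales how strongly the encoder output is pulled toward the codebook.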
Hi, your explanation is correct: the last term exists to help with optimization and overall loss reduction. Without it, the error between the encoder output and the embedding could grow arbitrarily, because the embeddings are optimized at a slower rate than the encoder. To address this, the authors add the third term (the commitment loss), which trains the encoder to stay close to the chosen embedding and so compensates for the slow optimization driven by the second term, the embedding loss. And yes, sg[e] is a stop-gradient: the third term passes no gradient to e, so it updates only the encoder; e is trained by the second term alone. I hope this helps!
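To make the gradient flow concrete, here is a tiny check with toy tensors (hypothetical shapes and names, not code from this repo) showing which term updates which side:

```python
import torch

# Stand-ins for the encoder output z_e(x) and the selected embedding e.
z_e = torch.randn(4, 8, requires_grad=True)
e = torch.randn(4, 8, requires_grad=True)

# Commitment term beta * ||z_e - sg[e]||^2: gradient reaches z_e only.
commit = 0.25 * ((z_e - e.detach()) ** 2).sum()
commit.backward()
print(z_e.grad is not None)  # True: the encoder side gets a gradient
print(e.grad)                # None: e receives nothing from this term

# Embedding term ||sg[z_e] - e||^2: gradient reaches e only.
z_e.grad = None
codebook = ((z_e.detach() - e) ** 2).sum()
codebook.backward()
print(e.grad is not None)    # True: this is the term that trains e
print(z_e.grad)              # None: the encoder is frozen here
```

As far as I understand, the reconstruction term doesn't train e either, since the straight-through estimator copies the decoder's gradient straight back to z_e(x); so in this formulation the second term is the only thing moving the codebook.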