Closed: wangzheng1209 closed this issue 2 years ago
The loss increasing and then decreasing while training the codebook is something we observed across all of our VQ codebook training runs; it is unclear exactly why it happens. We did sweep over the learning rate and noted that with a smaller LR the model was never able to make it past this peak and saturated at a suboptimal minimum. On the other hand, a larger LR resulted in much more unstable training that never saturated. Hope this helps!
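For reference, a minimal sketch of the kind of LR sweep described above: training a small VQ codebook with plain SGD on the quantization loss ||z − e||², at several learning rates, and logging the loss per epoch. All names, data, and hyperparameters here are illustrative assumptions, not the original experiment, and this toy setup is not expected to reproduce the loss peak.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_codebook(z, K=8, lr=0.1, epochs=30):
    """SGD on a VQ codebook: nearest-code assignment, then pull each
    code toward the mean of its assigned vectors by a factor of lr."""
    codes = rng.standard_normal((K, z.shape[1]))  # random init
    losses = []
    for _ in range(epochs):
        # squared distance from every vector to every code
        d = ((z[:, None, :] - codes[None, :, :]) ** 2).sum(-1)
        idx = d.argmin(1)  # nearest-code assignment
        losses.append(d[np.arange(len(z)), idx].mean())
        for k in range(K):
            m = idx == k
            if m.any():
                # lr=1.0 recovers the exact k-means update
                codes[k] += lr * (z[m].mean(0) - codes[k])
    return losses

# toy stand-in for encoder outputs: a mixture of three Gaussians
z = np.concatenate([rng.standard_normal((200, 4)) + c for c in (-3, 0, 3)])

for lr in (0.01, 0.1, 1.0):
    losses = train_codebook(z, lr=lr)
    print(f"lr={lr}: first={losses[0]:.3f} last={losses[-1]:.3f}")
```

Sweeping `lr` this way makes the trade-off in the comment above concrete: too small and the loss barely moves within the budget, larger and the codebook converges in a few epochs (at the cost of stability in a real, jointly trained model).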