chufanchen opened 5 months ago
There are two major strategies for leveraging pre-trained knowledge.
Using a much smaller learning rate ($0.0001$) for the representation layer $\theta_{rps}$ and a slightly larger learning rate ($0.01$) for the classification layer $\theta_{cls}$ can greatly improve the performance of the traditional baseline.
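A minimal sketch of this per-layer learning-rate setup follows. It assumes plain SGD and a toy model whose parameters are flat lists of floats. The `ParamGroup` class, the `sgd_step` function, and the group names `rps`/`cls` are hypothetical illustrations and do not come from the SLCA repository. The sketch shows that, for the same gradient, the classifier moves 100 times further than the pre-trained representation layer.

```python
# Hypothetical sketch: a "slow learner" via per-group learning rates, plain Python only.
from dataclasses import dataclass


@dataclass
class ParamGroup:
    """A named group of parameters that shares a single learning rate."""
    name: str
    params: list[float]
    lr: float


def sgd_step(groups: list[ParamGroup], grads: dict[str, list[float]]) -> None:
    """Apply one vanilla SGD update, scaling each group by its own learning rate."""
    for group in groups:
        group_grads = grads[group.name]
        if len(group_grads) != len(group.params):
            raise ValueError(f"gradient length mismatch for group {group.name!r}")
        group.params = [p - group.lr * g for p, g in zip(group.params, group_grads)]


# Learning rates from the note: 0.0001 for theta_rps, 0.01 for theta_cls.
groups = [
    ParamGroup("rps", [0.5, -0.2, 0.1], lr=1e-4),  # theta_rps: pre-trained representation layer
    ParamGroup("cls", [0.0, 0.0], lr=1e-2),        # theta_cls: classification layer
]
# Identical unit gradients, so any difference in movement comes from the learning rates.
grads = {"rps": [1.0, 1.0, 1.0], "cls": [1.0, 1.0]}

before = {g.name: list(g.params) for g in groups}
sgd_step(groups, grads)
for g in groups:
    shift = max(abs(new - old) for new, old in zip(g.params, before[g.name]))
    print(f"{g.name}: lr={g.lr}, largest parameter shift={shift:.4g}")
```

In PyTorch the same idea is usually expressed by passing a list of parameter-group dicts, each with its own `"lr"`, to an optimizer such as `torch.optim.SGD`. Which modules belong to each group depends on the backbone and is left as an assumption here.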
The proposed Slow Learner (SL) almost resolves the problem in the representation layer, but the classification layer remains sub-optimal.
https://github.com/GengDavid/SLCA
https://arxiv.org/abs/2303.05118