Liuhong99 / Sophia

The official implementation of “Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training”
MIT License

Is it applicable for any loss function? #3

Closed subercui closed 1 year ago

subercui commented 1 year ago

Hi, thanks for the great work. I noticed that the general usage is for categorical logits. Does it only work with categorical logits? I am working on a regression task with an MSE loss using an LLM. Can I use Sophia for it, and if so, how?

ricomnl commented 1 year ago

+1

tengyuma commented 1 year ago

If you have a well-specified probabilistic model, then the GNB estimator will work as is. For example, suppose your probabilistic model for $y|x$ is $\mathcal{N}(\mu_\theta(x), \sigma_\theta^2(x))$, where $\mu_\theta$ and $\sigma_\theta$ are neural nets (which is a common practice in DRL); then you can just use the same algorithm as is (at least in theory). This also works if the std of $y|x$ is known.
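
For the known-std case, a short worked derivation may help show where the std enters. This is only a sketch, assuming the GNB estimator takes the form $B \cdot \hat g \odot \hat g$ with labels sampled from the model, as described in the Sophia paper. The batch size $B$, the noise variables $\epsilon_b$, and the constant std $\sigma$ are notation introduced here for illustration, and the sampled label $\hat y_b$ is treated as a constant when differentiating.

$$
\begin{aligned}
\ell(\theta; x, y) &= \frac{(y - \mu_\theta(x))^2}{2\sigma^2} + \text{const}, \\
\hat y_b &= \mu_\theta(x_b) + \sigma \epsilon_b, \quad \epsilon_b \sim \mathcal{N}(0, 1), \\
\hat g &= \frac{1}{B} \sum_{b=1}^{B} \nabla_\theta \ell(\theta; x_b, \hat y_b) = -\frac{1}{B} \sum_{b=1}^{B} \frac{\epsilon_b}{\sigma} \nabla_\theta \mu_\theta(x_b), \\
\mathbb{E}\left[ B \cdot \hat g \odot \hat g \right] &= \frac{1}{B} \sum_{b=1}^{B} \frac{1}{\sigma^2} \nabla_\theta \mu_\theta(x_b) \odot \nabla_\theta \mu_\theta(x_b).
\end{aligned}
$$

The last line uses the independence of the $\epsilon_b$, so the cross terms vanish in expectation. The result is the diagonal of the Gauss-Newton matrix for this Gaussian model, and it scales as $1/\sigma^2$, so the std has to be known or modeled for the estimate to have the right scale.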

However, if you simply have an MSE loss and the standard deviation of $y|x$ is not specified, then some tricks may be needed. We can only speculate here, without any theoretical or empirical evidence: maybe you can first estimate the std of $y|x$, and then sample Gaussian labels from the model, using the model's output as the mean and the estimated std as the std. Hope this makes sense.
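
The speculative recipe above could be sketched roughly as follows. This is a toy illustration only, not the repository's implementation: it assumes a linear model in place of the network so that gradients can be written by hand, a single constant std estimated from residuals on real data, and a GNB-style estimate of the form `B * g_hat**2`. The names `predict`, `estimate_std`, and `gnb_diag_hessian` are hypothetical and do not come from the Sophia codebase.

```python
import math
import random


def predict(w, x):
    """Toy stand-in for the network (assumption): linear model mu_w(x) = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def estimate_std(w, xs, ys):
    """Estimate the std of y|x from residuals on real data.

    Assumes the std is the same for every x; this is one simple choice,
    not something the thread prescribes.
    """
    residuals = [y - predict(w, x) for x, y in zip(xs, ys)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def gnb_diag_hessian(w, xs, sigma, rng):
    """GNB-style diagonal curvature estimate B * g_hat**2.

    Labels are sampled from the model itself (mean = model output,
    std = sigma) and treated as constants when differentiating.
    """
    batch_size = len(xs)
    g_hat = [0.0] * len(w)
    for x in xs:
        mu = predict(w, x)
        y_hat = rng.gauss(mu, sigma)  # sampled Gaussian label
        # Gradient of the Gaussian NLL (y_hat - mu)^2 / (2 sigma^2) w.r.t. w;
        # for the linear toy model, d mu / d w = x.
        scale = -(y_hat - mu) / sigma**2
        for j, xj in enumerate(x):
            g_hat[j] += scale * xj / batch_size
    return [batch_size * g * g for g in g_hat]


# Usage on synthetic data (all values below are made up for illustration).
rng = random.Random(0)
w = [0.5, -1.0]
xs = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(64)]
ys = [predict([1.0, -2.0], x) + rng.gauss(0, 0.3) for x in xs]
sigma = estimate_std(w, xs, ys)
h_hat = gnb_diag_hessian(w, xs, sigma, rng)
```

With a real network, the hand-written gradient would be replaced by a backward pass on the Gaussian negative log-likelihood of the sampled labels. Whether this trick works well in practice remains untested, as noted above.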