duolingo/halflife-regression

Implementation of SGD. #2

Closed · musically-ut closed this issue 7 years ago

musically-ut commented 7 years ago

Thanks for making the great app even better and for uploading the code!

I am working on extending the model to allow each lexeme to have more features, and I have a couple of minor questions about the SGD implementation in the code:

  1. The learning rate in the `hlr` case seems to differ from the update given in the appendix of the paper by a factor of `1 / (1 + inst.p)`. Is there any particular reason for that? (See the sketch after this list for the update step I mean.)

  2. Usually, SGD iterations are repeated until some convergence criterion is met. In the code, however, it seems that only one pass is made over the complete data set. Was that because you empirically observed that the results had converged by then?
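
For concreteness, here is a minimal sketch of the update step I am asking about. It covers only the squared loss on `p` and omits the half-life term, clipping, and regularization; the names (`inst.p`, `inst.t`, `fv`, `fcounts`) mirror the repo, but this is my simplified reconstruction, not the actual code:

```python
import math
from collections import defaultdict, namedtuple

LN2 = math.log(2.)

# One training instance: observed recall rate p, lag time t (days),
# and a sparse feature vector fv as (feature, value) pairs.
Inst = namedtuple('Inst', ['p', 't', 'fv'])

def hlr_update(weights, fcounts, inst, lrate=0.001):
    """One SGD step on the squared p-loss for half-life regression."""
    # predicted half-life h = 2^(theta . x) and recall p = 2^(-t/h)
    h_hat = 2. ** sum(weights[k] * x_k for k, x_k in inst.fv)
    p_hat = 2. ** (-inst.t / h_hat)
    # d/dw_k of (p_hat - p)^2 is dlp_dw * x_k
    dlp_dw = 2. * (p_hat - inst.p) * (LN2 ** 2) * p_hat * (inst.t / h_hat)
    for k, x_k in inst.fv:
        # per-feature AdaGrad-style rate, scaled by the extra
        # 1/(1 + inst.p) factor that question 1 asks about
        rate = (1. / (1. + inst.p)) * lrate / math.sqrt(1 + fcounts[k])
        weights[k] -= rate * dlp_dw * x_k
        fcounts[k] += 1

weights, fcounts = defaultdict(float), defaultdict(int)
hlr_update(weights, fcounts, Inst(p=0.8, t=2.0, fv=[('bias', 1.0), ('lexeme:foo', 1.0)]))
```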

Thanks!

burrsettles commented 7 years ago
  1. This appears to be the case: because the data are biased toward high recall rates, the code (hackily) down-weights high-recall instances when learning (that is the `1 / (1 + inst.p)` factor in your sketch). You can experimentally determine whether this makes any difference (I don't remember whether it does at this point).

  2. We observed no difference in the metrics when taking multiple passes over the data. In my experience, SGD+AdaGrad for linear models on very large samples (like this one) does a very good job of avoiding overfitting while converging to something near-optimal in a single pass (roughly as sketched below).
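
Roughly, the single-pass regime looks like this (a generic sketch for a linear model with squared loss, not the exact code in this repo; per-feature update counts stand in for AdaGrad's accumulated squared gradients as a cheap rate schedule):

```python
import math
from collections import defaultdict

def sgd_adagrad_one_pass(data, lrate=0.1):
    """Single pass of SGD over (sparse feature vector, target) pairs
    for a linear model with squared loss (the factor of 2 is folded
    into the learning rate). Per-feature rates decay with how often
    each feature has been updated."""
    weights = defaultdict(float)
    counts = defaultdict(int)
    for fv, y in data:
        y_hat = sum(weights[k] * x_k for k, x_k in fv)
        err = y_hat - y
        for k, x_k in fv:
            rate = lrate / math.sqrt(1 + counts[k])
            weights[k] -= rate * err * x_k
            counts[k] += 1
    return weights  # no second pass, no convergence check

data = [([('bias', 1.0), ('f1', 2.0)], 3.0),
        ([('bias', 1.0), ('f2', 1.0)], 1.0)]
print(sgd_adagrad_one_pass(data))
```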