Closed liu-jc closed 5 years ago
hi @liu-jc ,
L-BFGS is, I believe, one of the most popular training algorithms for linear models. Since SGC is linear, we use it to showcase SGC's potential to benefit from second-order optimization.
For TextSGC, we also used L-BFGS when reporting results, as can be seen at https://github.com/Tiiiger/SGC/blob/master/downstream/TextSGC/train.py#L59.
You can try switching to Adam. In my experience, there is no drastic difference in speed or performance, though you do want to tune the learning rate and early-stopping criterion a bit.
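For reference, here is a minimal sketch (not the authors' code; the toy data, dimensions, learning rates, and step counts are illustrative assumptions) of how training a linear model with `torch.optim.LBFGS` differs from Adam — L-BFGS requires a closure that re-evaluates the loss on each step:

```python
import torch

torch.manual_seed(0)
X = torch.randn(100, 16)          # 100 samples, 16 features (toy data)
y = torch.randint(0, 3, (100,))   # 3 classes

def train(optimizer_name):
    model = torch.nn.Linear(16, 3)  # a linear model, like SGC's classifier
    loss_fn = torch.nn.CrossEntropyLoss()
    if optimizer_name == "lbfgs":
        # L-BFGS performs line searches internally, so it needs a
        # closure that recomputes the loss and gradients.
        opt = torch.optim.LBFGS(model.parameters(), lr=1.0)
        def closure():
            opt.zero_grad()
            loss = loss_fn(model(X), y)
            loss.backward()
            return loss
        for _ in range(20):
            opt.step(closure)
    else:
        # Adam uses the usual zero_grad / backward / step loop.
        opt = torch.optim.Adam(model.parameters(), lr=0.2)
        for _ in range(100):
            opt.zero_grad()
            loss = loss_fn(model(X), y)
            loss.backward()
            opt.step()
    return loss_fn(model(X), y).item()

print(train("lbfgs"), train("adam"))
```

Note that L-BFGS stores a history of past gradients (its `history_size` parameter), so its memory footprint is higher than SGD's, while Adam keeps two extra buffers per parameter.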
@felixgwu anything to add?
Hi there,
I noticed that you used the L-BFGS optimizer for the Reddit dataset, and you also state this in the paper. I wonder why you chose this optimizer. Why not just use SGD?
Also, which optimizer did you use for text classification and semi-supervised geolocation classification when you reported efficiency? I ask because different optimizers differ in both speed and memory efficiency.
Could you provide any insights? Thanks!