CogComp / lbjava

Learning Based Java (LBJava)
http://cogcomp.cs.illinois.edu/page/software_view/LBJava

Bug in AdaGrad implementation #99

Open bhargav opened 7 years ago

bhargav commented 7 years ago

AdaGrad does not grow its weight vector while learning, even though the weight vector's dimensionality can increase when new features are encountered during feature extraction on previously unseen training examples.

Cause: https://github.com/IllinoisCogComp/lbjava/blob/master/lbjava/src/main/java/edu/illinois/cs/cogcomp/lbjava/learn/AdaGrad.java#L189

Example feature:

discrete MyTestFeature(MyData d) <- {
    return d.isCapitalized() ? "YES" : "NO";
}

For this example, the weight vector should have size 3: YES, NO, and the bias term. But exampleFeatures.length is only 1 here.

Compare with the implementation of StochasticGradientDescent.
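
To make the contrast concrete, here is a minimal, self-contained Java sketch. It is not the actual LBJava source; the method names, the bias placement, and the growth strategy are all assumptions. exampleFeatures holds the lexicon indices of the features active in one example, so its length can be much smaller than the number of features the lexicon has seen.

import java.util.Arrays;

public class WeightVectorSizing {

    // Bug pattern: the weight vector is sized from the number of features that
    // are active in a single example, so indices the lexicon assigns to features
    // seen only in other examples fall outside the array.
    static double[] sizedFromExampleLength(int[] exampleFeatures) {
        return new double[exampleFeatures.length + 1];       // +1 for the bias term
    }

    // SGD-style pattern: grow the weight vector to cover the largest feature
    // index seen so far, so every lexicon entry eventually gets a slot.
    static double[] grownToMaxIndex(double[] weights, int[] exampleFeatures) {
        int maxIndex = -1;
        for (int f : exampleFeatures) maxIndex = Math.max(maxIndex, f);
        int needed = maxIndex + 2;                            // +1 index->length, +1 bias
        return needed > weights.length ? Arrays.copyOf(weights, needed) : weights;
    }

    public static void main(String[] args) {
        // Say the lexicon assigns index 0 to "YES" and index 1 to "NO";
        // the first example is capitalized, the second is not.
        int[] capitalizedExample = {0};
        int[] lowercaseExample = {1};

        double[] buggy = sizedFromExampleLength(capitalizedExample);
        System.out.println("sized from example length: " + buggy.length);  // 2, and it never grows

        double[] grown = new double[0];
        grown = grownToMaxIndex(grown, capitalizedExample);
        grown = grownToMaxIndex(grown, lowercaseExample);
        System.out.println("grown to max feature index: " + grown.length); // 3 = YES, NO, bias
    }
}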

danyaljj commented 7 years ago

I understand the issue ("for this example, the weight vector should have size 3"), although I don't understand why this is caused by how AdaGrad is implemented. The growth in the number of features should automatically be taken care of by the lexicon, and it looks like the implementation of the learn(.) function in AdaGrad (and its weight vector) is based on the length of the input features. Not sure where the problem might be arising ...

bhargav commented 7 years ago

For the example that I provided, exampleFeatures.length would be equal to 1, since we only have one discrete feature per example. Hence the weight vector would be initialized to size 2 (1 for the feature + 1 for the bias term). Although the lexicon would have two features, the parameters passed to learn are just the features present in the current training example.
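
A minimal sketch of the kind of fix this implies, assuming (everything below is hypothetical and not the actual AdaGrad.java code) that the learner keeps a weight vector and a per-weight squared-gradient accumulator as plain arrays and performs a hinge-loss update: both arrays have to be grown to the largest lexicon index seen so far, rather than being sized from exampleFeatures.length.

import java.util.Arrays;

public class AdaGradSketch {
    private double[] weights = new double[0];      // one slot per lexicon feature index
    private double[] gradSqSum = new double[0];    // per-weight sum of squared gradients
    private double biasWeight = 0, biasGradSqSum = 0;
    private final double learningRate = 0.1;

    // Grow both arrays so every active feature index has a slot, regardless of
    // how many features happen to be active in this particular example.
    private void ensureCapacity(int[] exampleFeatures) {
        int maxIndex = -1;
        for (int f : exampleFeatures) maxIndex = Math.max(maxIndex, f);
        if (maxIndex + 1 > weights.length) {
            weights = Arrays.copyOf(weights, maxIndex + 1);
            gradSqSum = Arrays.copyOf(gradSqSum, maxIndex + 1);
        }
    }

    // One AdaGrad update for a binary example with a +/-1 label (hinge loss).
    public void learn(int[] exampleFeatures, double[] exampleValues, double label) {
        ensureCapacity(exampleFeatures);

        // score = w . x + b over the active features only
        double score = biasWeight;
        for (int i = 0; i < exampleFeatures.length; i++)
            score += weights[exampleFeatures[i]] * exampleValues[i];

        if (label * score < 1) {                   // margin violated: update
            for (int i = 0; i < exampleFeatures.length; i++) {
                double g = -label * exampleValues[i];   // d(hinge)/d(w_f)
                if (g == 0) continue;
                int f = exampleFeatures[i];
                gradSqSum[f] += g * g;
                weights[f] -= learningRate * g / Math.sqrt(gradSqSum[f]);
            }
            double gb = -label;                    // bias gradient
            biasGradSqSum += gb * gb;
            biasWeight -= learningRate * gb / Math.sqrt(biasGradSqSum);
        }
    }
}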