goete111 / factorie

Automatically exported from code.google.com/p/factorie

ConfidenceWeightedUpdates should not require different temperature #5

Status: Open. GoogleCodeExporter opened this issue 8 years ago.

GoogleCodeExporter commented 8 years ago
This violates plug-and-play exchangeability of learning rules.

Original issue reported on code.google.com by andrew.k.mccallum on 19 Nov 2009 at 9:21

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
I've experimented with different values of epsilon (confidence) as well as the initial variance, but the updates remain small. Further, when these parameters deviate too much from the current defaults (which I set through experimentation on three different tasks), performance degrades. We could add a multiplier to the update, but this would likely hurt performance as well (part of the reason CW works so well is that the updates are conservative in the first place). One option to consider is having CW override the default temperature, or having the default temperature be low (MIRA may have similar temperature problems in certain settings). A final possibility would be to learn the temperature from the data in a final step right before inference is performed.

Original comment by thebiasedestimator@gmail.com on 23 Nov 2009 at 3:32
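
For reference, a minimal sketch of the kind of diagonal CW update under discussion (the "variance" variant of Crammer, Dredze & Pereira), assuming a binary linear classifier. The names `ConfidenceWeightedSketch`, `phi`, and `initialVariance` are illustrative, not FACTORIE's actual API. Note that there is no global step multiplier: the step size alpha falls out of the confidence constraint, which is why the updates stay conservative.

```scala
// phi is the inverse normal CDF of the confidence eta, precomputed here;
// for eta = 0.9, phi ~= 1.2816.
class ConfidenceWeightedSketch(numFeatures: Int,
                               phi: Double = 1.2816,
                               initialVariance: Double = 0.1) {
  val mu = Array.fill(numFeatures)(0.0)                 // mean weight vector
  val sigma = Array.fill(numFeatures)(initialVariance)  // diagonal covariance

  def update(x: Array[Double], y: Int): Unit = {        // y in {-1, +1}
    var m = 0.0; var v = 0.0
    for (i <- x.indices) { m += mu(i) * x(i); v += sigma(i) * x(i) * x(i) }
    m *= y
    if (v > 0.0) {
      // Closed-form multiplier for the CW-var constraint; alpha = 0 means
      // the example is already classified with sufficient confidence.
      val b = 1.0 + 2.0 * phi * m
      val alpha = math.max(0.0,
        (-b + math.sqrt(b * b - 8.0 * phi * (m - phi * v))) / (4.0 * phi * v))
      if (alpha > 0.0) {
        for (i <- x.indices) {
          mu(i) += alpha * y * sigma(i) * x(i)          // confidence-scaled step
          sigma(i) = 1.0 / (1.0 / sigma(i) + 2.0 * alpha * phi * x(i) * x(i))
        }
      }
    }
  }
}
```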

GoogleCodeExporter commented 8 years ago
Along the same lines, I'm adding a box constraint to MIRA, since constraints that are not separable (due to an impoverished feature space) will result in huge parameter updates (and, ultimately, unbounded weight vectors).

Original comment by thebiasedestimator@gmail.com on 23 Nov 2009 at 3:44
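
To illustrate, a minimal sketch of a MIRA-style step with a box constraint (PA-I-style clipping), assuming a binary hinge loss; the cap `c` and the function names are assumptions, not FACTORIE's API. Clipping tau to [0, c] is what keeps an unsatisfiable constraint from driving the weights to infinity.

```scala
object MiraBoxSketch {
  // Box-constrained MIRA step: tau is clipped to [0, c], so a constraint
  // that cannot be satisfied produces at most a bounded update.
  def boxConstrainedStep(w: Array[Double], x: Array[Double], y: Int,
                         c: Double = 1.0): Unit = {
    var score = 0.0; var norm2 = 0.0
    for (i <- x.indices) { score += w(i) * x(i); norm2 += x(i) * x(i) }
    val loss = math.max(0.0, 1.0 - y * score)        // hinge loss
    if (loss > 0.0 && norm2 > 0.0) {
      // Without the min(c, ...) cap, tau grows without bound as the
      // constraint becomes harder to satisfy.
      val tau = math.min(c, loss / norm2)
      for (i <- x.indices) w(i) += tau * y * x(i)
    }
  }
}
```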

GoogleCodeExporter commented 8 years ago
I meant eta (not epsilon) ^^

Original comment by thebiasedestimator@gmail.com on 23 Nov 2009 at 4:11

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
OK, I have a possible solution here that re-orders some operations. Rather than compute the multiplier on the diagonal approximation (resulting in a loss of update mass), we first use the diagonal matrix to 'bend' the gradient, then compute the multiplier (this way we account for the weight that was projected out of the full covariance matrix). Initial testing indicates that this may solve the problem of ultra-conservative updates, and it also achieves better generalization (empirically, on coreference) than any method to date. This will be checked in (as a separate CW algorithm) after further testing.

Original comment by thebiasedestimator@gmail.com on 17 Apr 2010 at 4:49
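
One possible reading of that re-ordering, as a speculative sketch rather than the algorithm that was actually checked in: bend the gradient through the diagonal covariance first, then size the step along the bent direction, so that the multiplier is computed in the metric induced by sigma rather than on the raw diagonal approximation. All names here are hypothetical.

```scala
object ReorderedCwSketch {
  // Hypothetical re-ordered update: bend first, then compute the multiplier.
  def update(mu: Array[Double], sigma: Array[Double],
             x: Array[Double], y: Int): Unit = {
    var score = 0.0
    for (i <- x.indices) score += mu(i) * x(i)
    val loss = math.max(0.0, 1.0 - y * score)        // hinge loss
    if (loss > 0.0) {
      // Step 1: bend the gradient through the diagonal covariance.
      val bent = Array.tabulate(x.length)(i => sigma(i) * x(i))
      // Step 2: compute the multiplier against the bent direction, so the
      // constraint is satisfied in the metric induced by sigma and no
      // update mass is silently projected out.
      var denom = 0.0
      for (i <- x.indices) denom += bent(i) * x(i)
      if (denom > 0.0) {
        val tau = loss / denom
        for (i <- x.indices) mu(i) += tau * y * bent(i)
      }
    }
  }
}
```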