HazyResearch / flyingsquid

More interactive weak supervision with FlyingSquid
Apache License 2.0

Question on the binary Ising model #14

wurenzhi closed 3 years ago

wurenzhi commented 3 years ago

In the paper, a binary Ising model is constructed to handle abstains. I read through the code, and it seems to me that the Ising model is never actually constructed. Where does the Ising model come into play in the code?

For example, the paper says P(λi = 0, Y dep(i) = 1) is factorizable due to the construction of G (the Ising model), and in the code it is simply P(λi = 0, Y dep(i) = 1) = P(λi = 0) * P(Y dep(i) = 1): https://github.com/HazyResearch/flyingsquid/blob/28a713a9ac501b7597c2489468ae189943d00685/flyingsquid/label_model.py#L644. I don't understand what's going on here.

I appreciate any explanations. Thanks!

DanFu09 commented 3 years ago

Hi, great question! We designed the augmentation to handle abstains such that the computations to find the marginals would be as similar as possible between the abstain/non-abstain cases.

The first important piece is Lemma 1 in appendix C.1.3 of the paper -- E[lambda_i Y] = E[lambda_i+ Y] (in the paper, the terms v_2i and v_2i-1 loosely correspond to a labeling function voting positive or negative in the abstain-augmented Ising model). In other words, we can use the same triplet equations to solve for the parameters of the augmented Ising model as we use to solve for those of the non-augmented Ising model. Code-wise, we have the code do the same thing in both cases.
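To make the "same triplet equations" point concrete, here is a minimal simulation sketch. It is not the FlyingSquid code. It assumes three labeling functions that are conditionally independent given Y, have symmetric accuracies, and abstain (vote 0) independently of Y; the function names and all parameter values are hypothetical. With abstains coded as 0, the triplet identity |E[λi Y]| = sqrt(|E[λi λj] E[λi λk] / E[λj λk]|) can be applied to the raw votes without any special casing.

```python
import math
import random


def simulate_votes(n, accuracies, abstain_rates, seed=0):
    """Draw Y in {-1, +1} and votes in {-1, 0, +1} (hypothetical generator).

    Each labeling function abstains with its own probability, independently
    of Y; otherwise it votes for Y with the given accuracy.
    """
    rng = random.Random(seed)
    labels, votes = [], []
    for _ in range(n):
        y = rng.choice((-1, 1))
        row = []
        for acc, p_abstain in zip(accuracies, abstain_rates):
            if rng.random() < p_abstain:
                row.append(0)  # abstain
            else:
                row.append(y if rng.random() < acc else -y)
        labels.append(y)
        votes.append(row)
    return labels, votes


def mean_product(votes, i, j):
    """Empirical E[lambda_i * lambda_j]; observable without labels."""
    return sum(row[i] * row[j] for row in votes) / len(votes)


def triplet_estimate(votes, i, j, k):
    """Estimate |E[lambda_i * Y]| from pairwise agreement rates only."""
    m_ij = mean_product(votes, i, j)
    m_ik = mean_product(votes, i, k)
    m_jk = mean_product(votes, j, k)
    return math.sqrt(abs(m_ij * m_ik / m_jk))


labels, votes = simulate_votes(200_000, [0.9, 0.8, 0.7], [0.2, 0.3, 0.1])
estimate = triplet_estimate(votes, 0, 1, 2)
# Ground truth is available here only because the data is simulated.
truth = sum(row[0] * y for row, y in zip(votes, labels)) / len(votes)
print(round(estimate, 3), round(truth, 3))
```

Under these assumptions the two printed numbers should be close to each other, and both should be close to (1 - 0.2) * (2 * 0.9 - 1) = 0.64.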

The other thing is that we need to compute some slightly different marginals (we need to know what we should expect Y to be when one of the labeling functions abstains). The full matrix equation we need to solve is equation (11) in appendix C.1.4 -- we can get all of the values either by observing them directly or by using the triplet method, except for P(λi = 0, Y dep(i) = 1). But we can show that this value equals P(λi = 0) * P(Y dep(i) = 1) (the derivation is at the top of C.1.4).
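A small numerical check of the factorization, again as a sketch rather than the library's implementation. The simulation makes abstention independent of Y by construction, which is an assumption of this example; the paper obtains the factorization from its construction of G. The generator, its parameters, and the single-task setup are all hypothetical.

```python
import random


def simulate(n, accuracy, p_abstain, class_balance, seed=0):
    """Return (vote, y) pairs with vote in {-1, 0, +1} and y in {-1, +1}."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        y = 1 if rng.random() < class_balance else -1
        if rng.random() < p_abstain:
            vote = 0  # abstain, drawn independently of y
        else:
            vote = y if rng.random() < accuracy else -y
        samples.append((vote, y))
    return samples


class_balance = 0.3  # P(Y = 1), treated as known
samples = simulate(200_000, accuracy=0.85, p_abstain=0.25,
                   class_balance=class_balance)
n = len(samples)

# P(lambda = 0) is observable from the votes alone, without labels.
p_abstain_hat = sum(1 for vote, _ in samples if vote == 0) / n

# Factorized value: P(lambda = 0) * P(Y = 1).
product = p_abstain_hat * class_balance

# Empirical joint P(lambda = 0, Y = 1); needs labels, so it is only
# available because the data is simulated.
joint = sum(1 for vote, y in samples if vote == 0 and y == 1) / n

print(round(product, 4), round(joint, 4))
```

The product uses only quantities that are available without ground-truth labels (the abstain rate and the class balance), which is why the factorized form is convenient in practice.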

In the code, r_vals is the term on the right of equation (11), b_matrix is the matrix on the left, and e_vec_vals is the vector of marginals in the middle. We use the values in e_vec_vals to construct marginals for each clique, which we then use for inference.

wurenzhi commented 3 years ago

Hi, thanks a lot for the detailed explanation! This is really helpful.