HazyResearch / flyingsquid

More interactive weak supervision with FlyingSquid
Apache License 2.0

KeyError when fitting a model #9

Closed dmitra79 closed 4 years ago

dmitra79 commented 4 years ago

Hello,

I ran into the following error:

~/flyingsquid/flyingsquid/label_model.py in fit(self, L_train, class_balance, Y_dev, flip_negative, clamp, solve_method, sign_recovery, verbose)
    604                 elif num_Ys(equals_one_tup) != 0 and num_lambdas(equals_one_tup) != 0:
    605                     # If this contains lambdas and Y's, we can't observe it
--> 606                     r_vals[r_val] = probability_values[r_val]
    607                 elif num_Ys(equals_one_tup) != 0:
    608                     # We need to cache this moment
KeyError: (('lambda_1', 'lambda_2', 'lambda_3', 'Y_0'), ('0',))

The label model I am creating has m=4 weak labels, and the label_edges are: [(1, 2), (1, 3), (2, 3)]. The data summary is:

count  2.676571e+06  2.676571e+06  2.676571e+06  2.676571e+06
mean   1.793339e-05  7.472247e-07  7.472247e-07  9.493490e-04
std    4.234747e-03  8.644214e-04  8.644214e-04  3.079688e-02
min    0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
25%    0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
50%    0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
75%    0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
max    1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00

Could this be caused by the weak labels being very rare?

dmitra79 commented 4 years ago

I've also run into the following RuntimeWarning in a similar situation (though never both this and the error above at once):

../anaconda3/envs/mindsynchro-test/lib/python3.7/site-packages/numpy/core/fromnumeric.py:3118: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
../anaconda3/envs/mindsynchro-test/lib/python3.7/site-packages/numpy/core/_methods.py:85: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
../flyingsquid/flyingsquid/label_model.py:667: RuntimeWarning: invalid value encountered in less
  marginal_vals[marginal_vals < 0] = marginal_vals[marginal_vals < 0] * -1

DanFu09 commented 4 years ago

The graph that you've specified can't be solved with our method -- in order to run, we need to be able to put each labeling function into a triplet of three conditionally-independent labeling functions.

Generally speaking, two labeling functions are conditionally-independent if there's no path between them in lambda_edges (technically, this is only true for the single Y case). So in your graph, labeling function 0 is conditionally-independent from labeling functions 1, 2, and 3 -- but labeling functions 1, 2, and 3 aren't conditionally-independent of each other. So we can't form a triplet of three labeling functions where all three of them are conditionally-independent of each other.
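
To make the rule above concrete, here is a minimal sketch in plain Python. It is not FlyingSquid's own check; the helper names `connected_components` and `find_triplets` are made up for illustration, and it assumes the single-Y case, where two labeling functions are conditionally independent exactly when no path connects them in lambda_edges. Under that assumption, a valid triplet needs three labeling functions from three different connected components.

```python
from itertools import combinations


def connected_components(m, lambda_edges):
    """Return a component id for each of the m labeling functions (union-find)."""
    parent = list(range(m))

    def find(i):
        # Walk up to the root, compressing the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in lambda_edges:
        parent[find(a)] = find(b)
    return [find(i) for i in range(m)]


def find_triplets(m, lambda_edges):
    """Map each labeling function to a triplet of mutually independent
    labeling functions, or to None if no such triplet exists.

    Hypothetical helper: assumes independence == no path in lambda_edges.
    """
    comp = connected_components(m, lambda_edges)
    triplets = {}
    for i in range(m):
        triplets[i] = None
        # Candidates must sit in a different component than i ...
        others = [j for j in range(m) if comp[j] != comp[i]]
        for j, k in combinations(others, 2):
            # ... and in different components from each other.
            if comp[j] != comp[k]:
                triplets[i] = (i, j, k)
                break
    return triplets


# The graph from this issue: only two components, {0} and {1, 2, 3}.
issue_graph = find_triplets(4, [(1, 2), (1, 3), (2, 3)])

# With no edges at all, every labeling function gets a triplet.
no_edges = find_triplets(4, [])
```

For the graph in this issue, every entry of `issue_graph` is `None`, since there are only two connected components and a triplet needs three.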

There's now a check for this built-in on label model creation -- it should throw an error if it can't find valid triplets for all labeling functions.

dmitra79 commented 4 years ago

Thank you for the explanation! I just tried the latest version, but the exception you've added (below) was not raised; the behavior was the same as before.

if not self._check():
  raise NotImplementedError('Cannot run triplet method for specified graph.')

DanFu09 commented 4 years ago

Thanks for flagging -- there was a bug in the check function. Should be fixed now!

dmitra79 commented 4 years ago

Yes, seems to be working. Thanks!