HazyResearch / flyingsquid

More interactive weak supervision with FlyingSquid
Apache License 2.0

Returning NaN for probabilities #8

Closed: dmitra79 closed this issue 4 years ago

dmitra79 commented 4 years ago

Hello,

I've found that in a number of situations predict_proba_marginalized returns NaN. I didn't see this behavior in the tutorials or the documentation, and I'm not sure how to interpret it. Here is one example:

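import numpy as np
from flyingsquid.label_model import LabelModel

def squid_run(a):
    # a: label matrix of shape (n_examples, n_labeling_functions)
    m = a.shape[1]
    lm = LabelModel(m)
    lm.fit(a)
    print(lm.estimated_accuracies())
    out = lm.predict_proba_marginalized(a)
    # Count the distinct predicted probabilities
    unique, counts = np.unique(out, return_counts=True)
    print(unique, counts)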

# Run 1: votes in {+1, -1}
n = 100
a = np.ones(n)
b = -1 * np.ones(n)
z = np.concatenate([a, b, b, a, a, b]).reshape((3, 2 * n)).transpose()
print(z.shape)
squid_run(z)

# Run 2: votes in {1, 0}
n = 100
a = np.ones(n)
b = np.zeros(n)
z = np.concatenate([a, b, b, a, a, b]).reshape((3, 2 * n)).transpose()
print(z.shape)
squid_run(z)

The first run returns NaN for every instance probability (and 1s for the estimated accuracies). The second run returns:

(200, 3)
[0.25, 0.25, 0.25]
[0.5] [200]

The same runs with only two weak labels, i.e. z=np.concatenate([a, b, b, a]).reshape((2,2*n)).transpose(), result in NaN in both cases.
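
For anyone reproducing this, an explicit NaN count can be easier to read than the np.unique output. The following is a minimal sketch, not part of FlyingSquid; the helper name summarize_probabilities is hypothetical, and it only assumes that the output of predict_proba_marginalized can be converted to a float array.

```python
import numpy as np

def summarize_probabilities(probs):
    """Hypothetical helper: count NaNs and report the range of finite values."""
    probs = np.asarray(probs, dtype=float)
    nan_mask = np.isnan(probs)
    finite = probs[~nan_mask]
    return {
        "n_total": int(probs.size),
        "n_nan": int(nan_mask.sum()),
        # min/max are None when every entry is NaN
        "min": float(finite.min()) if finite.size else None,
        "max": float(finite.max()) if finite.size else None,
    }
```

Calling summarize_probabilities(out) inside squid_run would show at a glance whether all, some, or none of the predictions are NaN.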

DanFu09 commented 4 years ago

Ah, there was a divide-by-zero bug in the inference code. This should be fixed now!

dmitra79 commented 4 years ago

I checked out the latest version. The problem is resolved for 3 weak labels, but not for 2, i.e. the following still produces NaNs:

n=100
a=np.ones(n)
b=-1*np.ones(2*n)
z=np.concatenate([a, b, b, a]).reshape((2,3*n)).transpose()
print(z.shape)
squid_run(z)

Is the method not applicable at all when there are only 2 weak labels?

DanFu09 commented 4 years ago

Yes, we need at least three (conditionally-independent) labeling functions to run.
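
To illustrate why three is the minimum, here is a rough sketch of a triplet-style estimate. It is an illustration under stated assumptions, not FlyingSquid's actual code: votes are in {-1, +1} with no abstains, the labeling functions are conditionally independent given the true label Y, and each pairwise moment E[λ_i λ_j] then equals a_i a_j, where a_i = E[λ_i Y]. The function name triplet_accuracies is hypothetical.

```python
import numpy as np

def triplet_accuracies(L):
    """Recover |a_i| = |E[lambda_i * Y]| for exactly three labeling functions
    from their pairwise moments (illustrative sketch only)."""
    L = np.asarray(L, dtype=float)
    if L.ndim != 2 or L.shape[1] != 3:
        raise ValueError("this sketch needs exactly three labeling functions")
    # Pairwise second moments; assumed to equal a_i * a_j.
    m12 = np.mean(L[:, 0] * L[:, 1])
    m13 = np.mean(L[:, 0] * L[:, 2])
    m23 = np.mean(L[:, 1] * L[:, 2])
    # A zero moment would cause a division by zero below.
    if min(abs(m12), abs(m13), abs(m23)) == 0:
        raise ValueError("a pairwise moment is zero; accuracies are not identifiable")
    # Three equations in three unknowns: solve for each magnitude.
    a1 = np.sqrt(abs(m12 * m13 / m23))
    a2 = np.sqrt(abs(m12 * m23 / m13))
    a3 = np.sqrt(abs(m13 * m23 / m12))
    return np.array([a1, a2, a3])
```

With only two labeling functions there is a single moment, m12 = a_1 a_2, which is one equation in two unknowns, so the individual accuracies cannot be separated.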

DanFu09 commented 4 years ago

There's now a check for this on label model creation!
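
For users on an older version without that check, a similar guard can be applied before the model is built. This is a hypothetical sketch, not the check that was added to the library; the name validate_label_matrix and the error messages are made up for illustration.

```python
import numpy as np

# Minimum taken from the maintainer's comment in this thread.
MIN_LABELING_FUNCTIONS = 3

def validate_label_matrix(L):
    """Hypothetical pre-flight check: return the number of labeling functions,
    or raise if the matrix cannot be used."""
    L = np.asarray(L)
    if L.ndim != 2:
        raise ValueError("expected a 2-D matrix of shape (n_examples, n_labeling_functions)")
    n_lfs = L.shape[1]
    if n_lfs < MIN_LABELING_FUNCTIONS:
        raise ValueError(
            f"got {n_lfs} labeling functions; at least {MIN_LABELING_FUNCTIONS} are required"
        )
    return n_lfs

# Intended use (requires flyingsquid to be installed):
#   m = validate_label_matrix(z)
#   lm = LabelModel(m)
```

Failing early with a clear message avoids silently receiving NaN probabilities later on.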

dmitra79 commented 4 years ago

Great - thank you!