bbalasub1 / glmnet_python

GNU General Public License v3.0

Weights Variable Issue #19

Open mzhao94 opened 6 years ago

mzhao94 commented 6 years ago

I'm having trouble using sample weights when running a multinomial lasso with the `glmnet()` function. I get this error:

```
Traceback (most recent call last):
  File "", line 37, in
    fit = glmnet(x=sparse_matrix.copy(), y=y_float.copy(), family='multinomial', weights=sweights_float)
  File "/home/mzhao94/.local/lib/python3.5/site-packages/glmnet_python/", line 455, in glmnet
    thresh, isd, intr, maxit, kopt, family)
  File "/home/mzhao94/.local/lib/python3.5/site-packages/glmnet_python/", line 60, in lognet
    y = y*scipy.tile(weights, (1, ny))
ValueError: operands could not be broadcast together with shapes (190349,3) (1,571047)
```

I've verified that the `x` matrix has dimensions (190349, 12249) and that the `y` and `weights` arrays are both (190349, 1), so I'm not sure why I'm getting this `ValueError`. Does the `weights` array need to have some other specific shape? Everything runs without errors when I omit the `weights` argument.
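The shapes in the error can be reproduced with plain NumPy. This sketch uses small stand-in sizes (n = 4 observations and ny = 3 classes, both hypothetical) and `numpy.tile` in place of the `scipy.tile` call from the traceback (older SciPy releases re-exported NumPy's `tile`). Tiling a 1-D weight array with reps `(1, ny)` yields a single row of length n × ny, which matches the reported `(1, 571047)` because 190349 × 3 = 571047; tiling an `(n, 1)` column yields `(n, ny)`, which multiplies cleanly with the response. This suggests the weights that actually reached `glmnet` were 1-D, which is what `.as_matrix()` returns for a single column.

```python
import numpy as np

# Hypothetical small sizes standing in for 190349 observations and 3 classes.
n, ny = 4, 3

# For family='multinomial' the response is expanded to shape (n, ny),
# as the (190349, 3) operand in the traceback shows.
y = np.ones((n, ny))

# 1-D weights, like the array returned for a single DataFrame column.
w_1d = np.ones(n)
tiled_1d = np.tile(w_1d, (1, ny))  # 1-D input is promoted to (1, n) first
print(tiled_1d.shape)  # (1, 12): one row of length n * ny

try:
    y * tiled_1d  # trailing dimensions 3 and 12 cannot broadcast
except ValueError as err:
    print("ValueError:", err)

# Column-vector weights of shape (n, 1).
w_col = w_1d.reshape(-1, 1)
tiled_col = np.tile(w_col, (1, ny))
print(tiled_col.shape)  # (4, 3): same shape as y
print((y * tiled_col).shape)  # (4, 3)
```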

Here is the core part of my code:

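```python
import time

import numpy as np
import pandas as pd
import scipy.sparse

import glmnet_python  # noqa: F401  (per the project README, this makes the next import work)
from glmnet import glmnet

# Response, weights and IDs in one file; predictors (dummies and interactions) in another.
id_weight_y = pd.read_csv('resp_weights.csv')
ind_vars = pd.read_csv('dummies_ixns.csv')
ind_vars = ind_vars.drop(["employed", "pubhous", "fvehicle"], axis=1)

# to_numpy() returns a 1-D array for a single column (as_matrix() in older pandas).
ids = id_weight_y["unique_id"].to_numpy()
y = id_weight_y["neverNewOldStmp"].to_numpy()
y_float = y.astype(np.float64)
sweights = id_weight_y["idvdwt_crsec_coreimgrt"].to_numpy()
sweights_float = sweights.astype(np.float64)

# Predictors as a compressed sparse column matrix.
sparse_matrix = scipy.sparse.csc_matrix(ind_vars.to_numpy(), dtype=np.float64)

startTime = time.time()

fit = glmnet(x=sparse_matrix.copy(), y=y_float.copy(), family='multinomial', weights=sweights_float)

print('The script took {0} seconds!'.format(time.time() - startTime))
```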
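One possible workaround is to reshape the weights into an `(nobs, 1)` column before the call. This assumes glmnet_python expects that shape, an inference from the `scipy.tile(weights, (1, ny))` line in the traceback rather than something confirmed from the library source. `as_weight_column` is a hypothetical helper, not part of glmnet_python.

```python
import numpy as np


def as_weight_column(weights, n_obs):
    """Return weights as a float64 column vector of shape (n_obs, 1).

    Hypothetical helper: the (n_obs, 1) target shape is assumed from the
    scipy.tile(weights, (1, ny)) call in the traceback.
    """
    w = np.asarray(weights, dtype=np.float64).reshape(-1, 1)
    if w.shape[0] != n_obs:
        raise ValueError(f"expected {n_obs} weights, got {w.shape[0]}")
    return w


# Example with 1-D weights, such as a single DataFrame column converted to NumPy.
w = as_weight_column(np.array([1.0, 2.0, 0.5]), n_obs=3)
print(w.shape)  # (3, 1)

# In the script above, the call would then become (assumption, not tested here):
# fit = glmnet(x=sparse_matrix.copy(), y=y_float.copy(), family='multinomial',
#              weights=as_weight_column(sweights_float, sparse_matrix.shape[0]))
```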
