Closed dhruvghulati-zz closed 8 years ago
Could you send me a few examples of train_data_features, train_property_labels, and open_cost_mat_train so that I can replicate the error? Only 10 or so lines are needed.
So train_data_features may be:
[[0, 0, 0, 0, 2, 0, 1, 0],
 [0, 0, 0, 0, 1, 0, 0, 0],
 [1, 0, 0, 1, 4, 0, 3, 0],
 [0, 2, 0, 0, 0, 0, 8, 0],
 [0, 0, 0, 0, 0, 3, 0, 0]]
This represents the bag of words from some sentences as a numpy array. Note that each number is of type <type 'numpy.int64'>.
Then train_property_labels is a list of unicode labels, one for each row of the above. For the sake of argument:
[u'A', u'B', u'A', u'C', u'A']
And the open_cost_mat_train is:
[ [ 0.36303512 0. 0. 0. ]
[ 0.24472353 0. 0. 0. ]
[ 0.18386408 0. 0. 0. ]
[ 0.00650667 0. 0. 0. ]
[ 0.06445714 0. 0. 0. ]]
Here each value is of type <type 'numpy.float64'>, and this is a numpy array.
Note: I will be changing C_FN to be half of C_FP, but I am not sure this is the issue.
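For reference, here is a minimal sketch of what that change could look like on the sample values above. It assumes the four columns are ordered [C_FP, C_FN, C_TP, C_TN], which the thread does not confirm, so check the library documentation before relying on it. The name c_fp is only illustrative.

```python
import numpy as np

# First column of the sample cost matrix above, treated here as the
# per-example false-positive cost (an assumption about the column order).
c_fp = np.array([0.36303512, 0.24472353, 0.18386408, 0.00650667, 0.06445714])

# Assumed column order: [C_FP, C_FN, C_TP, C_TN].
open_cost_mat_train = np.zeros((c_fp.shape[0], 4))
open_cost_mat_train[:, 0] = c_fp
open_cost_mat_train[:, 1] = 0.5 * c_fp  # C_FN set to half of C_FP

print(open_cost_mat_train.shape)
```

The true-positive and true-negative columns stay at zero, as in the sample.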
Note: I checked the type of train_property_labels and changed it from a list to an array, and now get this error:
File "/Users/dhruv/Documents/university/ClaimDetection/src/main/costSensitiveClassifier.py", line 272, in
@dhruvghulati Unfortunately, costcla is so far only built for binary classification problems, and it assumes the labels are 0 and 1. This may be the problem.
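One possible workaround, sketched here under assumptions, is to reduce the categorical labels to a 0/1 target for a single class of interest (one-vs-rest) before fitting. The helper binarize_labels is hypothetical and not part of the library, and whether this resolves the error above has not been verified.

```python
import numpy as np

# Sample labels from the example above.
train_property_labels = np.array([u'A', u'B', u'A', u'C', u'A'])

def binarize_labels(labels, positive_label):
    """Return a 0/1 integer array: 1 where the label equals positive_label."""
    return (np.asarray(labels) == positive_label).astype(int)

# Treat 'A' as the positive class and every other label as negative.
y_binary = binarize_labels(train_property_labels, u'A')
print(y_binary)  # [1 0 1 0 1]
```

The binary array could then be passed in place of the original labels, e.g. costClassifier.fit(train_data_features, y_binary, open_cost_mat_train). Covering every class this way would need one classifier per label.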
Understood, OK thanks a lot for pointing this out.
I have code like:
costClassifier = CostSensitiveLogisticRegression()
costClassifier.fit(train_data_features, train_property_labels, open_cost_mat_train)
y_open_pred_test_cslr = costClassifier.predict(test_data_features)
Here train_data_features is a bag of words for 15,000 sentences, train_property_labels holds the categorical labels for those sentences, and open_cost_mat_train is a cost matrix.
My stack trace however is: