elastic / ember

Elastic Malware Benchmark for Empowering Researchers

Why does my model get the same results? #69

Open kou18n opened 3 years ago

kou18n commented 3 years ago

I trained a model, and the results reported at 1% FPR and at 0.1% FPR are identical. Why?

ROC AUC: 0.9967092396

Ember Model Performance at 1% FPR:
Threshold: 1.0001
False Positive Rate: 1.093%
False Negative Rate: 3.269%
Detection Rate: 96.731%

Ember Model Performance at 0.1% FPR:
Threshold: 1.0001
False Positive Rate: 1.093%
False Negative Rate: 3.269%
Detection Rate: 96.731%

emberdf["y_pred"]

0         -3.861475
1         -4.427918
2         -7.410540
3         -6.954000
4         -5.017338
             ...
999995    10.547381
999996    14.032081
999997    -5.742430
999998    -3.014685
999999    10.370636
Name: y_pred, Length: 1000000, dtype: float64
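The values above run from roughly -7 to 14, so they are not probabilities. That would be consistent with LightGBM returning raw scores for a model trained with a custom objective, though this is an assumption since the model file is not shown here. A minimal sketch of mapping such scores into (0, 1) with a logistic sigmoid, assuming the custom loss works on log-odds like the standard binary objective (the `sigmoid` helper is hypothetical, not part of ember):

```python
import math

def sigmoid(raw_score):
    """Map a raw margin score to (0, 1) without overflowing for large |raw_score|."""
    if raw_score >= 0:
        return 1.0 / (1.0 + math.exp(-raw_score))
    e = math.exp(raw_score)
    return e / (1.0 + e)

# A few of the scores printed above.
raw_scores = [-3.861475, -4.427918, 10.547381, 14.032081]
probs = [sigmoid(s) for s in raw_scores]
for s, p in zip(raw_scores, probs):
    print("{:>10.6f} -> {:.6f}".format(s, p))
```

The sigmoid is monotonic, so it does not change the ROC AUC; it only changes which threshold values are meaningful.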

bfilar commented 3 years ago

Could you post your code? It is hard to gauge what the issue is without it.

kou18n commented 3 years ago

Thank you for your quick reply. I trained the classification model myself with a custom loss function, using the same parameters. I did not use any of ember's code for training; I only want to use ember's evaluation code. I can share the model file: 430_339custom_model.txt. Thank you.

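import os

import ember
import lightgbm as lgb
import numpy as np
from sklearn.metrics import roc_auc_score

emberdf = ember.read_metadata(data_dir)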
X_train, y_train, X_test, y_test = ember.read_vectorized_features(data_dir)
lgbm_model = lgb.Booster(model_file=os.path.join(data_dir, "430_339custom_model.txt"))

y_test_pred = lgbm_model.predict(X_test)
y_train_pred = lgbm_model.predict(X_train)
emberdf["y_pred"] = np.hstack((y_train_pred, y_test_pred))

def get_fpr(y_true, y_pred):
    nbenign = (y_true == 0).sum()
    nfalse = (y_pred[y_true == 0] == 1).sum()
    return nfalse / float(nbenign)

def find_threshold(y_true, y_pred, fpr_target):
    thresh = 0.0
    fpr = get_fpr(y_true, y_pred > thresh)
    while fpr > fpr_target and thresh < 1.0:
        thresh += 0.0001
        fpr = get_fpr(y_true, y_pred > thresh)
    return thresh, fpr

testdf = emberdf[emberdf["subset"] == "test"]
print("ROC AUC:", roc_auc_score(testdf.label, testdf.y_pred))
print()

threshold, fpr = find_threshold(testdf.label, testdf.y_pred, 0.01)
fnr = (testdf.y_pred[testdf.label == 1] < threshold).sum() / float((testdf.label == 1).sum())
print("Ember Model Performance at 1% FPR:")
print("Threshold: {:.4f}".format(threshold))
print("False Positive Rate: {:.3f}%".format(fpr * 100))
print("False Negative Rate: {:.3f}%".format(fnr * 100))
print("Detection Rate: {}%".format(100 - fnr * 100))
print()

threshold, fpr = find_threshold(testdf.label, testdf.y_pred, 0.001)
fnr = (testdf.y_pred[testdf.label == 1] < threshold).sum() / float((testdf.label == 1).sum())
print("Ember Model Performance at 0.1% FPR:")
print("Threshold: {:.4f}".format(threshold))
print("False Positive Rate: {:.3f}%".format(fpr * 100))
print("False Negative Rate: {:.3f}%".format(fnr * 100))
print("Detection Rate: {}%".format(100 - fnr * 100))

Results:

ROC AUC: 0.9967092396

Ember Model Performance at 1% FPR:
Threshold: 1.0001
False Positive Rate: 1.093%
False Negative Rate: 3.269%
Detection Rate: 96.731%

Ember Model Performance at 0.1% FPR:
Threshold: 1.0001
False Positive Rate: 1.093%
False Negative Rate: 3.269%
Detection Rate: 96.731%

Kimluur commented 2 years ago

Hi, I had to write my own evaluation code when working with my own model. It should be fairly easy to calculate the FP, FN and TP rates. Some simple code snippets are below, but they are not perfect (I used pandas DataFrames for this). Please treat this as pseudocode for reference.

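def get_fpr(y_true, y_pred):
    # y_pred must hold hard 0/1 labels here, not raw scores.
    nbenign = (y_true == 0).sum()    # benign samples in the ground truth
    predgood = (y_pred == 0).sum()   # samples predicted benign
    tot = y_true - y_pred            # -1: false positive, 1: false negative, 0: correct
    return nbenign, predgood, tot

testdf = emberdf[emberdf["subset"] == "test"]

# Assumption: binarize the scores at 0 before counting; pick the cutoff that fits your model.
y_label = (testdf["y_pred"] > 0).astype(int)
g, fg, tot = get_fpr(testdf["label"], y_label)

falsepositives = (tot == -1).sum()
falsenegatives = (tot == 1).sum()
correct = (tot == 0).sum()
total = falsepositives + falsenegatives + correct
print("fp", falsepositives, "fn", falsenegatives, "correct", correct, "total", total)
# Percentages are relative to all test samples, not to the benign count.
print("percentages: fp", (falsepositives / total) * 100, "% fn", (falsenegatives / total) * 100, "% correct", (correct / total) * 100, "% total 100%")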
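One likely reason for the identical results in the original post: the `find_threshold` loop from the evaluation code only searches thresholds between 0 and 1 in steps of 0.0001, while the posted scores reach about 14. If neither FPR target is met inside that range, both searches stop at 1.0001 and report the same numbers. A rough sketch of a threshold search that works for any score range, taking candidates from the benign scores themselves. The data below is synthetic and made up purely for illustration, and `detection_rate` is a hypothetical helper:

```python
def find_threshold(y_true, y_score, fpr_target):
    """Return (threshold, fpr) such that flagging scores > threshold keeps FPR <= fpr_target."""
    benign = sorted((s for t, s in zip(y_true, y_score) if t == 0), reverse=True)
    if not benign:
        raise ValueError("no benign samples")
    # At most this many benign samples may score strictly above the threshold.
    allowed = int(fpr_target * len(benign))
    # Use the (allowed + 1)-th highest benign score as the threshold.
    thresh = benign[allowed] if allowed < len(benign) else float("-inf")
    nfalse = sum(1 for s in benign if s > thresh)
    return thresh, nfalse / len(benign)

def detection_rate(y_true, y_score, thresh):
    """Fraction of malicious samples scoring strictly above the threshold."""
    malicious = [s for t, s in zip(y_true, y_score) if t == 1]
    return sum(1 for s in malicious if s > thresh) / len(malicious)

# Synthetic raw scores (NOT ember data): benign roughly in [-8, 4], malicious in [-2, 14].
benign_scores = [-8 + 0.012 * i for i in range(1000)]
malicious_scores = [-2 + 0.016 * i for i in range(1000)]
y_true = [0] * 1000 + [1] * 1000
y_score = benign_scores + malicious_scores

t1, fpr1 = find_threshold(y_true, y_score, 0.01)
t01, fpr01 = find_threshold(y_true, y_score, 0.001)
print("1% FPR:   threshold {:.4f}, FPR {:.3%}, detection {:.3%}".format(
    t1, fpr1, detection_rate(y_true, y_score, t1)))
print("0.1% FPR: threshold {:.4f}, FPR {:.3%}, detection {:.3%}".format(
    t01, fpr01, detection_rate(y_true, y_score, t01)))
```

With raw scores, both thresholds in this toy example land above 1.0, which is exactly the region the fixed 0 to 1 grid never reaches. Alternatively, converting the scores to probabilities first would let the original loop run unchanged.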