QinbinLi / DPBoost

Privacy-Preserving Gradient Boosting Decision Trees (AAAI 2020)
MIT License

Testing data evaluation code #7

Open MarianneAK opened 8 months ago

MarianneAK commented 8 months ago

Hello! First of all, thank you for sharing your code; it has been immensely helpful for experimenting with differential privacy in gradient boosted decision trees. Would it be possible for the authors to share the code that was used to evaluate the model on the test sets? I'm trying to use this code as a baseline for some experiments I'm conducting, and to reproduce the test scores reported in the paper. I've tried calling lgb.train followed by model.predict on the test set, but no matter what changes I apply (setting the number of trees to 10 as suggested in a previous issue, changing the privacy budget, ...), I still get the same scores. This is the code I used in run_exp.py (inspired by what I've seen in a previous issue):

    import lightgbm as lgb                      # DPBoost's modified LightGBM build
    import numpy as np
    from sklearn.metrics import accuracy_score
    from libsvmdata import fetch_libsvm         # assuming the libsvmdata package

    model = lgb.train(params, data, num_boost_round=n_trees)
    X_test, y_test = fetch_libsvm("a9a_test")   # a9a test split, labels in {-1, +1}

    y_pred_scaled = model.predict(X_test)

    # Thresholding at 0 assumes predict() returns raw scores for {-1, +1} labels;
    # with the standard 'binary' objective it returns probabilities, so 0.5
    # would be the right cutoff instead.
    y_pred = np.where(y_pred_scaled > 0, 1, -1)

    # accuracy_score returns accuracy, not error, so report 1 - accuracy.
    print(f"Test Error = {1 - accuracy_score(y_test, y_pred)}")

Thanks in advance for the help! @PintOfBitter