FederatedAI / FATE

An Industrial Grade Federated Learning Framework
Apache License 2.0

the accuracy of Hetero LR is too low for default credit data set #565

Closed Hyberlion closed 5 years ago

Hyberlion commented 5 years ago

When I test Hetero LR with the breast data set, it performs well and the model accuracy reaches 97%. But when I switch to the default credit data set, the accuracy is only about 50%, no matter how I change the LR parameters. I know boosted trees may be better suited to the credit data set, but other ML platforms achieve accuracy much higher than 50%.

zazd commented 5 years ago

@Hyberlion please tell us the accuracy and AUC, as well as any other evaluation metrics, and we will check them. If possible, please also share your Hetero LR parameters. Thanks!

We have run Hetero LR and secure_boost on the default credit data set in cross_validation mode. The average AUC and accuracy are:

Hetero LR: auc: 0.7133, acc: 0.8021 (threshold 0.5)

secure_boost: auc: 0.762, acc: 0.809 (threshold 0.5)

Hyberlion commented 5 years ago

Thanks for your reply! Here are my results and parameters. I ran HeteroLR in cross_validation mode on the credit data set, with the following results:

mean auc: 0.4196
mean accuracy: 0.744 (threshold 0.5)
mean precision: 0.1018 (threshold 0.5)

The LR parameters are as follows:

"LogisticParam": {
    "penalty": "L2",
    "optimizer": "rmsprop",
    "eps": 1e-5,
    "alpha": 0.01,
    "max_iter": 10,
    "converge_func": "diff",
    "batch_size": 3200,
    "learning_rate": 0.15
}

I also ran the same data set (the whole data, combining credit_a and credit_b) with sklearn (optimizer: lbfgs), with the following results:

acc: 0.8111
auc: 0.6065

I also edited sklearn's source code (logistic.py) to replace the original cost function with its Taylor approximation, with the following results: acc: 0.79 auc: 0.56
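To illustrate what such a substitution looks like, here is a minimal sketch in plain Python. It assumes the second-order Taylor expansion of the logistic loss around zero, log(1 + exp(-z)) ≈ log 2 - z/2 + z²/8, with z = y · wᵀx and labels y in {-1, +1}. The exact form used in the modified logistic.py is not shown in this thread, so the function names and the expansion order here are assumptions.

```python
import math

def logistic_loss(z):
    """Exact logistic loss log(1 + exp(-z)), computed in a numerically stable way."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

def taylor_loss(z):
    """Assumed second-order Taylor expansion around z = 0: log 2 - z/2 + z^2/8."""
    return math.log(2.0) - 0.5 * z + 0.125 * z * z

# Compare the two losses at a few margins. The approximation is tight near
# z = 0 and drifts away from the exact loss as |z| grows.
for z in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"z={z:+.1f}  exact={logistic_loss(z):.4f}  taylor={taylor_loss(z):.4f}")
```

Because the approximation only holds for small margins, unscaled features or a large learning rate can push z into the range where the two losses disagree noticeably. Whether that matters for the numbers reported here is not established by this sketch.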

So how can I reproduce your results? And why is the precision only 0.1?
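For anyone comparing numbers across platforms, the sketch below shows how the three metrics discussed in this thread are defined, in plain Python with no dependencies. The labels and scores are made-up toy values, not the credit data, and the helper names are hypothetical. It also shows one general property: an AUC below 0.5 means positives are ranked below negatives more often than not, and negating the scores gives 1 - AUC. This is not a claim about the cause of the results above, only something that can be checked.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def precision(labels, scores, threshold=0.5):
    """True positives divided by all predicted positives (0.0 if none predicted)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    return tp / (tp + fp) if (tp + fp) else 0.0

def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up toy data: 6 negatives, 2 positives.
labels = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.35, 0.7]
flipped = [-s for s in scores]  # same scores with the ranking reversed

print("accuracy :", accuracy(labels, scores))
print("precision:", precision(labels, scores))
print("auc      :", auc(labels, scores))
print("auc (flipped scores):", auc(labels, flipped))
```

Accuracy and precision depend on the threshold, while AUC depends only on how the scores rank the samples, so the three can diverge sharply on an imbalanced data set.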