ethen8181 / machine-learning

:earth_americas: machine learning tutorials (mainly in Python3)

Why does Logistic Regression Solver impact the conclusion? #10

Closed: vwang0 closed this issue 3 years ago

vwang0 commented 3 years ago

Ethen, I have an interesting finding.

If we change the solver of LogisticRegression from 'liblinear' to the default 'lbfgs', the effect is no longer significant, with pvalue=0.1605910849805837. What is the reason behind this change? Why did you choose 'liblinear' instead of any other solver? Thanks!
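To be concrete, the change I made amounts to the following (a minimal sketch; `X` and `y` are synthetic stand-ins for the notebook's covariates and treatment indicator):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# synthetic stand-ins for the notebook's covariate matrix and treatment flag;
# the actual notebook fits the propensity model on its own data
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = rng.integers(0, 2, size=500)

# the notebook's original setting
lr_liblinear = LogisticRegression(solver='liblinear').fit(X, y)

# the swap in question: 'lbfgs' is the default in newer sklearn versions
lr_lbfgs = LogisticRegression(solver='lbfgs').fit(X, y)

# propensity scores that the matching step consumes downstream
ps_liblinear = lr_liblinear.predict_proba(X)[:, 1]
ps_lbfgs = lr_lbfgs.predict_proba(X)[:, 1]
```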

ethen8181 commented 3 years ago

which notebook/section are you referring to?

vwang0 commented 3 years ago

> which notebook/section are you referring to?

I am referring to the Propensity Score Matching notebook (https://nbviewer.jupyter.org/github/ethen8181/machine-learning/blob/master/ab_tests/causal_inference/matching.ipynb). Thanks!

ethen8181 commented 3 years ago

oh, interesting finding. I didn't realize the results were this sensitive to the solver being used. And to answer your questions:

  1. It's because the matched examples are now drastically different, resulting in the opposite conclusion. Some model tuning is likely required here; I didn't pay much attention to this step in the notebook (see the sketch after this list).
  2. If I remember correctly, I chose 'liblinear' because it was the default solver at the time (sklearn 0.20.2 is printed at the top of the notebook). sklearn was planning to change the default solver in a future version, and at the time it raised a warning whenever the solver wasn't explicitly specified. So I simply passed in what was then the default solver when writing that notebook.
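For reference, the tuning step I'm alluding to could look something like the following (just a sketch, not what the notebook does; the `C` grid and scoring choice are illustrative):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# pin the solver explicitly so results don't shift with sklearn's defaults,
# then tune the regularization strength instead of relying on C=1.0;
# the grid values and scoring metric here are illustrative
propensity_search = GridSearchCV(
    LogisticRegression(solver='lbfgs', max_iter=1000),
    param_grid={'C': [0.01, 0.1, 1.0, 10.0]},
    scoring='neg_log_loss',
    cv=5,
)
# propensity_search.fit(covariates, treatment) would then pick the model
# whose scores feed the matching step
```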

vwang0 commented 3 years ago

I really appreciate your explanation and totally agree with you. Thank you, Ethen.

ethen8181 commented 3 years ago

no problem, glad it helped. Please close the issue if you consider it resolved. Thanks.

vwang0 commented 3 years ago

Thanks again, Ethen. The issue is resolved!