cozygene / glint


statsmodels.tools.sm_exceptions.PerfectSeparationError: Perfect separation detected, results not available #2

Closed HuangRuocheng closed 5 years ago

HuangRuocheng commented 6 years ago

Dear fellows,

I am running glint --ewas --logreg on 8 850k methylation data, and I get this error:

ERROR in ewas; EXCEPTION: <class 'statsmodels.tools.sm_exceptions.PerfectSeparationError'>, Perfect separation detected, results not available; TRACEBACK: /home/Account/huangrc/softwares/glint/parsers/ewas_parser.py:178, in run: return self.runLogReg(args, meth_data.data, meth_data.cpgnames, pheno, covars) ---> /home/Account/huangrc/softwares/glint/parsers/ewas_parser.py:144, in runLogReg: return self.runRegression(data, ewas.LogisticRegression, "LogReg", output_file, cpgnames, pheno, covars) ---> /home/Account/huangrc/softwares/glint/parsers/ewas_parser.py:125, in runRegression: results = module.run() ---> /home/Account/huangrc/softwares/glint/modules/ewas.py:94, in run: results = self.regression() ---> /home/Account/huangrc/softwares/glint/modules/ewas.py:53, in regression: coefs, tstats, p_value = self.regression_function(self.pheno, site, covars = self.covars) ---> /home/Account/huangrc/softwares/glint/utils/regression.py:51, in fit_model: result = logit.fit(disp=False) # False disable the print of "Optimization terminated successfully" message ---> /home/Account/huangrc/softwares/glint/env/lib/python2.7/site-packages/statsmodels/discrete/discrete_model.py:1376, in fit: disp=disp, callback=callback, **kwargs) ---> /home/Account/huangrc/softwares/glint/env/lib/python2.7/site-packages/statsmodels/discrete/discrete_model.py:203, in fit: disp=disp, callback=callback, **kwargs) ---> /home/Account/huangrc/softwares/glint/env/lib/python2.7/site-packages/statsmodels/base/model.py:425, in fit: full_output=full_output) ---> /home/Account/huangrc/softwares/glint/env/lib/python2.7/site-packages/statsmodels/base/optimizer.py:184, in _fit: hess=hessian) ---> /home/Account/huangrc/softwares/glint/env/lib/python2.7/site-packages/statsmodels/base/optimizer.py:248, in _fit_newton: callback(newparams) ---> /home/Account/huangrc/softwares/glint/env/lib/python2.7/site-packages/statsmodels/discrete/discrete_model.py:186, in _check_perfect_pred: raise PerfectSeparationError(msg) 
Traceback (most recent call last):
  File "/home/Account/huangrc/softwares/glint/glint.py", line 312, in <module>
    parser.run()
  File "/home/Account/huangrc/softwares/glint/glint.py", line 275, in run
    meth_data = ewas_meth_data)
  File "/home/Account/huangrc/softwares/glint/parsers/ewas_parser.py", line 178, in run
    return self.runLogReg(args, meth_data.data, meth_data.cpgnames, pheno, covars)
  File "/home/Account/huangrc/softwares/glint/parsers/ewas_parser.py", line 144, in runLogReg
    return self.runRegression(data, ewas.LogisticRegression, "LogReg", output_file, cpgnames, pheno, covars)
  File "/home/Account/huangrc/softwares/glint/parsers/ewas_parser.py", line 125, in runRegression
    results = module.run()
  File "/home/Account/huangrc/softwares/glint/modules/ewas.py", line 94, in run
    results = self.regression()
  File "/home/Account/huangrc/softwares/glint/modules/ewas.py", line 53, in regression
    coefs, tstats, p_value = self.regression_function(self.pheno, site, covars = self.covars)
  File "/home/Account/huangrc/softwares/glint/utils/regression.py", line 51, in fit_model
    result = logit.fit(disp=False) # False disable the print of "Optimization terminated successfully" message
  File "/home/Account/huangrc/softwares/glint/env/lib/python2.7/site-packages/statsmodels/discrete/discrete_model.py", line 1376, in fit
    disp=disp, callback=callback, **kwargs)
  File "/home/Account/huangrc/softwares/glint/env/lib/python2.7/site-packages/statsmodels/discrete/discrete_model.py", line 203, in fit
    disp=disp, callback=callback, **kwargs)
  File "/home/Account/huangrc/softwares/glint/env/lib/python2.7/site-packages/statsmodels/base/model.py", line 425, in fit
    full_output=full_output)
  File "/home/Account/huangrc/softwares/glint/env/lib/python2.7/site-packages/statsmodels/base/optimizer.py", line 184, in _fit
    hess=hessian)
  File "/home/Account/huangrc/softwares/glint/env/lib/python2.7/site-packages/statsmodels/base/optimizer.py", line 248, in _fit_newton
    callback(newparams)
  File "/home/Account/huangrc/softwares/glint/env/lib/python2.7/site-packages/statsmodels/discrete/discrete_model.py", line 186, in _check_perfect_pred
    raise PerfectSeparationError(msg)
statsmodels.tools.sm_exceptions.PerfectSeparationError: Perfect separation detected, results not available

Could this error occur because, when running an EWAS based on logistic regression, glint includes all the CpGs in a single regression?

As far as I know, each EWAS regression should consider only one CpG at a time. Is that right?

Sincerely,

Huang Ruocheng

E-R commented 5 years ago

Dear Huang,

First, note that glint's --ewas --logreg will fit a logistic regression model for each CpG separately (and not for all CpGs jointly).
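To illustrate what "one model per CpG" means, here is a minimal, dependency-free sketch. It is not glint's code (the traceback shows glint calling statsmodels' `logit.fit`), and it leaves out covariates. The function name `fit_logit_single_site`, the site names, and the toy values are all made up for illustration.

```python
import math

def fit_logit_single_site(site_values, pheno, iters=25):
    """Fit logit(P(pheno=1)) = b0 + b1 * site_value by Newton's method.

    Illustrative only: one predictor plus intercept, no covariates.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        # g = gradient of the log-likelihood, h = Fisher information (2x2).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(site_values, pheno):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break  # information matrix is (near) singular; stop updating
        # Newton step: beta += H^{-1} g, with the 2x2 inverse written out.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical toy data: 8 samples, 2 sites (rows are sites, not samples).
pheno = [0, 0, 1, 0, 1, 0, 1, 1]
meth = {
    "site_a": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    "site_b": [0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1],
}

# One independent fit per site: the phenotype is regressed on each site alone.
results = {name: fit_logit_single_site(values, pheno) for name, values in meth.items()}
for name, (b0, b1) in results.items():
    print(name, round(b1, 3))
```

Each site gets its own coefficient from its own fit, so a failure such as perfect separation is a property of one site's model (plus covariates), not of all CpGs taken together.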

The error you get is caused by perfect separation in your logistic regression model, meaning that the predictors split the cases from the controls exactly, so the coefficient estimates do not converge. This often occurs if you add too many covariates to the regression (or if some of the covariates are unnecessary, e.g., if you have two batches in your data and you use two indicator variables instead of just one) or if your phenotype has a very low variance (e.g., very few cases or very few controls). Perfect separation is also more likely with small sample sizes. You should try looking into all of these, change your model accordingly, and see if you can get it to work.
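As a rough first diagnostic, you could scan for sites whose values alone already split cases from controls. The sketch below is a simplification and is not part of glint or statsmodels: it checks only complete separation on a single predictor, so it will miss separation that arises from a combination of a site and covariates. The function name `is_completely_separated`, the site names, and the toy values are hypothetical.

```python
def is_completely_separated(values, pheno):
    """Return True if a single threshold on `values` splits cases from controls.

    Simplified check: one predictor, binary phenotype coded 0/1, no covariates.
    """
    cases = [v for v, y in zip(values, pheno) if y == 1]
    controls = [v for v, y in zip(values, pheno) if y == 0]
    if not cases or not controls:
        raise ValueError("phenotype must contain both cases and controls")
    # Separated if every case lies strictly above, or strictly below, every control.
    return max(controls) < min(cases) or max(cases) < min(controls)

# Hypothetical toy data: 8 samples, 2 sites.
pheno = [0, 0, 0, 0, 1, 1, 1, 1]
meth = {
    "site_a": [0.10, 0.20, 0.15, 0.30, 0.70, 0.80, 0.90, 0.75],  # no overlap
    "site_b": [0.10, 0.80, 0.20, 0.70, 0.30, 0.90, 0.40, 0.60],  # overlapping
}

flagged = [name for name, values in meth.items()
           if is_completely_separated(values, pheno)]
print(flagged)
```

With only a handful of samples, many sites can be flagged purely by chance, which is one reason small sample sizes make this error more likely.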

Best, Elior