Open wiekern opened 5 years ago
@wiekern Not sure if this helps, but I had similar errors and was pretty stuck. After some basic data analysis, I realized I had a few input variables with very little variation across groups (e.g., a binary age bin with 10,000 rows equal to 0 and only 5 rows equal to 1). After removing these variables/features, the errors went away.
Again, not sure if that's applicable to you, but that was my (embarrassing) issue.
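In case it is useful, here is a minimal sketch of the kind of check I mean. It uses only the standard library; the helper name `rare_levels`, the column name `age_bin`, and the threshold of 10 are my own assumptions for illustration, not part of any library.

```python
from collections import Counter


def rare_levels(rows, column, min_count=10):
    """Return {level: count} for levels of `column` seen fewer than `min_count` times."""
    counts = Counter(row[column] for row in rows)
    return {level: n for level, n in counts.items() if n < min_count}


# Hypothetical data mirroring the case above:
# 10,000 rows with age_bin = 0 and 5 rows with age_bin = 1.
rows = [{"age_bin": 0}] * 10_000 + [{"age_bin": 1}] * 5

# Levels that are almost never observed are candidates for removal or merging.
print(rare_levels(rows, "age_bin"))
```

Any column that comes back non-empty here is worth a closer look before fitting. With pandas, counting the values of each column would give the same information.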
Thanks for your answer! I don't think the distribution is the problem in my case. I am wondering whether the regression model supports string input, such as the "text" column in my data. I am thinking that I must convert the text into numeric values or word embeddings (vectors).
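To make the idea of converting strings to numbers concrete, here is a rough sketch of one-hot encoding a low-cardinality string column in plain Python. The function name `one_hot` and the sample values are hypothetical. Whether the model here needs this done by hand is something I have not confirmed, and a free-text column with a distinct value in nearly every row would probably need a different treatment (such as embeddings).

```python
def one_hot(values):
    """Encode strings as 0/1 indicator lists, one slot per distinct level (sorted)."""
    levels = sorted(set(values))
    index = {level: i for i, level in enumerate(levels)}
    encoded = []
    for value in values:
        row = [0] * len(levels)
        row[index[value]] = 1  # mark the slot that belongs to this value's level
        encoded.append(row)
    return levels, encoded


# Hypothetical categorical column with three distinct levels.
levels, encoded = one_hot(["car", "home", "car", "medical"])
print(levels)
print(encoded)
```

In a regression with an intercept, one of the indicator columns is usually dropped so the columns are not perfectly collinear.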
model = sm.logit('Result ~ Year + Amount_Spent + Popularity_Rank', data = train_data).fit()
Traceback (most recent call last):
  File "
  File "C:\Users\UMANG\anaconda3\lib\site-packages\statsmodels\discrete\discrete_model.py", line 1963, in fit
    bnryfit = super().fit(start_params=start_params,
  File "C:\Users\UMANG\anaconda3\lib\site-packages\statsmodels\discrete\discrete_model.py", line 227, in fit
    mlefit = super().fit(start_params=start_params,
  File "C:\Users\UMANG\anaconda3\lib\site-packages\statsmodels\base\model.py", line 519, in fit
    xopt, retvals, optim_settings = optimizer._fit(f, score, start_params,
  File "C:\Users\UMANG\anaconda3\lib\site-packages\statsmodels\base\optimizer.py", line 215, in _fit
    xopt, retvals = func(objective, gradient, start_params, fargs, kwargs,
  File "C:\Users\UMANG\anaconda3\lib\site-packages\statsmodels\base\optimizer.py", line 327, in _fit_newton
    callback(newparams)
  File "C:\Users\UMANG\anaconda3\lib\site-packages\statsmodels\discrete\discrete_model.py", line 211, in _check_perfect_pred
    raise PerfectSeparationError(msg)
PerfectSeparationError: Perfect separation detected, results not available
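For anyone hitting the same error: perfect separation means the predictors can split the two outcome classes with no overlap, so the logit coefficients have no finite maximum-likelihood estimate. Below is a small standard-library sketch that checks whether a single numeric predictor separates a 0/1 outcome on its own. The function name `separates_perfectly` and the sample numbers are made up for illustration. It will not catch separation that only shows up in a combination of several predictors.

```python
def separates_perfectly(xs, ys):
    """True if some threshold on x splits the 0/1 labels in ys with no overlap."""
    x0 = [x for x, y in zip(xs, ys) if y == 0]
    x1 = [x for x, y in zip(xs, ys) if y == 1]
    if not x0 or not x1:
        return True  # only one class present: trivially separable
    # Separated when every x of one class lies strictly beyond every x of the other.
    return max(x0) < min(x1) or max(x1) < min(x0)


# Hypothetical values for one predictor (e.g. Amount_Spent) and the binary Result.
print(separates_perfectly([1, 2, 3, 10, 11], [0, 0, 0, 1, 1]))  # classes do not overlap
print(separates_perfectly([1, 2, 3, 2, 11], [0, 0, 0, 1, 1]))   # classes overlap
```

Running this for each predictor in the formula is a quick first pass at finding the column responsible.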
Hi, I ran into the error described in the title when invoking fit_scores(). My data structure is shown below. I drew 2,000 samples for test and 20,000 for control to fit the matcher, but I have no clue why this error occurs (I have looked into the source code). In addition, I ran the example code for loan.csv successfully, so I wonder whether the fields of the data must be integers rather than strings. In fact, the data of the loan example contains strings as well; see below.
Hope someone can help, thanks!