SheffieldML / GPyOpt

Gaussian Process Optimization using GPy
BSD 3-Clause "New" or "Revised" License

OLS Fitting Issue #315

Closed Abhiparth96 closed 4 years ago

Abhiparth96 commented 4 years ago

I want to get an OLS summary for my dataset so I can do backward elimination, but the code below raises the following error.

Below is the complete code I have built for the model:

    import numpy as np
    import pandas as pd

    data=pd.read_csv("C:\\Users\\ezabhkh\\Desktop\\Machine Learning\\Data Sets\\50_Startups.csv")

    features=data.iloc[:,:-1].values
    label=data.iloc[:,[-1]].values

This is where I do the label encoding:

    from sklearn.preprocessing import LabelEncoder
    from sklearn.preprocessing import OneHotEncoder
    state=LabelEncoder()
    features[:,3]=state.fit_transform(features[:,3])

This is where I use ColumnTransformer:

    from sklearn.compose import ColumnTransformer
    transformer = ColumnTransformer([('one_hot_encoder', OneHotEncoder(categories='auto'), [3])], remainder='passthrough')
    features=transformer.fit_transform(features)

This is where the statsmodels code starts:

    featuresAllIN=np.append(np.ones((50,1)).astype(int), features, axis=1)

    import statsmodels.formula.api as stat
    model_1=stat.OLS(endog=label ,exog=featuresAllIN).fit()
    model_1.summary()

The error starts here:

    --------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-26-ae973551e2ea> in <module>
          1 #Step 3 -> perform OLS and get summary
          2 import statsmodels.formula.api as stat
    ----> 3 model_1=stat.OLS(endog=label ,exog=featuresAllIN).fit()
          4 model_1.summary()

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\statsmodels\regression\linear_model.py in __init__(self, endog, exog, missing, hasconst, **kwargs)
        815                  **kwargs):
        816         super(OLS, self).__init__(endog, exog, missing=missing,
    --> 817                                   hasconst=hasconst, **kwargs)
        818         if "weights" in self._init_keys:
        819             self._init_keys.remove("weights")

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\statsmodels\regression\linear_model.py in __init__(self, endog, exog, weights, missing, hasconst, **kwargs)
        661             weights = weights.squeeze()
        662         super(WLS, self).__init__(endog, exog, missing=missing,
    --> 663                                   weights=weights, hasconst=hasconst, **kwargs)
        664         nobs = self.exog.shape[0]
        665         weights = self.weights

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\statsmodels\regression\linear_model.py in __init__(self, endog, exog, **kwargs)
        177     """
        178     def __init__(self, endog, exog, **kwargs):
    --> 179         super(RegressionModel, self).__init__(endog, exog, **kwargs)
        180         self._data_attr.extend(['pinv_wexog', 'wendog', 'wexog', 'weights'])
        181 

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\statsmodels\base\model.py in __init__(self, endog, exog, **kwargs)
        210
        211     def __init__(self, endog, exog=None, **kwargs):
    --> 212         super(LikelihoodModel, self).__init__(endog, exog, **kwargs)
        213         self.initialize()
        214 

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\statsmodels\base\model.py in __init__(self, endog, exog, **kwargs)
         62         hasconst = kwargs.pop('hasconst', None)
         63         self.data = self._handle_data(endog, exog, missing, hasconst,
    ---> 64                                       **kwargs)
         65         self.k_constant = self.data.k_constant
         66         self.exog = self.data.exog

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\statsmodels\base\model.py in _handle_data(self, endog, exog, missing, hasconst, **kwargs)
         85
         86     def _handle_data(self, endog, exog, missing, hasconst, **kwargs):
    ---> 87         data = handle_data(endog, exog, missing, hasconst, **kwargs)
         88         # kwargs arrays could have changed, easier to just attach here
         89         for key in kwargs:

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\statsmodels\base\data.py in handle_data(endog, exog, missing, hasconst, **kwargs)
        631     klass = handle_data_class_factory(endog, exog)
        632     return klass(endog, exog=exog, missing=missing, hasconst=hasconst,
    --> 633                  **kwargs)

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\statsmodels\base\data.py in __init__(self, endog, exog, missing, hasconst, **kwargs)
         77 
         78         # this has side-effects, attaches k_constant and const_idx
    ---> 79         self._handle_constant(hasconst)
         80         self._check_integrity()
         81         self._cache = resettable_cache()

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\statsmodels\base\data.py in _handle_constant(self, hasconst)
        130             check_implicit = False
        131             ptp_ = self.exog.ptp(axis=0)
    --> 132             if not np.isfinite(ptp_).all():
        133                 raise MissingDataError('exog contains inf or nans')

    TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

50_Startups.zip

apaleyes commented 4 years ago

After a quick glance I don't see any GPyOpt code in the above snippets. Can you please clarify how this issue is related to the library?

apaleyes commented 4 years ago

Closing as #316 - this one appears to be using statsmodels, so consider posting in https://github.com/statsmodels/statsmodels
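
For anyone who lands here with the same traceback: one likely cause (an assumption, since the thread never confirms it) is the array dtype. `data.iloc[:,:-1].values` mixes a string column with numeric ones, so NumPy stores it as `dtype=object`. That dtype probably survives the encoding steps and `np.append`, and `np.isfinite` does not accept object arrays. The sketch below uses only NumPy and a made-up two-row array (hypothetical values, not the real dataset) to reproduce the error and show that casting to `float` avoids it.

```python
import numpy as np

# Hypothetical stand-in for the encoded feature matrix. The values are
# numeric, but the array is stored as dtype=object, which is what you get
# when numbers and strings were pulled from the same DataFrame.
features = np.array(
    [[1.0, 0.0, 165349.2],
     [0.0, 1.0, 162597.7]],
    dtype=object,
)

# Same pattern as in the question: prepend a column of ones.
features_all_in = np.append(np.ones((2, 1)).astype(int), features, axis=1)
print(features_all_in.dtype)

# np.isfinite is the call that fails at the bottom of the traceback.
try:
    np.isfinite(features_all_in)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Assumed fix: cast to a numeric dtype before handing the array to OLS.
features_float = features_all_in.astype(float)
print(features_float.dtype, np.isfinite(features_float).all())
```

If that is indeed the cause, adding `featuresAllIN = featuresAllIN.astype(float)` before the `OLS(...)` call would be the corresponding change in the code above. Separately, as far as I know, newer statsmodels releases expose `OLS` from `statsmodels.api` rather than `statsmodels.formula.api`, so the import may need updating on a current install.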