BinPro / CONCOCT

Clustering cONtigs with COverage and ComposiTion
Other

logmodel.fit(X_train, y_train) not working #333

Open fb87fb opened 6 months ago


Hi everyone,

I am working on a dataset, and after running the following code I get the error below. I've tried to figure it out on Google, but as a newbie I'm getting frustrated! I'd really appreciate your help.

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X = titanic_data.drop("survived", axis=1)  # features: every column except the target
y = titanic_data["survived"]                # target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
logmodel = LogisticRegression()
logmodel.fit(X_train, y_train)
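One way to find the offending columns is to list every column label that is not a Python string. The sketch below uses a small hypothetical frame: the `age`/`fare` values and the `pd.get_dummies` call on an integer Series are assumptions for illustration, not the original `titanic_data`, and `non_string_columns` is a hypothetical helper, not part of pandas or scikit-learn.

```python
import pandas as pd


def non_string_columns(df: pd.DataFrame) -> list:
    """Return the column labels of df that are not Python strings (hypothetical helper)."""
    return [col for col in df.columns if not isinstance(col, str)]


# Hypothetical toy frame: "age" and "fare" have string labels, while
# get_dummies on an integer Series produces columns labelled 2 and 3 (ints).
base = pd.DataFrame({"age": [22.0, 38.0, 26.0], "fare": [7.25, 71.28, 7.92]})
class_dummies = pd.get_dummies(pd.Series([3, 1, 2]), drop_first=True)
X = pd.concat([base, class_dummies], axis=1)

print(non_string_columns(X))
```

Running the same helper on the real `X` should show which columns carry integer names.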

From this, I get:


TypeError                                 Traceback (most recent call last)
Cell In[72], line 1
----> 1 logmodel.fit(X_train, y_train)

File ~\anaconda3\Lib\site-packages\sklearn\linear_model\_logistic.py:1196, in LogisticRegression.fit(self, X, y, sample_weight)

   1193 else:
   1194     _dtype = [np.float64, np.float32]
-> 1196 X, y = self._validate_data(
   1197     X,
   1198     y,
   1199     accept_sparse="csr",
   1200     dtype=_dtype,
   1201     order="C",
   1202     accept_large_sparse=solver not in ["liblinear", "sag", "saga"],
   1203 )
   1204 check_classification_targets(y)
   1205 self.classes_ = np.unique(y)

File ~\anaconda3\Lib\site-packages\sklearn\base.py:548, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, **check_params)

    483 def _validate_data(
    484     self,
    485     X="no_validation",
   (...)
    489     **check_params,
    490 ):
    491     """Validate input data and set or check the `n_features_in_` attribute.
    492 
    493     Parameters
   (...)
    546         validated.
    547     """
--> 548     self._check_feature_names(X, reset=reset)
    550     if y is None and self._get_tags()["requires_y"]:
    551         raise ValueError(
    552             f"This {self.__class__.__name__} estimator "
    553             "requires y to be passed, but the target y is None."
    554         )

File ~\anaconda3\Lib\site-packages\sklearn\base.py:415, in BaseEstimator._check_feature_names(self, X, reset)

    395 """Set or check the `feature_names_in_` attribute.
    396 
    397 .. versionadded:: 1.0
   (...)
    411        should set `reset=False`.
    412 """
    414 if reset:
--> 415     feature_names_in = _get_feature_names(X)
    416     if feature_names_in is not None:
    417         self.feature_names_in_ = feature_names_in

File ~\anaconda3\Lib\site-packages\sklearn\utils\validation.py:1903, in _get_feature_names(X)

   1901 # mixed type of string and non-string is not supported
   1902 if len(types) > 1 and "str" in types:
-> 1903     raise TypeError(
   1904         "Feature names are only supported if all input features have string names, "
   1905         f"but your input has {types} as feature name / column name types. "
   1906         "If you want feature names to be stored and validated, you must convert "
   1907         "them all to strings, by using X.columns = X.columns.astype(str) for "
   1908         "example. Otherwise you can remove feature / column names from your input "
   1909         "data, or convert them all to a non-string data type."
   1910     )
   1912 # Only feature names of all strings are supported
   1913 if len(types) == 1 and types[0] == "str":

TypeError: Feature names are only supported if all input features have string names, but your input has ['int', 'str'] as feature name / column name types. If you want feature names to be stored and validated, you must convert them all to strings, by using X.columns = X.columns.astype(str) for example. Otherwise you can remove feature / column names from your input data, or convert them all to a non-string data type.

What can I do to make it work? Many thanks in advance.
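If, as in many Titanic tutorials, the integer labels come from one-hot encoding an integer column such as `pclass` with `pd.get_dummies` (an assumption; the issue doesn't show how `titanic_data` was built), giving the dummy columns a string prefix avoids the problem at the source. A sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical slice of titanic_data; "pclass" holds integer codes 1/2/3.
titanic_data = pd.DataFrame({
    "age": [22.0, 38.0, 26.0, 35.0],
    "pclass": [3, 1, 3, 2],
    "survived": [0, 1, 1, 0],
})

# Without a prefix, get_dummies would label the new columns 2 and 3 (ints).
# With prefix="pclass" they become "pclass_2" and "pclass_3" (strings).
pclass = pd.get_dummies(titanic_data["pclass"], prefix="pclass", drop_first=True)
titanic_data = pd.concat([titanic_data.drop("pclass", axis=1), pclass], axis=1)

print(list(titanic_data.columns))  # ['age', 'survived', 'pclass_2', 'pclass_3']
```

Fixing the names here keeps them readable in `feature_names_in_`, rather than ending up with bare labels like `"2"` and `"3"` after a blanket `astype(str)`.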