Alex-Lekov / AutoML_Alex

State-of-the-art Automated Machine Learning python library for Tabular Data
MIT License

How can i solve "Columns must be same length as key" error? #36

Closed soonwoojung closed 3 years ago

soonwoojung commented 3 years ago

What I tried:

de = DataPrepare(
                num_generator_features=True, # Generator interaction Num Features
                # operations_num_generator=['/','*','-',],
                )
clean_X_train = de.fit_transform(train_X_all)
de = DataPrepare(clean_and_encod_data=True,
                # cat_encoder_names=['HelmertEncoder','OneHotEncoder'], # Encoders list for Generator cat encodet features
                clean_nan=True, # fillnan
                clean_outliers=True, # method='IQR', threshold=2,
                drop_invariant=True, # drop invariant features (data.nunique < 2)
                num_generator_features=True, # Generator interaction Num Features
                num_denoising_autoencoder=True, # denoising_autoencoder if num features > 2
                normalization=True, # normalization data (StandardScaler)
                cat_features=None, # DataPrepare can auto detect categorical features
                random_state=42,
                verbose=3)

clean_X_train = de.fit_transform(train_X_all)

But I'm getting this error:

/usr/local/lib/python3.7/dist-packages/pandas/core/frame.py in _setitem_array(self, key, value)
   3065             if isinstance(value, DataFrame):
   3066                 if len(value.columns) != len(key):
-> 3067                     raise ValueError("Columns must be same length as key")
   3068                 for k1, k2 in zip(key, value.columns):
   3069                     self[k1] = value[k2]

ValueError: Columns must be same length as key

Running Env : Google Colab

I can't understand why I'm getting "ValueError: Columns must be same length as key".

Is there anything to fix in my code or data?

I'm attaching my train data.

2021_04_22_orgianl_agg_heEncoded_looEncoded_df.zip

Thank you. I'm enjoying using the library.

Alex-Lekov commented 3 years ago

Try running everything in a single DataPrepare:

de = DataPrepare(clean_and_encod_data=True,
                # cat_encoder_names=['HelmertEncoder','OneHotEncoder'], # Encoders list for Generator cat encodet features
                clean_nan=True, # fillnan
                clean_outliers=True, # method='IQR', threshold=2,
                drop_invariant=True, # drop invariant features (data.nunique < 2)
                num_generator_features=True, # Generator interaction Num Features
                num_denoising_autoencoder=True, # denoising_autoencoder if num features > 2
                # operations_num_generator=['/','*','-',],
                normalization=True, # normalization data (StandardScaler)
                cat_features=None, # DataPrepare can auto detect categorical features
                random_state=42,
                verbose=3)

clean_X_train = de.fit_transform(train_X_all)

DataPrepare isn't designed to be run several times over the same data.
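For context, the ValueError itself comes from pandas, not from AutoML_Alex: it is raised whenever a DataFrame with one number of columns is assigned to a selection that names a different number of columns. A minimal, library-independent reproduction:

```python
import pandas as pd

# Assigning a 3-column DataFrame to a 2-column selection triggers the
# exact check shown in the traceback above (frame.py, _setitem_array).
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
wide = pd.DataFrame({"x": [5, 6], "y": [7, 8], "z": [9, 10]})

try:
    df[["a", "b"]] = wide  # 2 keys on the left, 3 columns on the right
except ValueError as e:
    print(e)  # Columns must be same length as key
```

Running DataPrepare twice can hit this when the second pass generates a different number of feature columns than the first pass produced, so the internal column assignment no longer lines up.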

soonwoojung commented 3 years ago

Oh, thank you! To clarify what I wrote: I tried two different data preparations and got the same error in both cases (I didn't mean that I ran them in sequence).

I don't know the reason, but after I applied min-max scaling, the error no longer occurred!
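A sketch of that workaround, assuming scikit-learn's MinMaxScaler (the comment does not say which tool was used). Wrapping the result back into a DataFrame keeps the original column names and index:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Rescale every numeric column to [0, 1] before passing the data
# on to DataPrepare.
X = pd.DataFrame({"f1": [1.0, 5.0, 9.0], "f2": [10.0, 20.0, 30.0]})
scaled = pd.DataFrame(
    MinMaxScaler().fit_transform(X),
    columns=X.columns,
    index=X.index,
)
```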

Thank you for answering!

onkar1920 commented 2 years ago

I have coded this:

from geopy.extra.rate_limiter import RateLimiter

geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
nwdf['location'] = nwdf['name'].apply(geocode)
nwdf['point'] = nwdf['location'].apply(lambda loc: tuple(loc.point) if loc else None)
nwdf[['latitude', 'longitude']] = pd.DataFrame(nwdf['point'].tolist(), index=nwdf.index)

but getting error as:

ValueError                                Traceback (most recent call last)
Input In [28], in <cell line: 2>()
      1 # 4 - split point column into latitude, longitude and altitude columns
----> 2 nwdf[['latitude', 'longitude']] = pd.DataFrame(nwdf['point'].tolist(), index=nwdf.index)

File ~\Anaconda3\envs\geo_env\lib\site-packages\pandas\core\frame.py:3643, in DataFrame.__setitem__(self, key, value)
   3641     self._setitem_frame(key, value)
   3642 elif isinstance(key, (Series, np.ndarray, list, Index)):
-> 3643     self._setitem_array(key, value)
   3644 elif isinstance(value, DataFrame):
   3645     self._set_item_frame_value(key, value)

File ~\Anaconda3\envs\geo_env\lib\site-packages\pandas\core\frame.py:3685, in DataFrame._setitem_array(self, key, value)
   3680 else:
   3681     # Note: unlike self.iloc[:, indexer] = value, this will
   3682     # never try to overwrite values inplace
   3684     if isinstance(value, DataFrame):
-> 3685         check_key_length(self.columns, key, value)
   3686     for k1, k2 in zip(key, value.columns):
   3687         self[k1] = value[k2]

File ~\Anaconda3\envs\geo_env\lib\site-packages\pandas\core\indexers\utils.py:428, in check_key_length(columns, key, value)
   426 if columns.is_unique:
   427     if len(value.columns) != len(key):
--> 428         raise ValueError("Columns must be same length as key")
   429 else:
   430     # Missing keys in columns are represented as -1
   431     if len(columns.get_indexer_non_unique(key)[0]) != len(value.columns):

ValueError: Columns must be same length as key
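A likely cause here: geopy's `Point` unpacks into three components (latitude, longitude, altitude), so `tuple(loc.point)` produces 3-tuples and the expanded DataFrame has three columns, while the assignment target names only two. A sketch of both fixes, using plain 3-tuples in place of real geocoder output and handling `None` rows explicitly:

```python
import pandas as pd

# Simulated 'point' column: 3-tuples like tuple(loc.point) would give,
# plus a None for a row the geocoder could not resolve.
nwdf = pd.DataFrame({"point": [(48.85, 2.35, 0.0), (51.51, -0.13, 0.0), None]})

# Expand to a DataFrame; map None to a row of Nones so every row has
# the same width (three columns).
expanded = pd.DataFrame(
    [p if p is not None else (None, None, None) for p in nwdf["point"]],
    index=nwdf.index,
)

# Fix 1: name all three columns the expansion actually produces.
nwdf[["latitude", "longitude", "altitude"]] = expanded

# Fix 2 (alternative): keep only the first two columns before assigning.
# nwdf[["latitude", "longitude"]] = expanded.iloc[:, :2]
```

With the key length matching the value's column count, the assignment no longer raises.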