Closed soonwoojung closed 3 years ago
Try running everything in one DataPrepare:
de = DataPrepare(
    clean_and_encod_data=True,
    # cat_encoder_names=['HelmertEncoder','OneHotEncoder'],  # encoders list for generating cat encoded features
    clean_nan=True,                  # fill NaNs
    clean_outliers=True,             # method='IQR', threshold=2
    drop_invariant=True,             # drop invariant features (data.nunique < 2)
    num_generator_features=True,     # generate interaction num features
    num_denoising_autoencoder=True,  # denoising autoencoder if num features > 2
    # operations_num_generator=['/', '*', '-'],
    normalization=True,              # normalize data (StandardScaler)
    cat_features=None,               # DataPrepare can auto-detect categorical features
    random_state=42,
    verbose=3,
)
clean_X_train = de.fit_transform(train_X_all)
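As a side note, the `drop_invariant` idea above can be illustrated in plain pandas (a minimal sketch of the concept, not DataPrepare's actual implementation; the DataFrame here is made up):

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3],        # varies -> kept
    'b': [7, 7, 7],        # constant -> invariant, dropped
    'c': ['x', 'y', 'x'],  # varies -> kept
})

# Drop invariant features: columns with fewer than 2 unique values
kept = df.loc[:, df.nunique() >= 2]
print(list(kept.columns))  # -> ['a', 'c']
```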
I didn't expect the DataPrepare to run several times on the same data.
Oh, thank you! What I wrote is that I tried 2 different data preparations, but the same error occurred. (I don't mean that I ran the 2 data preparations in sequence.)
I don't know the reason, but after I applied Min-Max scaling, no error occurred!
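For reference, that Min-Max workaround can be sketched in plain pandas (the equivalent of scikit-learn's MinMaxScaler; the data and column names here are made up):

```python
import pandas as pd

# Hypothetical numeric training data
train_X_all = pd.DataFrame({'f1': [10.0, 20.0, 30.0],
                            'f2': [1.0, 5.0, 9.0]})

# Min-Max scaling: map each column linearly onto [0, 1]
scaled = (train_X_all - train_X_all.min()) / (train_X_all.max() - train_X_all.min())
print(scaled['f1'].tolist())  # -> [0.0, 0.5, 1.0]
```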
thank you for answering!
I have coded this:
from geopy.extra.rate_limiter import RateLimiter
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
nwdf['location'] = nwdf['name'].apply(geocode)
nwdf['point'] = nwdf['location'].apply(lambda loc: tuple(loc.point) if loc else None)
nwdf[['latitude', 'longitude']] = pd.DataFrame(nwdf['point'].tolist(), index=nwdf.index)
but getting this error:

ValueError                                Traceback (most recent call last)
Input In [28], in <cell line: 2>()
      1 # 4 - split point column into latitude, longitude and altitude columns
----> 2 nwdf[['latitude', 'longitude']] = pd.DataFrame(nwdf['point'].tolist(), index=nwdf.index)

File ~\Anaconda3\envs\geo_env\lib\site-packages\pandas\core\frame.py:3643, in DataFrame.__setitem__(self, key, value)
   3641     self._setitem_frame(key, value)
   3642 elif isinstance(key, (Series, np.ndarray, list, Index)):
-> 3643     self._setitem_array(key, value)
   3644 elif isinstance(value, DataFrame):
   3645     self._set_item_frame_value(key, value)

File ~\Anaconda3\envs\geo_env\lib\site-packages\pandas\core\frame.py:3685, in DataFrame._setitem_array(self, key, value)
   3680 else:
   3681     # Note: unlike self.iloc[:, indexer] = value, this will
   3682     # never try to overwrite values inplace
   3684     if isinstance(value, DataFrame):
-> 3685         check_key_length(self.columns, key, value)
   3686     for k1, k2 in zip(key, value.columns):
   3687         self[k1] = value[k2]

File ~\Anaconda3\envs\geo_env\lib\site-packages\pandas\core\indexers\utils.py:428, in check_key_length(columns, key, value)
    426 if columns.is_unique:
    427     if len(value.columns) != len(key):
--> 428         raise ValueError("Columns must be same length as key")
    429 else:
    430     # Missing keys in columns are represented as -1
    431     if len(columns.get_indexer_non_unique(key)[0]) != len(value.columns):

ValueError: Columns must be same length as key
Running Env : Google Colab
I can't understand why I'm getting "ValueError: Columns must be same length as key".
Is there anything to fix in my code or data?
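For what it's worth, geopy's `loc.point` unpacks into three values (latitude, longitude, altitude), so `tuple(loc.point)` is a 3-tuple and the expanded DataFrame has three columns, while the key lists only two; that mismatch is exactly what raises this ValueError. A minimal sketch with made-up point tuples (no geocoding involved):

```python
import pandas as pd

# Simulated 'point' column: (latitude, longitude, altitude) tuples,
# the same shape tuple(loc.point) would give
nwdf = pd.DataFrame({'point': [(37.5, 127.0, 0.0), (35.1, 129.0, 0.0)]})

expanded = pd.DataFrame(nwdf['point'].tolist(), index=nwdf.index)
print(expanded.shape[1])  # -> 3 columns, but the key listed only 2

# Fix: name all three columns (drop 'altitude' afterwards if unneeded)
nwdf[['latitude', 'longitude', 'altitude']] = expanded
```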
I'm attaching my train data.
2021_04_22_orgianl_agg_heEncoded_looEncoded_df.zip
Thank you. I'm using it well.