NervanaSystems / neon

Intel® Nervana™ reference deep learning framework committed to best performance on all hardware
http://neon.nervanasys.com/docs/latest
Apache License 2.0

IndexError: index 5000 is out of bounds for axis 0 with size 5000 #474

Open venkidevictor opened 4 years ago


Hi, I am working on my capstone project with my own dataset. When I try to clean the dataset by removing outliers, I get the error below. Here is my code:

# Removing Outliers

# Tukey Method

# import required libraries
import numpy as np
from collections import Counter

# Outlier detection

def detect_outliers(df, n, features):
    """Return the index labels of rows that are Tukey outliers in more than n features."""
    outlier_indices = []

    # iterate over features (columns)
    for col in features:
        # 1st quartile (25%)
        Q1 = np.percentile(df[col], 25)
        # 3rd quartile (75%)
        Q3 = np.percentile(df[col], 75)
        # Interquartile range (IQR)
        IQR = Q3 - Q1

        # outlier step
        outlier_step = 1.5 * IQR

        # Determine the index labels (not positions) of outliers for feature col
        outlier_list_col = df[(df[col] < Q1 - outlier_step) | (df[col] > Q3 + outlier_step)].index

        # append the found outlier indices for col to the list of outlier indices
        outlier_indices.extend(outlier_list_col)

    # select observations containing more than n outliers
    outlier_indices = Counter(outlier_indices)
    multiple_outliers = list(k for k, v in outlier_indices.items() if v > n)

    return multiple_outliers
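The Tukey rule in `detect_outliers` flags a value when it falls below Q1 - 1.5*IQR or above Q3 + 1.5*IQR, and a row is returned only if it is flagged in more than `n` columns. The minimal sketch below reproduces that logic on a small hypothetical DataFrame (`tukey_fences`, `rows_with_outliers` and `toy` are illustrative names, not part of the original code) so the effect of `n` is easy to check. With `n=0`, as in the `detect_outliers(..., 0, ...)` call, any row with at least one outlying feature is selected.

```python
from collections import Counter

import numpy as np
import pandas as pd


def tukey_fences(values, k=1.5):
    """Return the (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def rows_with_outliers(df, n, features, k=1.5):
    """Return index labels of rows outside the fences in more than n features."""
    counts = Counter()
    for col in features:
        lower, upper = tukey_fences(df[col], k)
        # .index yields labels, exactly as in detect_outliers
        flagged = df[(df[col] < lower) | (df[col] > upper)].index
        counts.update(flagged)
    return [label for label, c in counts.items() if c > n]


# Hypothetical toy data: column "a" has one extreme value in the last row
toy = pd.DataFrame({"a": [1, 2, 3, 4, 100], "b": [10, 11, 12, 13, 14]})

lower_a, upper_a = tukey_fences(toy["a"])
print(lower_a, upper_a)                          # -1.0 7.0
print(rows_with_outliers(toy, 0, ["a", "b"]))    # [4]: flagged in 1 column, and 1 > 0
print(rows_with_outliers(toy, 1, ["a", "b"]))    # []: 1 is not > 1
```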

# List of Outliers

Outliers_to_drop = detect_outliers(data1.drop('Class', axis=1), 0, list(data1.drop('Class', axis=1)))
data1.drop('Class', axis=1).loc[Outliers_to_drop]
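Because `detect_outliers` collects values from `.index`, `Outliers_to_drop` holds index labels, not row positions. `.loc` looks rows up by label, which is why the `.loc[Outliers_to_drop]` line runs, whereas `data1.index[...]` selects by position. The two only agree when the index is the default 0 to len(df) - 1. A minimal sketch, assuming a hypothetical 5-row frame whose index runs 1 to 5 (one possible way for a label to equal the row count, as the error message suggests for `data1`):

```python
import pandas as pd

# Hypothetical 5-row frame whose index labels are 1..5 rather than the default 0..4
df = pd.DataFrame({"x": [1, 2, 3, 4, 100]}, index=[1, 2, 3, 4, 5])
labels = [5]  # label of the row holding 100, as detect_outliers would report it

# Label-based lookup: works, because 5 is a label in the index
by_label = df.loc[labels, "x"].tolist()
print(by_label)  # [100]

# Positional lookup: a 5-row index only has positions 0..4, so position 5 fails
try:
    df.index[labels]
    positional_ok = True
except IndexError as exc:
    positional_ok = False
    print("IndexError:", exc)
```

The positional lookup raises the same kind of `IndexError` seen in the traceback, just with smaller numbers.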

# Create New Dataset without Outliers

good_data = data1.drop(data1.index[Outliers_to_drop]).reset_index(drop = True)
good_data.info()


IndexError                                Traceback (most recent call last)
in
      1 #Create New Dataset without Outliers
----> 2 good_data = data1.drop(data1.index[Outliers_to_drop]).reset_index(drop = True)
      3 good_data.info()

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in __getitem__(self, key)
   4289
   4290         key = com.values_from_object(key)
-> 4291         result = getitem(key)
   4292         if not is_scalar(result):
   4293             return promote(result)

IndexError: index 5000 is out of bounds for axis 0 with size 5000

Can anyone help me fix this and write the code properly?
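One likely fix, assuming `Outliers_to_drop` holds index labels as returned by `detect_outliers`: pass the labels straight to `drop` instead of converting them through `data1.index[...]`, which treats them as positions and fails whenever a label is greater than or equal to the number of rows. The sketch below uses a hypothetical stand-in for `data1` whose index labels run 1 to 5:

```python
import pandas as pd

# Hypothetical stand-in for data1: index labels 1..5 instead of the default 0..4
data1 = pd.DataFrame(
    {"x": [1, 2, 3, 4, 100], "Class": [0, 0, 1, 1, 0]},
    index=[1, 2, 3, 4, 5],
)
Outliers_to_drop = [5]  # index labels, as detect_outliers returns them

# Drop by label, then renumber the remaining rows from 0
good_data = data1.drop(index=Outliers_to_drop).reset_index(drop=True)
good_data.info()
```

Alternatively, resetting the index before detection (`data1 = data1.reset_index(drop=True)`) makes labels and positions coincide, so the original `data1.index[Outliers_to_drop]` expression would then select the intended rows.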