mid bootcamp project

Hello hello Kevin 🙋🏻‍♀️, here we go with the review of your project.

README

What happened to the readme Kevin?!?!?! 😔 The readme is essential for the delivery of the project.

Why is the readme important?

Because it is the first thing people see of our work, and it is where we present that work in summarised form. The readme therefore also demonstrates our synthesis and communication skills.

I recommend that you always include one in future projects.

Repo structure

The organization of your repo is really nice Kevin, good work! Just a couple of details:

- You have a .DS_Store file in the repo! Remember to add it to the .gitignore.
- In this case you only have one Jupyter notebook, but imagine that you have two or more. What can we do?
  - Create a new folder, named for example Notebooks, where we store the notebooks
  - Number the notebooks to indicate the working order
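
As a small illustration of the .gitignore point, here is a minimal sketch of a helper that adds a pattern to .gitignore only if it is not listed yet. The function name `ensure_ignored` is made up for this example; editing the file by hand works just as well.

```python
from pathlib import Path


def ensure_ignored(pattern, gitignore_path=".gitignore"):
    """Append `pattern` to the .gitignore file unless it is already listed.

    Returns True if the pattern was added, False if it was already present.
    """
    path = Path(gitignore_path)
    # Read the current entries, or start from an empty list if there is no file yet
    existing = path.read_text().splitlines() if path.exists() else []
    if pattern in (line.strip() for line in existing):
        return False
    existing.append(pattern)
    path.write_text("\n".join(existing) + "\n")
    return True
```

Calling `ensure_ignored(".DS_Store")` from the repo root would add the entry. If the file was already committed, it also has to be untracked once with `git rm --cached .DS_Store`, because .gitignore only affects files that are not tracked yet.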

Code syntax

Let's go with the code 💪!

P02_cc_classification

When you clean the headers, you have this code:

import re

columns_list = []
pattern = '#_'

for column in data.columns:
    column = re.sub(pattern, '', column)
    columns_list.append(column)

data.columns = columns_list

def clean_headers(df):
    df.columns = df.columns.str.lower().str.strip().str.replace(' ', '_')
    return df

## Why do you have it separately? Wouldn't it make more sense to put it all inside the function that cleans the columns? 

## In the end, you should end up with something like this
def clean_headers(df):
    columns_list = []
    pattern = '#_'

    # use the df argument (not the global data) so the function works on any dataframe
    for column in df.columns:
        column = re.sub(pattern, '', column)
        columns_list.append(column)

    df.columns = columns_list

    df.columns = df.columns.str.lower().str.strip().str.replace(' ', '_')
    return df
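
Going one step further, the loop can probably be dropped altogether, since pandas string methods work on the whole column index at once. This is only a sketch: the name `clean_headers_vectorised` and the sample column names are made up, and it returns a copy instead of modifying the dataframe in place.

```python
import pandas as pd


def clean_headers_vectorised(df):
    """Return a copy of df with the '#_' marker removed and snake_case headers."""
    df = df.copy()  # leave the caller's dataframe untouched
    df.columns = (
        df.columns
        .str.replace("#_", "", regex=False)  # literal match, no regex needed
        .str.lower()
        .str.strip()
        .str.replace(" ", "_", regex=False)
    )
    return df


# Made-up column names, only for illustration
sample = pd.DataFrame(columns=["#_Customer Number", " Offer Accepted "])
print(list(clean_headers_vectorised(sample).columns))
# prints ['customer_number', 'offer_accepted']
```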

Regarding the functions: whenever we write a function, it is mandatory to write a docstring.

What is it?

It is a string that describes what the code does. What should go in the docstring of a function?

- What the function does
- The parameters it receives and what type they are
- What it returns

# taking one of your functions as an example:
def clean_headers(df):
    '''
    This function cleans the names of the columns
    Args:
        df (pandas.DataFrame): the target dataframe
    Returns:
        The same dataframe with the column names cleaned
    '''
    columns_list = []
    pattern = '#_'

    for column in df.columns:
        column = re.sub(pattern, '', column)
        columns_list.append(column)

    df.columns = columns_list

    df.columns = df.columns.str.lower().str.strip().str.replace(' ', '_')
    return df

Here is some documentation about docstrings in Python.
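
One practical benefit of docstrings is that Python can show them back to you. The sketch below uses a made-up helper, `strip_prefix`, just to show how a docstring is read with the standard library; it is not one of the functions in your notebook.

```python
import inspect


def strip_prefix(name, prefix="#_"):
    """Remove a leading prefix from a column name.

    Args:
        name (str): the original column name.
        prefix (str): the prefix to drop if present.

    Returns:
        str: the name without the prefix.
    """
    return name[len(prefix):] if name.startswith(prefix) else name


# inspect.getdoc returns the docstring with its indentation cleaned up
print(inspect.getdoc(strip_prefix))
# help(strip_prefix) shows the same text together with the signature
```

In a Jupyter notebook, typing `strip_prefix?` in a cell displays the same information.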

For the future, remember to write the functions in the .py file and import them into the jupyter 😉
Wow!!! I am amazed by this function balanced_classification. You've created a pretty complete function, but it's pretty big. In general, keep functions small with specific objectives. Therefore, my recommendation here is that you try to "split" the functions into several functions.
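
To give an idea of what that split could look like, here is a rough sketch. It assumes, without having the function body in front of us here, that balanced_classification balances the classes, fits a model and scores it. The function names, the logistic regression model and the toy data are all placeholders, not your actual code.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.utils import resample


def upsample_minority(df, target):
    """Return a dataframe where every class has as many rows as the largest one."""
    largest = df[target].value_counts().max()
    parts = []
    for _, group in df.groupby(target):
        if len(group) < largest:
            # sample with replacement until the class reaches the largest size
            group = resample(group, replace=True, n_samples=largest, random_state=42)
        parts.append(group)
    return pd.concat(parts, ignore_index=True)


def fit_model(X, y):
    """Fit and return a classifier (logistic regression as a placeholder)."""
    return LogisticRegression(max_iter=1000).fit(X, y)


def evaluate_model(model, X, y):
    """Return the accuracy of the model on the given data."""
    return accuracy_score(y, model.predict(X))


# Tiny made-up dataset: 6 rows of class 0, 2 rows of class 1
toy = pd.DataFrame({
    "feature": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 2.0, 2.2],
    "label": [0, 0, 0, 0, 0, 0, 1, 1],
})
balanced = upsample_minority(toy, "label")
model = fit_model(balanced[["feature"]], balanced["label"])
# scored on the training rows only to keep the sketch short
score = evaluate_model(model, balanced[["feature"]], balanced["label"])
```

Each piece can now be tested and reused on its own. In a real project the balancing step would be applied to the training split only, so that the test set keeps the original class distribution.
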
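Regarding the KNN model, the way you chose the best value of k was perfect. As a detail, in Python the Yellowbrick library provides the KElbowVisualizer class, which lets us select the optimal number of clusters in a simple way by fitting a clustering model over a range of k values. Keep in mind that it targets clustering models such as KMeans, so it does not pick the k of a KNN classifier directly.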

Here is some documentation.

# here is an example of what the code would look like
from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer

# set the model
model = KMeans()

# initialise the visualizer; k is the range of values we want to test
visualizer = KElbowVisualizer(model, k=(2, 15), metric='silhouette')

# fit the model for every k in the range (X is your feature matrix)
visualizer.fit(X)

# draw a plot highlighting the optimal k
visualizer.show()
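
For the k of the KNN classifier itself, a cross-validated search over n_neighbors is one common option. The sketch below is only an illustration: it runs on synthetic data from make_classification rather than on your credit card dataset, and the range of 1 to 20 neighbours is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data standing in for the project's features and target
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Try every k from 1 to 20 with 5-fold cross-validation
param_grid = {"n_neighbors": list(range(1, 21))}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

best_k = search.best_params_["n_neighbors"]
print(f"best k: {best_k}, mean CV accuracy: {search.best_score_:.3f}")
```

With imbalanced classes, a scoring option such as "f1" or "balanced_accuracy" may be a better fit than plain accuracy.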

TODOs

- Docstrings in the functions
- Functions into the .py file
- The readme please 🙏
- .DS_Store into .gitignore

Even with all that I said, very good job Kevin! 💪🔥

Ironhack-Data-0621-Remote / mid-bootcamp-project

Kevin / Mid Term Project - Credit card classification #14
