Closed KevinSpurk closed 3 years ago
Hello hello Kevin 🙋🏻‍♀️, here we go with the review of your project
What happened to the readme Kevin?!?!?! 😔 The readme is essential for the delivery of the project.
Why is the readme important?
Because it is the first thing people see of our work, and it is where we present that work in summarised form. The readme therefore also demonstrates our synthesis and communication skills.
I recommend that you add one to your future projects.
The organization of your repo is really nice, Kevin, good work! Just a couple of details:
There is a .DS_Store file in the repo! Remember, you should put this file in the .gitignore
Notebooks: the folder where we will store the Jupyter notebooks
Let's go with the code 💪!
P02_cc_classification
columns_list = []
pattern = '#_'
for column in data.columns:
    column = re.sub(pattern, '', column)
    columns_list.append(column)
data.columns = columns_list

def clean_headers(df):
    df.columns = df.columns.str.lower().str.strip().str.replace(' ', '_')
    return df
## Why do you have it separately? Wouldn't it make more sense to put it all inside the function that cleans the columns?
## In the end, you should end up with something like this:
def clean_headers(df):
    columns_list = []
    pattern = '#_'
    # inside the function, work on the df argument rather than the global data
    for column in df.columns:
        column = re.sub(pattern, '', column)
        columns_list.append(column)
    df.columns = columns_list
    df.columns = df.columns.str.lower().str.strip().str.replace(' ', '_')
    return df
What is a docstring?
It is a string that describes what the code does.
What to put in the docstring of a function?
What the function does
The arguments it takes
What it returns
# taking one of your functions as an example:
def clean_headers(df):
    '''
    This function cleans the names of the columns
    Args:
        df (dataframe): the target dataframe
    Returns:
        The same dataframe with the column names cleaned
    '''
    columns_list = []
    pattern = '#_'
    # inside the function, work on the df argument rather than the global data
    for column in df.columns:
        column = re.sub(pattern, '', column)
        columns_list.append(column)
    df.columns = columns_list
    df.columns = df.columns.str.lower().str.strip().str.replace(' ', '_')
    return df
Here is some documentation about docstrings in Python
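To see why the docstring is worth writing, here is a minimal sketch showing that Python stores it on the function itself, so anyone can read it from the notebook with `help()` or `__doc__`. The shortened function body is only for illustration.

```python
import pandas as pd


def clean_headers(df):
    '''
    This function cleans the names of the columns
    Args:
        df (dataframe): the target dataframe
    Returns:
        The same dataframe with the column names cleaned
    '''
    df.columns = df.columns.str.lower().str.strip().str.replace(' ', '_')
    return df


# the docstring is stored on the function object
print(clean_headers.__doc__)

# help() shows the signature together with the docstring
help(clean_headers)
```

In a Jupyter cell, `clean_headers?` displays the same text.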
Put your functions into a .py file and import them into the Jupyter notebook 😉
balanced_classification. You've created a pretty complete function, but it's pretty big. In general, keep functions small, each with one specific objective. Therefore, my recommendation here is that you try to split it into several smaller functions.
For choosing the number of clusters, Yellowbrick has the KElbowVisualizer, which allows us to select the optimal number of clusters in a simple way by fitting the model with a range of values of k. Here is some documentation.
# here is an example of what the code would look like
from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer

# set the model
model = KMeans()
# initialize the visualizer; k is the range of k values we want to test
visualizer = KElbowVisualizer(model, k=(2,15), metric='silhouette')
# fit the model for every k in that range (X is your feature matrix)
visualizer.fit(X)
# show a plot highlighting the optimal k
visualizer.show()
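If you want to see the idea behind this without Yellowbrick, here is a minimal sketch using only scikit-learn. Careful: this is not what KElbowVisualizer does internally, it simply keeps the k with the highest mean silhouette score. The toy data from `make_blobs` and the range of k are assumptions for illustration, so replace `X` with your own features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# toy data (assumption for illustration): four well-separated blobs
centers = np.array([[-10, -10], [-10, 10], [10, -10], [10, 10]])
X, _ = make_blobs(n_samples=200, centers=centers, cluster_std=1.0, random_state=0)

# mean silhouette score for each candidate k
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# keep the k with the highest score
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

The visualizer gives you the same kind of comparison as a plot, which is usually easier to read and to put in a presentation.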
Docstrings in the functions
Functions into a .py file
The readme, please 🙏
.DS_Store into .gitignore
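For the point about moving the functions into a .py file, a minimal sketch could look like this. The module name `cleaning.py` is hypothetical, and I am assuming it is saved in the same folder as the notebook.

```python
# cleaning.py (hypothetical module name, saved next to the notebook)
import re

import pandas as pd


def clean_headers(df: pd.DataFrame) -> pd.DataFrame:
    """Remove '#_' from the column names, then lowercase, strip and snake_case them."""
    # drop the '#_' pattern from every column name
    df.columns = [re.sub('#_', '', column) for column in df.columns]
    # lowercase, trim outer whitespace, replace inner spaces with underscores
    df.columns = df.columns.str.lower().str.strip().str.replace(' ', '_')
    return df


# In the notebook, the function is then available with:
#     from cleaning import clean_headers
#     data = clean_headers(data)
```

This way the notebook stays focused on the analysis, and the same functions can be reused in other notebooks.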
Even with all that I said, very good job Kevin! 💪🔥
https://github.com/KevinSpurk/Project02_MID_BOOTCAMP