Closed susumusakamoto closed 3 years ago
Hello hello Susumu 🙋🏻♀️, here we go with the review of your project.
You have a very complete readme, very well explained and with the information needed to understand the steps you took throughout the project.
To finish it off and make it perfect, you can add at the end the libraries you have used and a link to the official documentation for each one of them.
How can we put links in markdown?
[pandas](https://pandas.pydata.org/docs/)
In general you have a very organized repo, with different folders according to "theme" or type of file. However, I leave here some tips in case they bring you value for the future:
You missed the temporary files that macOS generates automatically, the .DS_Store files. Remember to put them in the .gitignore. If you need help on this, let me know and we will see it together 😉!
On the other hand, in this project you only have one Jupyter notebook, but you may have more than one in the future. In that case, it is also advisable to create a folder for the notebooks and number them so the order of the workflow is clear.
Let's go file by file:
house_price_analysis
Instead of plotting all the graphs one below the other, try using the subplots of matplotlib. That will allow us to create multiple subplots given specific dimensions, e.g. 3 columns and 3 rows (a total of 9 plots).
This will allow us to visualise our data at a glance.
Here is some documentation about this method
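As a minimal sketch of the 3 x 3 grid described above, assuming a DataFrame with nine numeric columns (the column names and the random data are made up for illustration, they are not taken from your project):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: nine numeric columns filled with random values
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(200, 9)),
    columns=[f"feature_{i}" for i in range(1, 10)],
)

# One figure holding a grid of 3 rows x 3 columns of axes
fig, axes = plt.subplots(nrows=3, ncols=3, figsize=(12, 10))

# axes is a 3 x 3 NumPy array; flatten() lets us loop over it in one pass
for ax, column in zip(axes.flatten(), df.columns):
    ax.hist(df[column], bins=20)
    ax.set_title(column)

# Avoid overlapping titles and tick labels
fig.tight_layout()
```

The same loop works for boxplots or scatter plots: only the call on `ax` changes.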
Regarding the heatmap, it's well done, but in this case I have two details to give you:
As there are many variables, the correlation values are hardly visible. In this case I think they could be removed.
We are data analysts, so it is not enough for us to just put lines of code and that's it: we need to interpret the graphs and draw conclusions. So... in this sense I missed the conclusions you have drawn from the heatmap.
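To make both points concrete, here is a hedged sketch that draws the heatmap without the numeric annotations and then ranks the correlations by strength, which can be a starting point for the written conclusions. The DataFrame and its column names are invented for the example, and it uses plain matplotlib; if you plotted with seaborn, passing `annot=False` to `sns.heatmap` hides the numbers in the same way.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data, built so that some columns are clearly related
rng = np.random.default_rng(42)
sqft = rng.normal(2000, 500, size=300)
df = pd.DataFrame({
    "sqft_living": sqft,
    "bedrooms": sqft / 600 + rng.normal(0, 0.5, size=300),
    "price": sqft * 250 + rng.normal(0, 50_000, size=300),
    "noise": rng.normal(size=300),
})

corr = df.corr()

# Heatmap with colors only: no numbers written inside the cells
fig, ax = plt.subplots(figsize=(6, 5))
image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=90)
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(image, ax=ax)
fig.tight_layout()

# Keep only the upper triangle (k=1 drops the diagonal) so each pair appears once
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()

# Rank the pairs by absolute correlation, strongest first
strongest = pairs.sort_values(key=abs, ascending=False)
print(strongest.head(3))
```

The printed pairs are the ones worth commenting on in a markdown cell right below the plot.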
Regarding the functions, whenever we write a function we should always create a docstring.
What is it?
It is a string that describes what the code does. What should we put in the docstring of a function?
What the function does
The parameters it receives and what type they are
What it returns
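# taking one of your functions as an example:
import numpy as np

def remove_outliers(df, threshold=1.5, in_columns=None, skip_columns=None):
    '''
    Remove dataset outliers using the interquartile range (IQR)
    Args:
        df (dataframe): the target data set
        threshold (float): multiplier applied to the IQR, by default 1.5
        in_columns (list): list with the names of the columns we are interested in, by default all numeric columns
        skip_columns (list): list with the names of the columns to leave untouched, by default none
    Returns:
        The same data set without the outliers
    '''
    # df does not exist when the function is defined, so these defaults are resolved here
    if in_columns is None:
        in_columns = df.select_dtypes(np.number).columns
    if skip_columns is None:
        skip_columns = []
    for column in in_columns:
        if column not in skip_columns:
            upper = np.percentile(df[column], 75)
            lower = np.percentile(df[column], 25)
            iqr = upper - lower
            upper_limit = upper + (threshold * iqr)
            lower_limit = lower - (threshold * iqr)
            # keep only the rows that fall strictly inside the limits
            df = df[(df[column] > lower_limit) & (df[column] < upper_limit)]
    return df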
Here is some documentation about docstrings in Python
Also in relation to the functions, remember that the functions should live in a .py file and be imported from the Jupyter notebook. If you have any doubts about this, let us know and we will look at it together 😉.
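A minimal sketch of that layout, assuming a hypothetical module called `cleaning_functions.py` sitting next to the notebook (the file name and the `drop_missing_rows` helper are made up for illustration). The example writes the module to a temporary folder only so that it can run on its own; in your repo the file would simply live in the project folder.

```python
import sys
import tempfile
from pathlib import Path

# Contents of the hypothetical module; in a real repo this lives in its own .py file
MODULE_SOURCE = '''
def drop_missing_rows(rows):
    """Return only the rows (dicts) that contain no None values."""
    return [row for row in rows if None not in row.values()]
'''

# Write the module to a temporary folder and make that folder importable
module_dir = Path(tempfile.mkdtemp())
(module_dir / "cleaning_functions.py").write_text(MODULE_SOURCE)
sys.path.insert(0, str(module_dir))

# This is the only line the notebook itself would need
from cleaning_functions import drop_missing_rows

rows = [{"price": 100, "rooms": 3}, {"price": None, "rooms": 2}]
clean_rows = drop_missing_rows(rows)
```

If you edit the `.py` file while the notebook is running, `importlib.reload` or the IPython `autoreload` extension can pick up the changes without restarting the kernel.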
Really really good creating one function for each model 🔝🔝
Regarding the conclusions, here are some tips for the future:
choose the correct number of n_neighbors
deal with imbalanced data
contingency tables
Tableau
Congratulations on your Tableau work!!!! You have some great graphics, and that map is great!!!!
Susumu, I leave you here a summary of the most important points to work on in the future:
.py file
Overall, good job Susumu. You left a part of the project unfinished, but what you did is perfect!!!! Congratulations
https://github.com/susumusakamoto/mid-project