Closed diebland closed 2 years ago
Hi Blandine! Here we go with some corrections of your mid-bootcamp project.
Overall, your readme is great: very complete, with all the information it should include. Only a couple of details:
At the start of the readme you have an image, but I can't see it, probably because of the relative path you used to include it. My recommendation: use the link to the image in the GitHub repository, since this link never changes!
In the `Conclusion` section, it seems that you tried to include a table, but something went wrong, because I can see some "strange" symbols. If you want to create tables in markdown, maybe you can use this page to generate them.
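If you prefer to build the table from code instead of a web page, here is a minimal sketch in plain Python. The function name `to_markdown_table` and the cell values are placeholders invented for illustration, not taken from your project.

```python
def to_markdown_table(headers, rows):
    """Build a GitHub-style Markdown table from a header list and a list of rows."""
    # Convert every cell to text so numbers and strings can be mixed.
    text_rows = [[str(cell) for cell in row] for row in rows]
    header_line = "| " + " | ".join(headers) + " |"
    # The separator row is what tells Markdown that this is a table.
    separator_line = "| " + " | ".join("---" for _ in headers) + " |"
    body_lines = ["| " + " | ".join(row) + " |" for row in text_rows]
    return "\n".join([header_line, separator_line, *body_lines])


# Placeholder values, only to show the shape of the output.
table = to_markdown_table(
    ["Column A", "Column B"],
    [["first", 1], ["second", 2]],
)
print(table)
```

Paste the printed text into the readme as it is; a missing separator row is a common reason for a table to show up as raw symbols.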
Let's go with the repo structure!
You have a folder named `Project_Details`. This is a nice idea, but it would be more useful if you included all this information in the Jupyter notebook, explaining each analysis in detail as you do it. As it is now, it can be a bit confusing, and you need to have both files open at the same time. It's better to explain the details together with the code.
You have a file named `functions.py`. Three things about this file:
Never use a generic name for a file. I mean, never use `functions` as a file name. Try to use a descriptive name, for example `cleaning_functions`.
Try to store this file in a folder named `src`; it is a convention to use this folder to store all the function code of the project.
In this file you include several functions designed for different purposes. You can use different `.py` files for different purposes. I mean, you can create a `.py` file with the cleaning functions, a `.py` file with the model functions, a `.py` file with the SQL functions, etc.
You don't have docstrings in the functions. Docstrings are essential for the code to be understood.
Here is some info about docstrings.
def mifunction(argument1, argument2):
    '''
    What the function does.

    Args:
        argument1: data type and meaning
        argument2: data type and meaning

    Returns:
        what the function returns
    '''
    return "Hello"
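To make the template above more concrete, here is a small sketch of a documented function. The name `clean_column_names` and what it does are assumptions chosen for illustration; they are not taken from your `functions.py`.

```python
def clean_column_names(columns):
    """
    Standardize column names: strip spaces, lowercase, use underscores.

    Args:
        columns (list of str): the original column names.

    Returns:
        list of str: the cleaned names, in the same order.
    """
    return [name.strip().lower().replace(" ", "_") for name in columns]


print(clean_column_names(["Customer ID", " Total Amount "]))

# The docstring is the text that help() and most editors show for the function.
print(clean_column_names.__doc__)
```

With the docstring in place, anyone who imports the function can read what it expects and returns without opening the file.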
You can create different folders to store the Jupyter notebooks and the SQL files. In addition, you have all the code in one notebook. As a recommendation, it is usually better to split a very long notebook. Why?
GitHub can give problems with very large files, making it impossible to push them.
It's easier to follow the work. What you could do is create different notebooks: one for the exploratory analysis, one for the visualization, one for the modelling, etc.
You have the functions in a `.py` file, but that is no use if you also have them in the notebook. The purpose of this file is precisely not to have them in the notebook! So, in this case, you should remove the functions from the notebook and import them instead. How?
import functions as f
# then, when you want to use a function, call it through the alias you created, in this case "f". As in pandas!
# for example
f.renaming(df)
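If you also follow the earlier advice (a descriptive file name inside `src`), the import changes slightly. The sketch below is only an illustration: it builds a throwaway copy of that layout in a temporary folder so it can run anywhere, and the body of `renaming` is a made-up stand-in that works on a list of names, not your real function.

```python
import sys
import tempfile
from pathlib import Path

# Build a throwaway project with the suggested layout: <project>/src/cleaning_functions.py
project_root = Path(tempfile.mkdtemp())
src_dir = project_root / "src"
src_dir.mkdir()
(src_dir / "__init__.py").write_text("")
(src_dir / "cleaning_functions.py").write_text(
    "def renaming(columns):\n"
    '    """Return the column names in lowercase (stand-in for the real function)."""\n'
    "    return [name.lower() for name in columns]\n"
)

# Needed here only because the project lives in a temporary folder.
sys.path.insert(0, str(project_root))

# In a notebook opened at the project root, this import line is all you need.
from src import cleaning_functions as cf

print(cf.renaming(["Customer", "STATE"]))
```

A short alias such as `cf` keeps the calls compact while still showing which file each function comes from.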
You have created count plots for the categorical columns. This is a nice approach, but we are data analysts and it is important to interpret the results of the plots. It would have been nice if you had included a few conclusions from the graphs you made, to show that you understand the results you get. The same goes for the correlation matrix and the outliers (why don't the outliers have a relevant impact on the analysis?).
Take care with prints that are too long; for example, you have one in the chi-square analysis. A print that is too long can distract attention, causing the reader to miss lines of code, etc.
What does the confusion matrix mean? Explain it!
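As an example of the kind of explanation I mean, here is a minimal sketch that counts the four cells of a confusion matrix by hand and turns them into sentences. It assumes a binary target, and the labels are toy values invented for illustration, not your results.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["TP" if actual == positive else "FP"] += 1
        else:
            counts["FN" if actual == positive else "TN"] += 1
    return counts


# Toy labels, made up for this example.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

c = confusion_counts(y_true, y_pred)
precision = c["TP"] / (c["TP"] + c["FP"])  # of the predicted positives, how many were right
recall = c["TP"] / (c["TP"] + c["FN"])     # of the real positives, how many were found

print(f"TP={c['TP']} FP={c['FP']} FN={c['FN']} TN={c['TN']}")
print(f"The model found {recall:.0%} of the real positives,")
print(f"and {precision:.0%} of its positive predictions were correct.")
```

Two or three sentences like the printed ones, written next to the matrix in the notebook, are enough to show the reader what the numbers mean.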
Overall, your code is great, mainly because you have used the classroom code. But you can try to write your own code, mainly to understand it better and improve your learning.
Good job Blandine!
Sorry, I have just realized that I have raised the issue on my own repo instead of this one... sorry again.