Closed CharlotteStiller closed 3 years ago
Hello hello Charlotte 🙋🏻♀️, here we go with the review of your project.
As we told you in the presentation, you have a perfect readme 🔥💪. I just have to tell you two little things related to the libraries you have used:
If you put the links to the documentation it would be perfect!
How can we put those links in markdown?
[pandas](https://pandas.pydata.org/docs/)
Let's go with this part, I'll try to give you some tips to make the repo as clean as possible.
Your repo has many files, in this case the ideal would be to create different folders where we will save the different files.
We can create a folder for the jupyters that we can call Notebooks. If you also number the files to show the working order, it would be perfect!
The .py file has to go in a folder called src (not that we are crazy maniacs, it is more of a convention 🤣).
You have multiple text files where you make a detailed description of each of the phases of the job, put them all in a single folder.
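If it helps, the reorganization above could be scripted. This is only a minimal sketch: the function name organize_repo and the docs folder name for the text files are assumptions for illustration (Notebooks and src are the names suggested above), and it only looks at files sitting directly in the repo root.

```python
import shutil
from pathlib import Path

# Assumed mapping from file suffix to target folder.
# "Notebooks" and "src" follow the convention above; "docs" is a hypothetical name.
FOLDER_BY_SUFFIX = {".ipynb": "Notebooks", ".py": "src", ".txt": "docs"}


def organize_repo(repo_root: Path) -> list[Path]:
    """Move top-level files into folders by suffix and return their new paths."""
    moved = []
    # sorted() takes a snapshot of the listing, so folders created below are not revisited
    for path in sorted(repo_root.iterdir()):
        folder = FOLDER_BY_SUFFIX.get(path.suffix)
        if path.is_file() and folder is not None:
            target_dir = repo_root / folder
            target_dir.mkdir(exist_ok=True)
            new_path = shutil.move(str(path), str(target_dir / path.name))
            moved.append(Path(new_path))
    return moved
```

Calling organize_repo(Path("path/to/repo")) leaves files with other suffixes (for example the README) where they are. Keep the script itself outside the repo root, otherwise it would be moved into src as well.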
Very good that you didn't leave any stray files out of the .gitignore.
Good job in this part Charlotte!!
Let's go with the code!
Solutions SQL- Classification
I see that at the beginning of this file you use a function that you have created and that you have in the helper_classification.py file, but I don't see that you have imported those functions into the jupyter.
How can we "bring" the functions from the .py to the jupyter?
We have to put this line in the jupyter:
import src.helper_classification as hc  # the alias can be whatever we want
Once we have this, we can access each of the functions that we have in that file. How do we call the functions now?
We use the alias that we gave it and the name of the function that we want to use:
hc.get_started
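One detail to keep in mind: if the jupyter lives in its own folder (for example Notebooks) and the .py lives in src next to it, the import only resolves when the repo root is on Python's search path. Here is a minimal sketch of one way to handle that. The helper name add_repo_root_to_path is made up for illustration, and it assumes the notebook folder sits directly under the repo root.

```python
import sys
from pathlib import Path


def add_repo_root_to_path(notebook_dir: Path) -> Path:
    """Put the parent of the notebook folder on sys.path so `import src...` resolves."""
    repo_root = notebook_dir.resolve().parent
    if str(repo_root) not in sys.path:
        sys.path.insert(0, str(repo_root))
    return repo_root


# Typical use in the first cell of a notebook stored in Notebooks/
# (Jupyter usually starts the kernel in the notebook's own folder):
#
#   add_repo_root_to_path(Path.cwd())
#   import src.helper_classification as hc
#   hc.get_started  # accessed through the alias, as above
```

The arguments that get_started takes are not shown here, so the call itself is left as in the example above.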
Nothing more to say about this part Charlotte, the truth is that you have it perfect, you have even interpreted the results, super 👏🏽!
Solutions_Python - Classification
First of all, you have made it very difficult for me to look for areas for improvement in your project because it is practically perfect. Let's go into some details, but as I say, I'm just being picky.
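Regarding the KNN model, when you chose the best value of k you did it perfectly. As a detail, in Python the Yellowbrick library has the KElbowVisualizer class, which lets us select the optimal number of clusters in a simple way by fitting the model with a range of values. Keep in mind that it is built for clustering models such as KMeans, so it is a tip for clustering rather than for KNN itself.
Here is some documentation.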
# here is an example of what the code would look like
from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer

# set the model
model = KMeans()
# initialize the visualizer; k is the range of cluster counts we want to test
visualizer = KElbowVisualizer(model, k=(2,15), metric='silhouette')
# fit the model for every k in that range (X is the feature matrix)
visualizer.fit(X)
# show a plot highlighting the optimal k
visualizer.show()
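Since KElbowVisualizer works on clustering models, a common way to pick k for a KNN classifier is to compare cross-validated scores over a range of k. The following is only a sketch of that idea with scikit-learn, not a description of how the project did it: the data is synthetic, and the function name best_k_by_cv, the range of k and the accuracy metric are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data; replace with the project's own X and y
X, y = make_classification(n_samples=300, n_features=8, random_state=42)


def best_k_by_cv(X, y, k_values, cv=5):
    """Return (best_k, scores), where scores maps each k to its mean CV accuracy."""
    scores = {}
    for k in k_values:
        model = KNeighborsClassifier(n_neighbors=k)
        scores[k] = cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    # pick the k with the highest mean accuracy
    best_k = max(scores, key=scores.get)
    return best_k, scores


best_k, scores = best_k_by_cv(X, y, range(1, 16))
print(f"best k: {best_k}, mean accuracy: {scores[best_k]:.3f}")
```

If the features are on very different scales, it is usually worth scaling them (for example with StandardScaler in a Pipeline) before comparing values of k, because KNN relies on distances.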
Here, a little recap about some tips:
Well Charlotte, you have done an impeccable job, very well documented, with each step explained, with interpretations of the results. A great exploration of the data etc.
I can only congratulate you 👏🏽
https://github.com/CharlotteStiller/mid-bootcamp-project-classification.git