Ironhack-Data-0621-Remote / mid-bootcamp-project


[mid-bootcamp-project] Izel #10

Closed izelyekrek closed 3 years ago

izelyekrek commented 3 years ago

https://github.com/izelyekrek/Regression_Mid-Project

Fominayasg commented 3 years ago

Hi Izel 😁 !!

Let's have a look at your project!! :rocket:

README

Your README is very well explained, but perhaps too long. If you specifically want a long README, of course you can keep it; it's a matter of taste (and it's your project), but I'd summarize it a little. Remember that the README is usually just an introductory element that gives the reader a quick overview of what can be found in the repo and invites them to look at the rest of the files if they want to know more.

You could leave just the essential information, such as the introduction to the topic, the organization of the repo, the libraries used, and the conclusion if you want it easily accessible.

In the libraries part you can put the links to the documentation like this:

- [Pandas](https://pandas.pydata.org/docs/)
- [Matplotlib](https://matplotlib.org/)
- [Sklearn](https://scikit-learn.org/stable/#)


I really liked how you explained the feature handling and the machine learning, but I would include that explanation in the Jupyter notebooks instead of the README.

SQL

In the SQL part the first three questions are missing, but since you could do the other ones, I suppose you were able to do them, or maybe you did them directly through the Workbench interface. If it is the second option, I encourage you to do it in code too, because it can be useful in the future if you need to automate a project, for example. And as always, if you have any doubts don't hesitate to ask us :wink:.
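As a rough idea of what "doing it in code" can look like, here is a minimal sketch using Python's built-in `sqlite3` and `csv` modules. The in-memory database, the reduced column set, the sample rows and the `load_houses` helper are all assumptions made for the example; the real project uses MySQL Workbench and the full CSV, so treat this only as a pattern to adapt.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for the project's CSV file.
SAMPLE_CSV = """id,bedrooms,grade,price
1,3,7,221900
2,4,8,538000
3,2,6,180000
"""


def load_houses(conn, csv_text):
    """Create the table (if needed) and import every CSV row.

    Returns the number of rows read from the CSV text.
    """
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS house_price_data (
            id INTEGER PRIMARY KEY,
            bedrooms INTEGER,
            grade INTEGER,
            price REAL
        )
        """
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    # Convert the CSV strings to numbers explicitly before inserting.
    rows = [
        (int(r["id"]), int(r["bedrooms"]), int(r["grade"]), float(r["price"]))
        for r in reader
    ]
    conn.executemany("INSERT INTO house_price_data VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # assumption: throwaway in-memory database
n_loaded = load_houses(conn, SAMPLE_CSV)
n_in_table = conn.execute("SELECT COUNT(*) FROM house_price_data").fetchone()[0]
print(n_loaded, n_in_table)
```

Once the setup lives in a script like this, rebuilding the database from scratch is a single command instead of a series of clicks.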

The questions are well solved, I only have a few details to point out:

In question 7 (of your file), you say that there is a negative correlation, but there is actually a huge disparity in the number of houses for each condition, so it is difficult to establish a correlation between the condition of the house and the grade.

In question 11, you can also use `in` to select several values for one feature (but your solution is perfect too):
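One way to see this is to look at how many houses fall into each condition before reading anything into the averages. The sketch below uses a handful of made-up `(condition, grade)` pairs, not the real data, and the `summarize_by_condition` helper is hypothetical:

```python
from collections import defaultdict

# Hypothetical (condition, grade) pairs; the real table is far larger.
rows = [
    (3, 7), (3, 8), (3, 7), (3, 6), (3, 9), (3, 7),
    (4, 7), (4, 8),
    (5, 6),
    (1, 5),
]


def summarize_by_condition(pairs):
    """Return {condition: (number of houses, mean grade)}."""
    grades = defaultdict(list)
    for condition, grade in pairs:
        grades[condition].append(grade)
    return {
        condition: (len(values), round(sum(values) / len(values), 2))
        for condition, values in sorted(grades.items())
    }


summary = summarize_by_condition(rows)
for condition, (count, mean_grade) in summary.items():
    print(f"condition={condition} houses={count} mean_grade={mean_grade}")
```

In this toy sample, conditions 1 and 5 have a single house each, so their mean grade says very little; the same caution applies whenever some groups are much smaller than others.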

```sql
select bedrooms, round(avg(price),2)
from house_price_data
where bedrooms in (3,4)
group by bedrooms;
```
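To check that `in` gives the same result as chaining equality conditions with `or` (just one possible alternative, not necessarily the one in your file), here is a small sketch with Python's built-in `sqlite3`. The table contents are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE house_price_data (bedrooms INTEGER, price REAL)")
# Hypothetical rows, only to have something to aggregate.
conn.executemany(
    "INSERT INTO house_price_data VALUES (?, ?)",
    [(2, 200000), (3, 300000), (3, 350000), (4, 500000), (4, 450000), (5, 800000)],
)

# Version with IN: one condition listing every accepted value.
with_in = conn.execute(
    """
    SELECT bedrooms, ROUND(AVG(price), 2)
    FROM house_price_data
    WHERE bedrooms IN (3, 4)
    GROUP BY bedrooms
    ORDER BY bedrooms
    """
).fetchall()

# Version with OR: one equality check per accepted value.
with_or = conn.execute(
    """
    SELECT bedrooms, ROUND(AVG(price), 2)
    FROM house_price_data
    WHERE bedrooms = 3 OR bedrooms = 4
    GROUP BY bedrooms
    ORDER BY bedrooms
    """
).fetchall()

print(with_in)
print(with_in == with_or)
```

Both queries return the same rows; `in` simply stays shorter as the list of values grows.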

Same for the last question, you solved it in a totally valid way, but I leave you here another option that seems simpler to me:

```sql
select * from house_price_data
order by price desc
limit 10,1;
```

:bulb: Working with "indexes" in python and "limits" in sql is usually very useful.
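To make the parallel concrete: in MySQL, `limit 10,1` means "skip 10 rows, return 1", which is the same idea as index `[10]` on a sorted Python list. The sketch below uses Python's built-in `sqlite3` with the equivalent `LIMIT 1 OFFSET 10` spelling and 15 made-up prices, so the numbers are purely illustrative:

```python
import sqlite3

# 15 hypothetical, distinct prices: 100000, 200000, ..., 1500000.
prices = [100000 * i for i in range(1, 16)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE house_price_data (price INTEGER)")
conn.executemany("INSERT INTO house_price_data VALUES (?)", [(p,) for p in prices])

# SQL: skip the 10 most expensive rows and return the next one (the 11th).
row = conn.execute(
    "SELECT price FROM house_price_data ORDER BY price DESC LIMIT 1 OFFSET 10"
).fetchone()
sql_eleventh = row[0]

# Python: sort descending and take index 10 (indexes start at 0).
py_eleventh = sorted(prices, reverse=True)[10]

print(sql_eleventh, py_eleventh)
```

Both approaches count from zero, which is why the 11th most expensive house sits at offset (or index) 10.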

EDA & Data cleaning

In the exploratory analysis I missed some plots that would give a more visual idea of the data.

In the data cleaning part it would be interesting to explain your decisions a little more, for example why you drop some columns or why you turn some numerical variables into categorical ones. That is a great idea, but keep in mind that the reader is not in your head, so the easier you make it, the more likely they are to keep reading.

Data preprocessing

Nice job trying several possibilities to discover which one works better for your data :rocket:.

ML

It's a very good idea to try a KNN regressor and not stop at linear regression. If you want to learn more about regression models you can check this article; for me it is a very easy-to-read summary.

But again, I missed some explanations about the models in the Jupyter notebook. I know you already explained them in the README file, but a few lines alongside your code would be really helpful when reading it.

Tableau

It's nice that you got to visualize your data in Tableau. Still, I encourage you to finish the questions by adding the details that are missing, and finally to put it all together in a dashboard, imagining that it is your presentation for your boss.

Good job Izel!!:rocket: