Ironhack-Data-0621-Remote / mid-bootcamp-project


[mid-bootcamp regression project] Spica #9

Closed sumampouw closed 3 years ago

sumampouw commented 3 years ago

https://github.com/sumampouw/mid-bootcamp-project-regression

Fominayasg commented 3 years ago

Hi Spica 😁 !!

Let's have a look at your project!! :rocket:

About the organization of the project, the only "but" I can mention is that you have .DS_Store files on GitHub. These are temporary system files that you don't need, and they should be listed in your .gitignore file.

In general, it is a good idea to add the .gitignore right when you create a new repo on GitHub, and you can use the templates GitHub offers. For it to work well, the .gitignore should be the first file you commit: any unwanted files that were committed before being listed in the .gitignore stay tracked and won't be "ignored" afterwards.

If you have any doubts about how you should handle it, don't hesitate to ask any of us.
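As a rough illustration of the point above, here is a minimal Python sketch that makes sure a repo's .gitignore lists `.DS_Store` and reports any stray copies already sitting in the folder. The helper name `ensure_ignored` and the temporary demo folder are made up for this example; they are not part of your project.

```python
import tempfile
from pathlib import Path


def ensure_ignored(repo_dir, pattern=".DS_Store"):
    """Add `pattern` to repo_dir/.gitignore if it is missing.

    Returns the relative paths of files under repo_dir named `pattern`.
    (Hypothetical helper, written only for this illustration.)
    """
    repo = Path(repo_dir)
    gitignore = repo / ".gitignore"
    lines = gitignore.read_text().splitlines() if gitignore.exists() else []
    if pattern not in (line.strip() for line in lines):
        lines.append(pattern)
        gitignore.write_text("\n".join(lines) + "\n")
    # Look for matching files anywhere in the repo, skipping git's own folder.
    return sorted(
        p.relative_to(repo).as_posix()
        for p in repo.rglob(pattern)
        if ".git" not in p.relative_to(repo).parts
    )


# Demo on a throwaway folder that stands in for a repo.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data").mkdir()
    (Path(tmp) / "data" / ".DS_Store").write_text("")
    stray = ensure_ignored(tmp)
    gitignore_text = (Path(tmp) / ".gitignore").read_text()

print(stray)
print(gitignore_text)
```

For files that were already committed, adding the pattern is not enough: `git rm --cached <path>` stops tracking them while keeping the local copy, and after the next commit the .gitignore entry takes effect.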

README

You did a really good job with it! My only advice to improve it is to link each library to its documentation, like this:

- [Pandas](https://pandas.pydata.org/docs/)
- [Matplotlib](https://matplotlib.org/)
- [Sklearn](https://scikit-learn.org/stable/#)
- [Seaborn](https://seaborn.pydata.org/)

In the rendered README, each library name then shows up as a clickable link.

SQL

You solved all of the questions successfully, but I would like to give you some tips for future queries. In question 11, you can also use IN to match several values of one column without chaining OR conditions (your solution is perfect too):

-- average price per bedroom count, keeping only 3- and 4-bedroom houses
select bedrooms, round(avg(price),2)
from house_price_data
where bedrooms in (3,4)
group by bedrooms;
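If you want to convince yourself that the IN and OR versions are equivalent, a small sketch like this one runs both against an in-memory SQLite table. The rows are invented toy values, only the table and column names come from the query above, and an `order by` is added just to make the output order predictable.

```python
import sqlite3

# In-memory database with made-up rows, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE house_price_data (bedrooms INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO house_price_data VALUES (?, ?)",
    [(2, 200000), (3, 300000), (3, 350000), (4, 500000), (4, 450000), (5, 900000)],
)

query_in = """
    select bedrooms, round(avg(price), 2)
    from house_price_data
    where bedrooms in (3, 4)
    group by bedrooms
    order by bedrooms;
"""

query_or = """
    select bedrooms, round(avg(price), 2)
    from house_price_data
    where bedrooms = 3 or bedrooms = 4
    group by bedrooms
    order by bedrooms;
"""

rows_in = conn.execute(query_in).fetchall()
rows_or = conn.execute(query_or).fetchall()

print(rows_in)
print(rows_in == rows_or)
```

Both queries return the same rows; IN just stays shorter as the list of values grows.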

Same for the last question, you solved it in a valid way, but I leave you here another option that seems more accurate to me:

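-- "limit 10,1" means offset 10, row count 1:
-- skip the 10 most expensive rows and return the next one
select * from house_price_data
order by price desc
limit 10,1;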

:bulb: Working with "indexes" in Python and "limits" in SQL is usually very useful.
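To make the parallel between the two concrete, here is a small sketch with invented prices: `limit 10,1` skips ten rows and returns the next one, which is the same row you get with index `[10]` on a list sorted in descending order. The `id` column and the data are assumptions for the demo. The `limit offset, count` form is MySQL-style, and SQLite happens to accept it as well.

```python
import sqlite3

# Toy table: 15 houses priced 100000, 200000, ..., 1500000 (invented values).
prices = [i * 100000 for i in range(1, 16)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE house_price_data (id INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO house_price_data VALUES (?, ?)",
    list(enumerate(prices, start=1)),
)

# SQL: skip the 10 most expensive rows, then take 1 row.
eleventh = conn.execute(
    "select * from house_price_data order by price desc limit 10, 1;"
).fetchone()

# Python: same idea with zero-based indexing on a sorted list.
by_index = sorted(prices, reverse=True)[10]

print(eleventh)
print(by_index)
```

In both cases the offset is zero-based, so 10 points at the eleventh most expensive house.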

Regarding data usage from databases, I also saw that you imported the data into the Jupyter notebook using SQLAlchemy :heart_eyes: :rocket:, very good practice.

EDA & Data cleaning

You displayed some interesting plots for the exploratory analysis, but it would be better to write a brief explanation for each one.

Feature engineering and ML

You did an amazing job here Spica! You tried a lot of models and I'm very impressed that you used gradient boosting. I'm sure it took you a lot of time.

But again, I miss a final explanation that tells me which model I should use if I were your boss. It would be great if you added just a few lines with these ideas.

Tableau

I know some of you are having problems with Tableau licenses, but I really encourage you to make some kind of presentation with visualizations of your project. Not for the course, but to boost your own profile: if a recruiter gets to your repo, it will be much easier for them to look at visual conclusions than to read and understand all the code, and some of them are HR workers who don't know about coding.

In general you did a very good job, Spica!! Keep on going!! :rocket: :whale: