mid bootcamp project

Hello hello Susumu 🙋🏻‍♀️, here we go with the review of your project.

README

You have a very complete README, well explained and with all the information needed to understand the steps you took throughout the project.

To finish it off and make it perfect, you can add at the end the libraries you used, each with a link to its official documentation.

How can we put links in markdown?

[pandas](https://pandas.pydata.org/docs/)

Repository structure

In general you have a very well-organized repo, with different folders according to theme or type of file. However, I'll leave some tips here in case they are useful in the future:

- You missed the temporary files that macOS generates automatically, the .DS_Store files. Remember to add them to the .gitignore. If you need help with this, let me know and we will look at it together 😉!
- On the other hand, in this project you only have one Jupyter notebook, but in other projects you may have more than one. In that case, it is advisable to create a folder for the notebooks and number them so the order of the workflow is clear.

Syntax code

Let's go file by file:

house_price_analysis
- As a challenge for the future, when you make the distplots and boxplots of the distribution of numerical variables I propose the following:
Instead of plotting all the graphs one below the other, try using matplotlib's subplots. It creates a grid of plots with the dimensions you specify, e.g. 3 rows and 3 columns (9 plots in total).

This will allow us to visualise our data at a glance.

Here is some documentation about this method
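As a rough idea of what the grid could look like, here is a minimal sketch. The data frame, the column names and the `plot_distributions` helper are made up for the example, and it uses plain matplotlib histograms instead of distplots, so take it as a starting point rather than the way to do it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere; not needed in Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the numerical columns of the project
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 9)), columns=[f"var_{i}" for i in range(9)])


def plot_distributions(data, n_rows=3, n_cols=3):
    """Draw one histogram per numerical column on an n_rows x n_cols grid."""
    numeric = data.select_dtypes(np.number)
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(4 * n_cols, 3 * n_rows))
    flat_axes = np.atleast_1d(axes).ravel()
    for ax, column in zip(flat_axes, numeric.columns):
        ax.hist(numeric[column], bins=30)
        ax.set_title(column)
    # Hide any cell of the grid that did not receive a column
    for ax in flat_axes[numeric.shape[1]:]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig, flat_axes


fig, axes = plot_distributions(df)
```

The same loop works for boxplots by swapping `ax.hist(...)` for `ax.boxplot(...)`.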
- Regarding the heatmap, it's well done, but I have two comments:
  - As there are many variables, the correlation values are hardly legible. In this case I think the annotations could be removed.
  - We are data analysts, so it is not enough to write a few lines of code and leave it there: we need to interpret the graphs and draw conclusions. In this sense, I missed the conclusions you drew from the heatmap.
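One way to back up the conclusions from a crowded heatmap is to list the strongest correlations as a small table. This is only a sketch: the data, the column names and the `strongest_correlations` helper are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "price" is built from "sqft", so the two are strongly related
rng = np.random.default_rng(0)
sqft = rng.normal(2000, 500, size=300)
df = pd.DataFrame({
    "sqft": sqft,
    "price": 150 * sqft + rng.normal(0, 20000, size=300),
    "year_built": rng.integers(1950, 2020, size=300),
})


def strongest_correlations(data, top=5):
    """Return the pairs of variables with the largest absolute correlation."""
    corr = data.select_dtypes(np.number).corr()
    # Keep only the upper triangle so each pair appears once and the diagonal is dropped
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(top)


top_pairs = strongest_correlations(df)
print(top_pairs)
```

With a short list like this it is easier to write one sentence per pair explaining what the relationship means for the analysis.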
- Regarding the functions, whenever we write a function we should always give it a docstring.
What is it?

It is a string that describes what the code does. What should go in a function's docstring?
  - What the function does
  - The parameters it receives and what type they are
  - What it returns
```
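# taking one of your functions as an example:
import numpy as np


def remove_outliers(df, threshold=1.5, in_columns=None, skip_columns=None):
    '''
    Remove outliers from a data set using the IQR rule.

    Args:
        df (DataFrame): the target data set
        threshold (float): IQR multiplier, 1.5 by default
        in_columns (list): names of the columns to check; all numeric columns by default
        skip_columns (list): names of the columns to leave untouched; none by default
    Returns:
        The same data set without the outliers
    '''
    # df does not exist yet when the def line runs, so the defaults are resolved here
    if in_columns is None:
        in_columns = df.select_dtypes(np.number).columns
    if skip_columns is None:
        skip_columns = []

    for column in in_columns:
        if column not in skip_columns:
            upper = np.percentile(df[column], 75)
            lower = np.percentile(df[column], 25)
            iqr = upper - lower
            upper_limit = upper + (threshold * iqr)
            lower_limit = lower - (threshold * iqr)
            # Keep only the rows strictly inside the limits
            df = df[(df[column] > lower_limit) & (df[column] < upper_limit)]
    return df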
```
  Here is some documentation about docstrings in Python
- Also regarding the functions, remember that they should live in a .py file and be imported from the Jupyter notebook. If you have any doubts about this, let us know and we will look at it together 😉.
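To give an idea of how that split could look, here is a small sketch. The file name `functions.py`, the `iqr_limits` helper and the `df["price"]` column in the comments are assumptions for the example, not something taken from your repo.

```python
# functions.py  (hypothetical file name, placed next to the notebook)
import numpy as np


def iqr_limits(values, threshold=1.5):
    """
    Compute the lower and upper outlier limits of a numeric sequence.

    Args:
        values (array-like): the numeric values to analyse
        threshold (float): IQR multiplier, 1.5 by default
    Returns:
        tuple: (lower_limit, upper_limit)
    """
    lower, upper = np.percentile(values, [25, 75])
    iqr = upper - lower
    return lower - threshold * iqr, upper + threshold * iqr


# In the notebook, the function is then imported instead of being redefined:
# from functions import iqr_limits
# lower_limit, upper_limit = iqr_limits(df["price"])
```

Keeping the functions in one module means every notebook uses the same version of the code.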
- Really, really good job creating one function for each model 🔝🔝
- Regarding the conclusions, here are some tips for the future:
  - choose the correct number of n_neighbors
  - deal with imbalanced data
  - use contingency tables
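For two of the tips above, pandas alone already goes a long way. The sketch below checks how balanced a categorical column is and builds a contingency table. The column names and values are invented for the example, and choosing n_neighbors is left out because it depends on the model you end up using.

```python
import pandas as pd

# Hypothetical data: column names and values are invented for illustration
df = pd.DataFrame({
    "waterfront": ["no", "no", "no", "no", "no", "no", "no", "yes", "no", "no"],
    "price_band": ["low", "low", "high", "low", "low", "high", "low", "high", "low", "low"],
})

# Share of each class: a very uneven split is a sign of imbalanced data
class_share = df["waterfront"].value_counts(normalize=True)
print(class_share)

# Contingency table: counts for every combination of the two categorical variables
table = pd.crosstab(df["waterfront"], df["price_band"])
print(table)
```

If one class dominates as it does here, that is worth mentioning in the conclusions before trusting any accuracy figure.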
Tableau

Congratulations on your Tableau work!!!! You have some great graphics, and that map is great!!!!

SQL

Perfect work here Susumu!!!!

TODOs

Here is a summary of the most important points to rework in the future, Susumu.

- Add the libraries used in the project and their links to the README
- Finish the models
- Move the functions to the .py file

Overall, good job Susumu. You left part of the project unfinished, but what you did is perfect!!!! Congratulations!
