Closed: Elissadejong closed this 3 years ago
Hi Elissa, here I am with the first project review.
First of all, suuper specifying the objective and the tools used in this project 🔝
What did I miss in your readme? You have left some sections empty (such as `visualization` or `List of libraries`); avoid doing this. It is better not to include them and add more content later than to leave things empty. Regarding the libraries, put them in a list to make the readme more user friendly. If you add a link to each library's official website, it would be perfect. For example:
How can we include a link in markdown?
[pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/index.html)
If you use this structure you create a link like the ones we create for you when we send you documentation.
Example 👇🏽👇🏽
Here is a markdown cheat-sheet in case it helps you in the future.
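For instance, a `Libraries` section of the readme could look like this (the library names and doc links below are just illustrative examples):

```markdown
## Libraries
- [pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/index.html)
- [numpy](https://numpy.org/doc/stable/)
```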
Overall, the structure of the repo is perfect: you have all the required files and, most importantly, no temporary files such as `.vscode` or `.DS_Store` have been left out of the `.gitignore`.
I only have two questions here:
You include the file `P1_fifa_money_ball.sql`. However, in this file you only have one line of code:

```sql
USE P1_FIFA_money_ball;
```

What have you used this file for? It is important not to include material within our repo that does not add value to it or that we will not use.
Why did you do this?

```python
data['height'] = data['height'].str.replace('"',"")
data['height'] = pd.to_numeric(data['height'].map(lambda x: int(x.split("'")[0])*30.48 + int(x.split("'")[1])*2.54))
```
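For context, those lines appear to convert heights stored as feet-and-inches strings into centimetres (1 ft = 30.48 cm, 1 in = 2.54 cm). A minimal, self-contained sketch of the same transformation on made-up sample data:

```python
import pandas as pd

# hypothetical sample data: heights stored as feet'inches" strings
data = pd.DataFrame({'height': ['5\'11"', '6\'2"']})

# drop the trailing inch mark, then feet -> 30.48 cm and inches -> 2.54 cm
data['height'] = data['height'].str.replace('"', '')
data['height'] = pd.to_numeric(
    data['height'].map(lambda x: int(x.split("'")[0]) * 30.48
                       + int(x.split("'")[1]) * 2.54)
)
# 5'11" becomes roughly 180.34 cm, 6'2" roughly 187.96 cm
```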
I love 😍 how you have divided your jupyter into the different working steps, very nice work here 💪!!!
Suuper the management you did with the `weight`, `height`, `release_clause` and `value` columns.
You have the same line of code to clean two different columns, `value` and `wage`. To avoid having duplicate code we can create functions, for example:
```python
def cleaning_symbols(data, col):
    # strip the euro symbol and expand the K/M suffixes
    data[col] = data[col].map(lambda x: x.lstrip('€'))
    data[col] = data[col].str.replace('K', '000')
    data[col] = data[col].str.replace('M', '00000')
    data[col] = data[col].str.replace('.', '', regex=False)
    return data

# important: here we pass column names as strings
data = cleaning_symbols(data, "wage")
data = cleaning_symbols(data, "value")
```
This will make our work much cleaner and easier to follow.
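To sanity-check the idea, here is a self-contained, lightly cleaned-up version of that helper run on a toy dataframe (the sample wages and values are made up):

```python
import pandas as pd

def cleaning_symbols(data, col):
    # strip the euro symbol and expand the K/M suffixes
    data[col] = data[col].map(lambda x: x.lstrip('€'))
    data[col] = data[col].str.replace('K', '000')
    data[col] = data[col].str.replace('M', '00000')
    data[col] = data[col].str.replace('.', '', regex=False)
    return data

data = pd.DataFrame({'wage': ['€150K', '€90K'],
                     'value': ['€110.5M', '€75.5M']})
data = cleaning_symbols(data, 'wage')
data = cleaning_symbols(data, 'value')
# data['wage']  -> ['150000', '90000']
# data['value'] -> ['110500000', '75500000']
```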
Suuper this `# dropping columns with more than 75% of NaN values`: you have made a decision based on data 🚀, this is the objective 👏🏽
Wow Elissa!!! You include SQL in your project, this is amazing!
When you merge the different dataframes, you have merged them one by one, which is not bad, but again you are repeating the same line of code many times. What can we do to avoid this?
```python
from functools import reduce

# compile the list of dataframes you want to merge
data_frames = [data_mentalities, data_aggression, data_interceptions,
               data_positioning, data_vision, data_penalties, data_composure]
# I think the right `how` here is 'outer', but 'inner' might be better 🤔 (I'm not sure)
df_merged = reduce(lambda left, right: pd.merge(left, right, on=['club'], how='outer'),
                   data_frames)
```
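A minimal, self-contained sketch of the same `reduce` pattern on toy dataframes (club names and values are made up): `how='outer'` keeps every club seen in any dataframe (missing values become NaN), while `how='inner'` would keep only clubs present in all of them.

```python
from functools import reduce

import pandas as pd

df_a = pd.DataFrame({'club': ['Ajax', 'PSV'], 'aggression': [60, 70]})
df_b = pd.DataFrame({'club': ['Ajax', 'Feyenoord'], 'vision': [80, 75]})
df_c = pd.DataFrame({'club': ['Ajax', 'PSV'], 'composure': [85, 65]})

data_frames = [df_a, df_b, df_c]

# outer join keeps all three clubs; an inner join would keep only Ajax
df_merged = reduce(lambda left, right: pd.merge(left, right, on=['club'], how='outer'),
                   data_frames)
```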
Regarding the model:

- You have included a lot of variables in the `heatmap`. This has led you to have a heatmap that is quite difficult to interpret. To avoid this situation we can remove those variables that we know are not important to our model, or select only those that we know are important (with our a priori knowledge).
- With the `describe` method, it is not necessary to first create a "sub-dataframe" with the numerical columns; by default the `describe` method only takes the numerical columns into account.
- In `fill_missing_n`, only as a detail: as a convention, the imports in Python usually go at the beginning of the jupyter.
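As an illustration of the heatmap point, correlating only a hand-picked subset of columns keeps the matrix small and readable (the column names and random data below are hypothetical):

```python
import numpy as np
import pandas as pd

# hypothetical stand-in for the project's dataframe
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(50, 4)),
                    columns=['overall', 'potential', 'value', 'wage'])

# correlate only the variables we believe matter a priori,
# instead of every numeric column in the dataframe
key_cols = ['overall', 'value', 'wage']
corr = data[key_cols].corr()
# corr is a small 3x3 matrix that can then be passed to e.g. seaborn's heatmap
```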
Overall your project is super complete Elissa, super good work, seriously!!!! Here are a few things you could work on in the future:
Visualization: when we are doing machine learning models, visualisation is super important, it can become our best friend 🤣! Not only will it help us simplify our data, but it will also help us understand the relationships between them and better communicate our results.
Storytelling: You have raised the questions and you get some results with some very good tables and tools (very, very good use of everything pandas gives us), but I missed a bit of storytelling in the jupyter, something that ties it all together a little more. In short, tell me a story from the beginning: why you ask yourself that question, how you plan to solve it, what your conclusions are, and what decision you would take based on all the data you have extracted.
All in all, good work Elissa, in this project you have reinforced the knowledge acquired so far in the bootcamp. You have explored the data, familiarised yourself with it, cleaned it and created a machine learning model with very good results. Congratulations 🔥!
https://github.com/Elissadejong/P1_FIFA_money_ball.git