Ironhack-Data-Madrid-Julio-2023 / w2-project_pandas

W2 Project - Data cleaning & wrangling

The goal of this project is to combine everything you have learned about data wrangling, cleaning, and manipulation with Pandas so you can see how it all works together. You will start with the messy Shark Attack data set: download it, import it, use your data wrangling skills to clean it up and prepare it for analysis, and then export it as a clean CSV file. A few graphs to help understand the data will also be useful.
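The import, clean, export flow described above can be sketched roughly as follows. This is a minimal illustration, not the expected solution: the inline CSV text, the column names, and the values are all made-up stand-ins for the real file, which you would load with `pd.read_csv` from wherever you saved it.

```python
import io

import pandas as pd

# Tiny made-up stand-in for the raw file (assumption: the real data set has
# many more columns and rows, and you would pass its path to pd.read_csv).
raw_csv = io.StringIO(
    "Date,Country,Activity \n"
    "2018-06-25,USA,Paddling\n"
    "2018-06-25,USA,Paddling\n"
    ",,\n"
    "2018-06-08,australia ,Surfing\n"
)

# Import
df = pd.read_csv(raw_csv)

# Clean: normalise column names, drop fully empty rows and exact duplicates
df.columns = df.columns.str.strip().str.lower()
df = df.dropna(how="all").drop_duplicates().reset_index(drop=True)
df["country"] = df["country"].str.strip().str.upper()

# Export: index=False keeps the row index out of the CSV.
# Pass a file path as the first argument to write to disk instead.
clean_csv = df.to_csv(index=False)
print(clean_csv)
```

Building the sample in memory keeps the sketch runnable without the data set, which also fits the rule that the data set itself should stay out of your repo.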

To-Dos

  1. Explore the data and write down what you find.
    • You can use df.describe(), df["column"], etc.
  2. Use at least 5 data cleaning techniques in a file named clean.ipynb.
    • For example: handling null values, dropping columns, removing duplicated data, string manipulation, applying functions, categorizing, regex, etc.
  3. Show data that validates the conclusions based on your hypotheses in a file named analysis.ipynb.
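As a rough sketch of the exploration step and a few of the cleaning techniques listed above, the example below works on a small hypothetical frame. The column names (`Year`, `Sex`, `Age`, `Fatal`) and their values are assumptions made for illustration, so check them against what you actually find in the data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; columns and values are invented for this sketch.
df = pd.DataFrame({
    "Year": [2018, 2018, 1995, np.nan],
    "Sex": ["M", "M", " f", None],
    "Age": ["57", "57", "teen", "20s"],
    "Fatal": ["N", "N", "Y", "UNKNOWN"],
})

# 1. Explore: summary statistics and missing values per column
print(df.describe(include="all"))
print(df.isna().sum())

# 2. Clean
# Duplicated data: drop exact duplicate rows
df = df.drop_duplicates().reset_index(drop=True)

# String manipulation: trim whitespace and unify case
df["Sex"] = df["Sex"].str.strip().str.upper()

# Null values: label missing entries explicitly
df["Sex"] = df["Sex"].fillna("UNKNOWN")

# Regex: keep the first run of digits, anything else becomes NaN
df["Age"] = pd.to_numeric(df["Age"].str.extract(r"(\d+)")[0])

# Categorize: map known labels to booleans, unknown labels become NaN
df["Fatal"] = df["Fatal"].map({"Y": True, "N": False})

# Null values again: drop rows that have no year at all
df = df.dropna(subset=["Year"])

print(df)
```

Whether to fill, flag, or drop missing values depends on the hypothesis you want to test, so treat each choice here as one option rather than a rule.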

Suggested Ways to Get Started

How to deliver the project

  1. Create a new repo named data-cleaning-pandas on your GitHub account.
    • Create a README.md file at the repo root with the project documentation. Include as much useful information as possible: someone who finds the README.md should be able to get the gist of the project without browsing your files.
    • Include a .gitignore.
    • At least 1 Jupyter notebook is required.
    • Putting your functions in a src.py file is very highly recommended (maybe even mandatory, check with your instructors).
    • DO NOT UPLOAD THE SHARK ATTACK DATASET TO GITHUB.
  2. Open an issue on this repo and paste your own repo's link.
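One possible shape for the src.py file mentioned above is a small set of reusable cleaning functions that the notebooks import. The function names `parse_age` and `standardise_columns`, and the `Age ` column, are hypothetical and chosen only for this sketch.

```python
# src.py (hypothetical content)
import re

import pandas as pd


def parse_age(value):
    """Return the first integer found in value as a float, or NaN if none."""
    match = re.search(r"\d+", str(value))
    return float(match.group()) if match else float("nan")


def standardise_columns(df):
    """Return a copy with trimmed, lower-case, underscore-separated column names."""
    out = df.copy()
    out.columns = out.columns.str.strip().str.lower().str.replace(" ", "_")
    return out


# In a notebook you would write: from src import parse_age, standardise_columns
df = pd.DataFrame({"Age ": ["57", "teen", "20s"]})  # made-up sample values
df = standardise_columns(df)
df["age"] = df["age"].apply(parse_age)
print(df)
```

Keeping the functions in one module means the notebooks can share the same logic, and each function can be checked on its own with a few sample values.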

Links & Resources