Ironhack-data-bcn-oct-2023 / project-I-pandas

W2 Project - Data cleaning & wrangling

The goal of this project is to combine everything you have learned about data wrangling, cleaning, and manipulation with Pandas so you can see how it all works together. You will start with a messy data set, Shark Attack. You will need to download it, import it, clean it up with your data wrangling skills, prepare it for analysis, and then export it as a clean CSV file. Some graphs to help understand the data will surely be useful!
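The import, clean, export loop described above could look roughly like the sketch below. The column names and the in-memory CSV are invented for illustration; in the project you would pass the path of the file you downloaded to pd.read_csv and a real output path to to_csv.

```python
import io

import pandas as pd

# Stand-in for the downloaded file (invented columns and values).
# In the project, replace this with the path to your local copy.
raw_csv = io.StringIO(
    "Case Number,Country,Year\n"
    "2018.06.25,USA,2018\n"
    "2018.06.25,USA,2018\n"
    ",,\n"
)

df = pd.read_csv(raw_csv)

# Minimal clean-up: drop rows that are completely empty, then exact duplicates.
clean = df.dropna(how="all").drop_duplicates()

# Export without the index so the CSV reloads with the same columns.
# A StringIO buffer is used here; a file path works the same way.
out = io.StringIO()
clean.to_csv(out, index=False)
```

Writing with index=False is a design choice: it avoids an extra unnamed column appearing when the clean file is read back in.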

TO DO's

  1. Decide on a research question (or several research questions)
  2. Explore the data and write down what you find
    • You can use: df.describe(), df["column"], etc.
  3. Draw insightful graphs.
  4. Use at least 5 data cleaning techniques inside a file named clean.ipynb
    • For example: handling null values, dropping columns, removing duplicated data, string manipulation, applying functions, categorizing, regex, etc.
  5. Show data that validates the conclusions based on your research questions in a file named analysis.ipynb
  6. Build a compelling story around your findings. Think of your stakeholders and convince them with your conclusions! (A few slides with little text and clear plots are normally useful)
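To make the cleaning techniques in step 4 concrete, here is a minimal sketch on a toy DataFrame. The column names and values are made up, so treat it as an illustration of the techniques rather than a recipe for the actual dataset.

```python
import pandas as pd

# Toy data with invented column names; the real dataset will differ.
df = pd.DataFrame({
    "country": [" usa", "USA ", "Australia", None, "USA "],
    "age": ["23", "teen", "40s", "31", "teen"],
    "fatal": ["N", "y", " N", "UNKNOWN", "y"],
    "notes": [None, None, None, None, None],
})

# 1. Column drop: remove columns that contain no data at all.
df = df.dropna(axis="columns", how="all")

# 2. Duplicated data: remove exact duplicate rows and rebuild the index.
df = df.drop_duplicates().reset_index(drop=True)

# 3. String manipulation: trim whitespace and unify the casing.
df["country"] = df["country"].str.strip().str.upper()

# 4. Null values: replace missing countries with an explicit label.
df["country"] = df["country"].fillna("UNKNOWN")

# 5. Regex: keep only the digits of the age; non-numeric entries become NaN.
df["age"] = pd.to_numeric(df["age"].str.extract(r"(\d+)")[0], errors="coerce")


def normalize_fatal(value):
    """Map the raw fatal flag to one of three labels."""
    cleaned = str(value).strip().upper()
    return {"Y": "fatal", "N": "non-fatal"}.get(cleaned, "unknown")


# 6. Apply + categorize: normalize each value, then store it as a category.
df["fatal"] = df["fatal"].apply(normalize_fatal).astype("category")
```

Each numbered step maps to one technique, which makes it easy to describe in your notebook what was done and why.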

Bonus (but...bonus?)

  1. Encapsulate your code into functions and save them into .py files: make sure you have docstrings
  2. Import those functions into your Jupyter notebooks and call them (you will substitute your own functions for the inline code)
  3. Work on titles and comments to have a well presented and cohesive story in your notebook
  4. Include a slide-based presentation where you present your findings/conclusions/insights.
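As an illustration of the first two bonus points, a function with a docstring could look like the sketch below. The module path src/cleaning.py and the function name standardize_text are hypothetical; pick whatever names fit your project.

```python
# Contents of a hypothetical src/cleaning.py
import pandas as pd


def standardize_text(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy of ``df`` with ``column`` stripped and upper-cased.

    Parameters
    ----------
    df : pd.DataFrame
        Input data. It is not modified.
    column : str
        Name of a text column to standardize.

    Returns
    -------
    pd.DataFrame
        A new DataFrame with the cleaned column.
    """
    result = df.copy()
    result[column] = result[column].str.strip().str.upper()
    return result


# In the notebook you would import it, for example:
#     from src.cleaning import standardize_text
sample = pd.DataFrame({"country": [" spain", "Spain "]})
cleaned = standardize_text(sample, "country")
```

Returning a copy instead of modifying the input keeps notebook cells safe to re-run in any order.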

Suggested Ways to Get Started

How to deliver the project

  1. Create a new repo named data-cleaning-pandas (or another name) on your GitHub account
    • Create a README.md file in the repo root with the project documentation. Include as much useful information as possible. Someone who finds the README.md should be able to get the gist of the project without browsing your files.
    • Include a .gitignore
    • At least 1 Jupyter notebook is required
    • Including your functions in a src.py is very, very highly recommended (maybe even mandatory, check with your instructors)
    • DO NOT UPLOAD THE SHARK ATTACK DATASET TO GITHUB
    • Make sure your README.md is as detailed as possible. The goal is for everyone, knowledgeable on the topic or not, to be able to understand it.
  2. Open an Issue on this repo and paste your own repo's link.

Links & Resources