Ironhack-data-bcn-feb-2023 / project-III-eda-etl



eda-etl-pipeline

Overview

The purpose of this project is to combine the data-cleaning techniques used in project 1 with extraction techniques (APIs, web scraping), visualization, and SQL, as well as modularization and encapsulation, applied to a topic or field that you are passionate about. You will have to find a data source, load it into a database, clean it, and analyze it against three hypotheses that you set at the beginning.
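As a rough, minimal sketch of that flow (the API URL, column handling, and table name below are placeholder assumptions, not part of the brief), the pipeline could look like this:

```python
# Minimal extract -> clean -> load sketch; every concrete name here is a placeholder.
import pandas as pd
import requests
from sqlalchemy import create_engine

API_URL = "https://example.com/api/records"  # hypothetical endpoint for your chosen topic

def extract(url: str) -> pd.DataFrame:
    """Pull raw records from an API and return them as a DataFrame."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return pd.DataFrame(response.json())

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Project-1 style cleaning: drop duplicates, normalise column names, drop empty rows."""
    df = df.drop_duplicates()
    df.columns = [col.strip().lower().replace(" ", "_") for col in df.columns]
    return df.dropna(how="all")

def load(df: pd.DataFrame, table: str, engine) -> None:
    """Write the cleaned data to a database table, ready for SQL analysis."""
    df.to_sql(table, engine, if_exists="replace", index=False)

if __name__ == "__main__":
    engine = create_engine("sqlite:///project.db")  # swap for your own database
    load(clean(extract(API_URL)), "records", engine)
```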

What is a pipeline?

A data pipeline is a series of data processes in which the output of each one is the input of the next, forming a chain.
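In code, that chaining can be as simple as passing each step's output into the next one; the step names in this tiny sketch are purely illustrative:

```python
# A pipeline is just a chain of steps: the output of one is the input of the next.
from functools import reduce

def run_pipeline(data, *steps):
    """Apply each step in order, feeding the result of one step into the next."""
    return reduce(lambda out, step: step(out), steps, data)

# Illustrative steps; in this project they would be extract, clean, load and analyse.
keep_positive = lambda xs: [x for x in xs if x > 0]
double = lambda xs: [x * 2 for x in xs]

print(run_pipeline([-1, 2, 3], keep_positive, double))  # [4, 6]
```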

To-dos

Suggested ways to get started

Within each area there are different topics, for example:

Within gastronomy, for instance, you could look at the evolution of gastronomy in Europe, new trends and how they influence the business, or the world's best gastronomies and what to consider before setting up a restaurant, etc.

How to deliver the project

  1. Create a new repo named data-pipeline-project (or another name) on your GitHub account
    • Create a README.md file at the repo root with the project documentation. Include as much useful information as possible: someone who finds the README.md should be able to get the full gist of the project without browsing your files.
    • Include a .gitignore
    • At least 1 jupyter notebook is required
    • Create separate folders for your files: data, my-code, images (if necessary), src.
    • Perform the SQL queries, get the results, and upload them to GitHub in a .sql file.
    • Including your functions in a src.py module is recommended (see the sketch after this list).
  2. Open an Issue on this repo and paste your own repo's link.
  3. Making a slide (PPT) presentation for your project is highly recommended, but not mandatory.
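
As an illustration of that src.py modularization (the function names, database URL, and scraping helper are assumptions for the sketch, not requirements), it could look like this:

```python
# src.py -- an illustrative way to encapsulate the project's helper functions.
import pandas as pd
from sqlalchemy import create_engine

def get_engine(db_url: str = "sqlite:///data/project.db"):
    """Create a SQLAlchemy engine for the project database."""
    return create_engine(db_url)

def scrape_table(url: str) -> pd.DataFrame:
    """Web-scraping helper: read the first HTML table found at `url`."""
    return pd.read_html(url)[0]

def standardise_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Lower-case and snake_case the column names so SQL queries stay simple."""
    df = df.copy()
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    return df

def run_query(sql: str, engine) -> pd.DataFrame:
    """Run one of the project's SQL queries and return the result as a DataFrame."""
    return pd.read_sql(sql, engine)
```

Your notebook can then import these helpers (for example, `from src import get_engine, run_query`) and stay focused on the analysis of your three hypotheses.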

Links & Resources