Data Engineering Workshop
A one-day workshop on Docker, web scraping, regular expressions, PostgreSQL, and Git.
Prerequisites
Use Ubuntu 20.04 LTS with the following packages installed:
- Python 3.9 or above
- docker
- docker-compose
- pip3
- git (any recent version)
GitHub account
- Create an account on GitHub (only if you do not already have one).
- Fork the DataEngineering-Workshop1 repository. Refer to this guide to understand how to fork a repository.
- Clone the forked repository to your machine using an SSH key.
Docker
- To install Docker, go to your cloned repository and run the following command:
sudo prerequisites/install_docker.sh
Workshop environment setup
- Check that Git, Docker, and Docker Compose are installed on the system.
- Open a terminal and run the following commands to check the version of each prerequisite:
- Check Git version
git --version
git version 2.25.1
- Check Docker version
docker --version
Docker version 20.10.17, build 100c701
- Check Docker Compose version
docker-compose --version
docker-compose version 1.25.0, build 0a186604
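The three checks above can also be run in one pass from Python. The following is a minimal sketch, not part of the workshop repository: the helper names `parse_version` and `installed_version` are hypothetical, and it assumes each tool prints a dotted `major.minor.patch` number to standard output, as in the sample output shown above.

```python
import re
import subprocess

# Matches the first dotted version number, e.g. "2.25.1" in "git version 2.25.1".
VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def parse_version(output):
    """Return (major, minor, patch) found in a --version string, or None."""
    match = VERSION_RE.search(output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def installed_version(command):
    """Run `<command> --version`; return the parsed version or None if missing."""
    try:
        result = subprocess.run(
            [command, "--version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        # The executable is not on PATH.
        return None
    return parse_version(result.stdout)


if __name__ == "__main__":
    for tool in ("git", "docker", "docker-compose"):
        version = installed_version(tool)
        status = ".".join(map(str, version)) if version else "not found or unrecognized"
        print(f"{tool}: {status}")
```

Returning a tuple of integers rather than the raw string makes comparisons such as `installed_version("git") >= (2, 25, 0)` behave numerically, so that 20.10.17 sorts above 9.0.0.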
What will you learn by the end of this workshop?
- You will learn how to build a Docker image and how to use it.
- You will learn how to scrape a website using urllib/requests and Beautiful Soup.
- You will learn regular expressions and how to work with them.
- You will learn key features of PostgreSQL.
- You will learn how to dockerize your project.
Schedule