Mini project - Data Engineering - Viettel Digital Talent 2024
Project introduction
- Project name: Analyze e-commerce information on Shopee
- Project objectives:
  - Top well-rated products by item
  - Top liked products by item
  - Top products with the biggest discounts
  - Top products with the most comments
  - Top best-selling products
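
Each objective above amounts to a "top N per item" ranking over some metric. The following is a minimal sketch of that idea in plain Python, not the project's actual pipeline: the sample rows, the field names (`name`, `item`, `rating`) and the helper `top_by_item` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from heapq import nlargest

# Hypothetical sample rows; the field names are assumptions, not the project's schema.
products = [
    {"name": "Phone case A", "item": "phone accessories", "rating": 4.9},
    {"name": "Phone case B", "item": "phone accessories", "rating": 4.5},
    {"name": "Charger C", "item": "phone accessories", "rating": 4.7},
    {"name": "T-shirt D", "item": "clothing", "rating": 4.2},
    {"name": "Jacket E", "item": "clothing", "rating": 4.8},
]


def top_by_item(rows, metric, n=2):
    """Return the names of the n rows with the highest `metric` within each item group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["item"]].append(row)
    return {
        item: [r["name"] for r in nlargest(n, members, key=lambda r: r[metric])]
        for item, members in groups.items()
    }


top_rated = top_by_item(products, "rating")
print(top_rated)
```

Swapping `"rating"` for another numeric field (for example a like count or a comment count, if the data has one) would give the corresponding ranking for the other objectives.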
Data flow
Deploy the system
1. Pull and build the images referenced in docker-compose.yaml before starting the system
docker pull { ... }
2. Change into the cloned project directory and start the system
docker compose up -d
3. Build the environment in the airflow-webserver and airflow-scheduler containers
docker exec -u root -it [airflow-webserver/airflow-scheduler] bash
source /opt/airflow/trino/build-env.sh
4. Once the system is up, the web UI ports of all containers are listed here
5. Start the DAG in the Airflow cluster
6. Build the Superset environment
./superset/bootstrap-superset.sh
7. Visualize the data in Superset using this SQLAlchemy URI
trino://hive@trino:8080/iceberg
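
To make the parts of the SQLAlchemy URI in step 7 easier to read, here is a small sketch that splits it with the Python standard library. The labels `user` and `catalog` follow the usual layout of Trino SQLAlchemy URIs (`trino://user@host:port/catalog`); treat that mapping as an assumption about this setup rather than something the project states.

```python
from urllib.parse import urlsplit

# URI taken from step 7; the hostname "trino" is presumably the container's service name.
uri = "trino://hive@trino:8080/iceberg"
parts = urlsplit(uri)

connection = {
    "dialect": parts.scheme,              # SQLAlchemy dialect name
    "user": parts.username,               # user that queries run as
    "host": parts.hostname,               # Trino host
    "port": parts.port,                   # Trino HTTP port
    "catalog": parts.path.lstrip("/"),    # assumed to be the Trino catalog
}
print(connection)
```

If Superset runs outside the Docker network, the host and port would likely need to be replaced with whatever address Trino is published on.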
Output
Top well-rated products by item
Top liked products by item
Top products with the most comments
Report