Data Lake Project - Public Transport
Project for CS4225 Big Data Systems for Data Science AY2021/22.
Cloud Architecture
Folders in repository
- Athena: SQL statements for the Athena table definitions and queries
- EMR: PySpark scripts
- Frontend: ReactJS
- Lambda: Two examples of the Lambda functions used
- Media: Screenshots and photos related to the project
- Raw Data Examples S3: Example of a raw data output for each API
Grafana for Charts in the frontend
We used Grafana to generate the charts shown in the frontend.
Frontend
The frontend displays the graphs and provides two layers (taxis and congestion) that can be toggled on and off.
System Monitoring
EMR metrics are exported to Prometheus and visualized in Grafana. The cluster used 1 master node and 2 core (slave) nodes.
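
As a rough illustration of the cluster layout above, the sketch below builds an instance-group list in the shape accepted by the `InstanceGroups` parameter of boto3's EMR `run_job_flow` call. It does not call AWS. The instance type, the group names, and the helper functions are assumptions made for the example; the README does not say how the cluster was actually provisioned.

```python
from typing import Dict, List

# Assumption: the instance type is a placeholder, not the one used in the project.
ASSUMED_INSTANCE_TYPE = "m5.xlarge"


def build_instance_groups(
    core_count: int = 2,
    instance_type: str = ASSUMED_INSTANCE_TYPE,
) -> List[Dict[str, object]]:
    """Return a hypothetical EMR layout: one master group plus one core group."""
    return [
        {
            "Name": "Master",  # hypothetical group name
            "InstanceRole": "MASTER",
            "InstanceType": instance_type,
            "InstanceCount": 1,
        },
        {
            "Name": "Core",  # hypothetical group name
            "InstanceRole": "CORE",
            "InstanceType": instance_type,
            "InstanceCount": core_count,
        },
    ]


def total_nodes(groups: List[Dict[str, object]]) -> int:
    """Sum the instance counts across all groups."""
    return sum(int(group["InstanceCount"]) for group in groups)


if __name__ == "__main__":
    groups = build_instance_groups()
    for group in groups:
        print(group["InstanceRole"], group["InstanceCount"])
    print("total nodes:", total_nodes(groups))
```

With the default arguments this describes three nodes in total, matching the 1 master and 2 core nodes mentioned above.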