Azure Data Warehousing on Tokyo Olympics Data

In this project I have created a Data Pipeline and Warehouse using Azure Data Services to address the stakeholders' problems and provide better decision-making capabilities and support for Machine Learning projects.

[Figure: Project Process Flow]

Project Executed By:

Bemnet Girma Sahilu

Email: bemnet.dev@gmail.com

LinkedIn: Bemnet Girma

Project Problem:

Downstream stakeholders, including Data Analysts, Data Scientists, and Business Analysts, struggle to aggregate data from the data source. The existing retrieval processes are inefficient and degrade the performance of the source systems, and the data they return has numerous quality issues, which makes it unreliable for analysis and decision-making.
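Data quality issues like these are easier to address once they are measured. The sketch below is a minimal, stdlib-only profiler, assuming the source records arrive as a list of Python dictionaries. It counts missing values, untrimmed strings, and exact duplicate records. The column names and values in the sample are hypothetical toy data, not taken from the Tokyo Olympics source.

```python
from collections import Counter
from typing import Any, Dict, List


def profile_rows(rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Count missing values, untrimmed strings and duplicate records."""
    columns = sorted({key for row in rows for key in row})

    def is_missing(value: Any) -> bool:
        # Treat None (including absent keys) and blank strings as missing.
        return value is None or (isinstance(value, str) and not value.strip())

    def is_untrimmed(value: Any) -> bool:
        # Non-blank strings that carry leading or trailing whitespace.
        return isinstance(value, str) and bool(value.strip()) and value != value.strip()

    missing = {c: sum(is_missing(r.get(c)) for r in rows) for c in columns}
    untrimmed = {c: sum(is_untrimmed(r.get(c)) for r in rows) for c in columns}
    # Exact duplicates beyond the first occurrence (values must be hashable).
    counts = Counter(tuple(r.get(c) for c in columns) for r in rows)
    duplicates = sum(n - 1 for n in counts.values())
    return {"rows": len(rows), "missing": missing,
            "untrimmed": untrimmed, "duplicates": duplicates}


# Hypothetical toy records illustrating typical source problems.
sample = [
    {"country": "Country A", "gold": "3"},
    {"country": " Country A ", "gold": "3"},
    {"country": "Country A", "gold": "3"},
    {"country": "", "gold": None},
]
report = profile_rows(sample)
print(report)
```

Running a report like this before and after the transformation step gives stakeholders a concrete way to see whether data quality actually improved.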

Project Solution:

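Created a data pipeline and warehouse on Azure Data Services that extracts the data from the source API, stores it raw in Azure Data Lake Storage Gen2, cleanses it in Azure Databricks, and serves the result through Azure Synapse Analytics. Stakeholders query this warehouse instead of the source systems, which removes the load on those systems and gives them cleansed, reliable data for decision-making and Machine Learning projects.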

Project Goals:

• Aggregate Tokyo Olympics data for analysis.

• Implement an end-to-end data pipeline using Azure Data Services.

• Improve data quality and performance for downstream stakeholders.

• Enable data-driven decision-making and support Machine Learning projects.

Project Timeline:

July 8th 2021 to July 11th 2021

Project Milestones:

• Data extraction and ingestion using Azure Data Factory: July 8th 2021

• Data transformation using Azure Databricks: July 9th 2021

• Data storage and analytics using Azure Synapse Analytics: July 10th 2021

• Project completion and documentation: July 11th 2021

Project Scope:

• Retrieve Tokyo Olympics data from the API using Azure Data Factory.

• Store the raw data in Azure Data Lake Storage Gen2.

• Perform data transformation and cleansing using Azure Databricks.

• Load transformed data into a separate storage location in Azure Data Lake Storage Gen2.

• Utilize Azure Synapse Analytics for running SQL queries and deriving insights from the transformed data.

• Visualize the data and generate reports for stakeholders' analysis using Power BI.
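The transformation and cleansing step in the scope above can be illustrated with a small, framework-free sketch. It trims strings, drops records without a key, casts count columns to integers, and removes exact duplicates. In Azure Databricks the same steps would more typically be written with PySpark DataFrame operations such as `trim`, `cast`, `dropna`, and `dropDuplicates`; plain Python is used here only so the logic runs anywhere. The schema (`country`, `gold`, `silver`, `bronze`) and the sample rows are hypothetical.

```python
from typing import Any, Dict, List, Optional, Set, Tuple

# Hypothetical schema: the real column names of the source are not given here.
KEY_COLUMN = "country"
INT_COLUMNS = ("gold", "silver", "bronze")


def to_int(value: Any) -> Optional[int]:
    """Cast a value to int, returning None when it cannot be parsed."""
    if value is None:
        return None
    try:
        return int(str(value).strip())
    except ValueError:
        return None


def cleanse(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Trim strings, drop keyless rows, cast counts and remove duplicates."""
    seen: Set[Tuple[Tuple[str, Any], ...]] = set()
    cleaned: List[Dict[str, Any]] = []
    for row in rows:
        # Trim whitespace on every string field.
        record = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        # Drop records whose key is missing or blank after trimming.
        if not record.get(KEY_COLUMN):
            continue
        # Cast count columns to int; unparseable values become None.
        for col in INT_COLUMNS:
            record[col] = to_int(record.get(col))
        # Drop exact duplicates (compared after trimming and casting), keeping the first.
        fingerprint = tuple(sorted(record.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(record)
    return cleaned


# Hypothetical raw rows: untrimmed text, mixed types, a blank key, a bad number.
raw = [
    {"country": " Country A ", "gold": "3", "silver": " 1", "bronze": "0"},
    {"country": "Country A", "gold": 3, "silver": "1", "bronze": "0"},
    {"country": "  ", "gold": "2", "silver": "2", "bronze": "2"},
    {"country": "Country B", "gold": "n/a", "silver": "4", "bronze": "5"},
]
clean = cleanse(raw)
print(clean)
```

Deduplicating after trimming and casting is a deliberate choice: the first two sample rows only look different because of whitespace and types, so they collapse into one record.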

Project Deliverables:

• End-to-end data pipeline architecture documentation.

• Extracted raw Tokyo Olympics data stored in Azure Data Lake Storage Gen2.

• Transformed and cleansed data stored in a separate location in Azure Data Lake Storage Gen2.

• SQL queries and insights derived from Azure Synapse Analytics.

• Data visualization and reports generated for stakeholder analysis.
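As an example of the kind of SQL query run against the transformed data, the sketch below ranks countries by medals and computes a total. An in-memory SQLite database stands in for Azure Synapse Analytics so the example is self-contained; the `medals` table, its columns, and the rows are hypothetical. The query itself uses only standard SQL, so it should port to Synapse with little or no change, although that has not been verified here.

```python
import sqlite3

# In-memory SQLite stands in for Synapse; table, columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE medals (country TEXT, gold INTEGER, silver INTEGER, bronze INTEGER)")
conn.executemany(
    "INSERT INTO medals VALUES (?, ?, ?, ?)",
    [("Country A", 3, 1, 0), ("Country B", 1, 4, 5), ("Country C", 3, 2, 1)],
)

# Rank countries by gold, then silver, then bronze, and report a total.
QUERY = """
SELECT country,
       gold,
       silver,
       bronze,
       gold + silver + bronze AS total
FROM medals
ORDER BY gold DESC, silver DESC, bronze DESC
"""
ranking = conn.execute(QUERY).fetchall()
for row in ranking:
    print(row)
conn.close()
```

Ordering by gold first means a country with the highest total (Country B here) can still rank last; which ordering is appropriate is a reporting decision for the stakeholders.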

Tools & Technologies:

• Azure Data Factory

• Azure Databricks

• Azure Data Lake Storage Gen2

• Azure Synapse Analytics

• Power BI
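Azure Data Lake Storage Gen2 holds both the raw and the transformed data, and Spark code in Azure Databricks typically addresses it through `abfss://<container>@<account>.dfs.core.windows.net/<path>` URIs. The sketch below shows one possible way to keep the raw and transformed locations apart. The storage account name, container name, zone folders, and the Hive-style `ingest_date=YYYY-MM-DD` partition folder are all hypothetical choices, not the project's actual layout.

```python
from datetime import date

ACCOUNT = "examplestorageacct"  # hypothetical storage account name
CONTAINER = "tokyo-olympics"    # hypothetical container name
ZONES = ("raw", "transformed")  # raw and cleansed data kept in separate folders


def zone_uri(zone: str, dataset: str, ingest_date: date) -> str:
    """Build an ABFS URI for one dataset in one zone, partitioned by ingestion date."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    return (
        f"abfss://{CONTAINER}@{ACCOUNT}.dfs.core.windows.net/"
        f"{zone}/{dataset}/ingest_date={ingest_date.isoformat()}/"
    )


print(zone_uri("raw", "medals", date(2021, 7, 8)))
print(zone_uri("transformed", "medals", date(2021, 7, 8)))
```

Rejecting unknown zone names keeps a typo from silently writing cleansed data into an unexpected folder next to the raw data.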

Data Sources:

Tokyo Olympics 2020 Data API
