End-to-end data engineering project on Microsoft Azure.
This project covers data extraction and transformation using Microsoft Azure services.
Architecture diagram:
![image](https://github.com/Aishwaryasjsu/DE_Piplineonazure/assets/111553278/2187f14d-b900-4e9c-9b7e-795ec87c6a5b)
Steps included:
- Extracted data from a Git folder using an API.
- Created a container in storage.
- Created data pipelines in Data Factory to load the extracted data into Data Lake Gen2.
- Created a Databricks account and transformed the data using PySpark.
- Loaded the transformed data into Data Lake Gen2.
- Ran analytical queries in Azure Synapse.
- Connected to Tableau to visualise the Olympics data.
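
The first step above (pulling files from a Git folder through an API) could look roughly like the sketch below. It assumes the source folder lives in a GitHub repository and uses the GitHub REST "repository contents" endpoint; the function names and the `owner`, `repo`, and `folder` values are hypothetical and not taken from this project.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def contents_url(owner: str, repo: str, folder: str, ref: str = "main") -> str:
    """Build the GitHub REST 'repository contents' URL for a folder."""
    return f"{GITHUB_API}/repos/{owner}/{repo}/contents/{folder}?ref={ref}"


def csv_download_urls(listing: list) -> dict:
    """Map file name -> raw download URL for every CSV file in a listing."""
    return {
        item["name"]: item["download_url"]
        for item in listing
        if item.get("type") == "file" and item["name"].lower().endswith(".csv")
    }


def fetch_listing(url: str) -> list:
    """Call the API and parse the JSON response (requires network access)."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical usage; replace the placeholders with a real repository:
# listing = fetch_listing(contents_url("some-owner", "some-repo", "data"))
# for name, url in csv_download_urls(listing).items():
#     print(name, url)
```

Each returned URL points at one raw file, so it could serve as the source of an HTTP-based copy step that lands the file in the storage container.
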
Services used and learnt:
- Data Factory
- Data Lake Gen2
- Storage (container)
- App Registration (secret key creation)
- Key Vault
- IAM role creation to provide access to storage
- Azure Synapse
- Tableau
- Git
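
The App Registration, Key Vault, and IAM items above usually come together when Databricks reads from and writes to Data Lake Gen2. The sketch below builds the Spark settings commonly used for OAuth access with a service principal, plus an `abfss://` path. The helper names, the account and container names, and the secret scope are hypothetical; the project's actual notebook may be configured differently.

```python
def abfss_path(container: str, account: str, path: str = "") -> str:
    """Build an abfss:// URI for a path inside a Data Lake Gen2 container."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def oauth_spark_conf(account: str, client_id: str, client_secret: str,
                     tenant_id: str) -> dict:
    """Return the Spark settings for service-principal (OAuth) access."""
    host = f"{account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{host}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{host}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{host}": client_id,
        f"fs.azure.account.oauth2.client.secret.{host}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{host}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


# Hypothetical usage inside a Databricks notebook, where `spark` and `dbutils`
# are predefined. The secret is read from a Key Vault-backed secret scope
# instead of being hard-coded:
# secret = dbutils.secrets.get(scope="my-scope", key="my-secret")
# for key, value in oauth_spark_conf("mystorage", "<client-id>", secret,
#                                    "<tenant-id>").items():
#     spark.conf.set(key, value)
# df = spark.read.csv(abfss_path("mycontainer", "mystorage", "raw-data/"),
#                     header=True)
```

For this to work, the app registration also needs an IAM role on the storage account that grants blob data access.
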
Pipelines in Data Factory
![image](https://github.com/Aishwaryasjsu/DE_Piplineonazure/assets/111553278/f60c188f-d8e5-4f9b-bd8f-111e42e8ea0a)
Transforming data in Spark using PySpark
https://adb-4155200025381794.14.azuredatabricks.net/?o=4155200025381794#notebook/104741734032349
Synapse Analytics used for visualization and querying
Synapse query and charts
![image](https://github.com/Aishwaryasjsu/DE_Piplineonazure/assets/111553278/38a7f765-75c6-4c0b-a363-c3dd55e065ed)
![image](https://github.com/Aishwaryasjsu/DE_Piplineonazure/assets/111553278/eb59cfe1-9a7a-4911-8484-570730bb17ec)
![image](https://github.com/Aishwaryasjsu/DE_Piplineonazure/assets/111553278/4d82082f-be72-4bcc-a0d0-cb8eb5c12c2c)