Aishwaryasjsu / DE_Piplineonazure


End-to-end data engineering project on Microsoft Azure.

This project covers data extraction and transformation using Microsoft Azure services.

Architecture diagram:

image

Steps included:

  1. Extracted data from a Git folder using an API.
  2. Created a container in the storage account.
  3. Created data pipelines in Data Factory to load the extracted data into Data Lake Gen2.
  4. Created a Databricks account and transformed the data using PySpark.
  5. Loaded the transformed data into Data Lake Gen2.
  6. Ran analytical queries in Azure Synapse.
  7. Connected to Tableau to visualise the Olympics data.
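
The README does not say which API or file paths were used for the extraction in step 1. As a minimal sketch, assuming the source files are CSVs in a public GitHub repository fetched through raw file URLs, the extraction could look like the code below. The owner, repository, branch, and file names are hypothetical placeholders, not taken from this project.

```python
import csv
import io
from urllib.parse import quote
from urllib.request import urlopen


def raw_file_url(owner, repo, branch, path):
    """Build a raw.githubusercontent.com URL for one file in a repository."""
    return "https://raw.githubusercontent.com/{}/{}/{}/{}".format(
        owner, repo, branch, quote(path)
    )


def parse_csv(text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_csv(url):
    """Download a CSV file over HTTP and parse it (needs network access)."""
    with urlopen(url) as response:
        return parse_csv(response.read().decode("utf-8"))


# Hypothetical example values; replace with the real repository details.
url = raw_file_url("example-owner", "example-repo", "main", "data/Athletes.csv")
```

In Data Factory, the same URL would typically be the source of an HTTP dataset in a copy activity, with the Data Lake Gen2 container as the sink.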

Services used and learnt:

  1. Data Factory
  2. Data Lake Gen2
  3. Storage (container)
  4. App Registration (secret key creation)
  5. Key Vault
  6. IAM role creation to provide access to storage
  7. Azure Synapse
  8. Tableau
  9. Git
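
The App Registration, Key Vault, and IAM role entries above usually work together: the app registration's client ID and secret let Databricks authenticate to Data Lake Gen2 as a service principal. The sketch below builds the Spark configuration commonly used for that OAuth setup. It is an illustration under assumptions, not this project's actual notebook code; the function name and all argument values are hypothetical.

```python
def adls_oauth_conf(storage_account, client_id, client_secret, tenant_id):
    """Return Spark settings for service-principal (OAuth) access to ADLS Gen2."""
    suffix = "{}.dfs.core.windows.net".format(storage_account)
    return {
        "fs.azure.account.auth.type." + suffix: "OAuth",
        "fs.azure.account.oauth.provider.type." + suffix:
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id." + suffix: client_id,
        "fs.azure.account.oauth2.client.secret." + suffix: client_secret,
        "fs.azure.account.oauth2.client.endpoint." + suffix:
            "https://login.microsoftonline.com/{}/oauth2/token".format(tenant_id),
    }


# Placeholder values for illustration only. In a Databricks notebook the
# secret would normally be read from a Key Vault-backed secret scope
# rather than written in the code.
conf = adls_oauth_conf("examplestorage", "example-client-id",
                       "example-secret", "example-tenant-id")
```

In a notebook, each key and value in `conf` would be applied with `spark.conf.set(key, value)` before reading from or writing to the storage account.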

Pipelines in Data Factory

image

Transforming data in Spark using PySpark

https://adb-4155200025381794.14.azuredatabricks.net/?o=4155200025381794#notebook/104741734032349
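
The notebook itself is only reachable inside the Databricks workspace, so its exact transformations are not shown here. As a rough sketch of what such a step might look like, the code below builds an `abfss://` path for a Data Lake Gen2 container and casts string columns to integers. The container, account, file, and column names are hypothetical, and the PySpark import is deferred so that the path helper can be used without Spark installed.

```python
def abfss_path(container, storage_account, path):
    """Build an abfss:// URI for a file or folder in ADLS Gen2."""
    return "abfss://{}@{}.dfs.core.windows.net/{}".format(
        container, storage_account, path.lstrip("/")
    )


def cast_to_int(df, column_names):
    """Cast the given columns of a Spark DataFrame to integer type."""
    from pyspark.sql.functions import col  # requires PySpark at call time

    for name in column_names:
        df = df.withColumn(name, col(name).cast("int"))
    return df


# Hypothetical locations for raw input and transformed output.
raw_path = abfss_path("example-container", "examplestorage", "raw-data/medals.csv")
out_path = abfss_path("example-container", "examplestorage", "transformed-data/medals")
```

In a Databricks notebook, where `spark` is already defined, usage would be along the lines of `df = spark.read.option("header", "true").csv(raw_path)`, then `cast_to_int(df, ["Gold", "Silver", "Bronze"])`, then `df.write.mode("overwrite").option("header", "true").csv(out_path)`.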

Synapse Analytics used for visualization and querying

SYNAPSE QUERY AND CHARTS image

image

image