VinhDevNguyen / end2end_datapipeline_project


Set up environments #1

Open VinhDevNguyen opened 4 months ago

VinhDevNguyen commented 4 months ago

Our plan:

Additional:

VinhDevNguyen commented 4 months ago

@ShinVu Could you explain how to set up Docker for monitoring tools such as JMX, Grafana, and Prometheus, and how to use them?

ShinVu commented 4 months ago

Should we add a database to serve as the on-premises database? As the document states:

VinhDevNguyen commented 4 months ago

Should we add a database to serve as the on-premises database? As the document states:

  • Build an on-premises database to populate the selected dataset, which satisfies: (1) some data can be snapshotted and ingested daily; (2) some data needs to be ingested via a near-real-time mechanism (e.g., every 2-3 hours...)
  • Build an ETL pipeline (using Databricks, ADF...) to ingest data from the database to Azure Data Lake Gen 2 (ADLS), following medallion architecture.

@TranBinhLuatUIT @IAMTOIR Do we need an on-premises database like PostgreSQL or any other database? Right now, we have PostgreSQL installed with Airflow.
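
To make the two ingestion cadences from the quoted requirement concrete, here is a rough sketch in plain Python of how "daily snapshot" versus "every 2-3 hours" could be expressed as per-table rules. The table names (`customers`, `orders`), the 3-hour interval, and the `due_tables` helper are all assumptions made for illustration; none of them come from the document or this repo, and in practice the scheduling would more likely live in Airflow or ADF.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class IngestionRule:
    table: str           # hypothetical table name, not from the project
    interval: timedelta  # minimum time between two ingestions


# Assumed rules: one daily snapshot table, one near-real-time table.
RULES = [
    IngestionRule("customers", timedelta(days=1)),  # snapshotted daily
    IngestionRule("orders", timedelta(hours=3)),    # every 2-3 hours
]


def due_tables(last_run: dict[str, datetime], now: datetime) -> list[str]:
    """Return the tables whose interval has elapsed since their last run."""
    due = []
    for rule in RULES:
        previous = last_run.get(rule.table)
        # A table that has never been ingested is always due.
        if previous is None or now - previous >= rule.interval:
            due.append(rule.table)
    return due
```

A scheduler could call `due_tables` on each tick with the recorded last-run timestamps and trigger ingestion only for the tables it returns.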