Open VinhDevNguyen opened 4 months ago
@ShinVu Could you explain how to set up the monitoring tools (JMX, Grafana, Prometheus) with Docker, and how to use them?
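For the "how to use it" part, here is a minimal sketch of reading metrics back out of Prometheus through its HTTP API (`/api/v1/query`). It assumes Prometheus runs in Docker and is published on its default port 9090, and that the JMX exporter is already configured as a scrape target. The names `PROMETHEUS_URL`, `build_query_url`, `parse_instant_vector` and `query` are hypothetical helpers for illustration. Grafana would use the same Prometheus instance as its data source.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus is published by Docker on its default port 9090.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(base_url: str, promql: str) -> str:
    """Build a URL for Prometheus' instant-query endpoint (/api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_instant_vector(payload: dict) -> dict[tuple[str, str], float]:
    """Map (job, instance) -> sample value from an instant-vector response."""
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {payload.get('error')}")
    samples = {}
    for series in payload["data"]["result"]:
        labels = series["metric"]
        # Each sample is [unix_timestamp, "value as a string"].
        _timestamp, value = series["value"]
        samples[(labels.get("job", ""), labels.get("instance", ""))] = float(value)
    return samples


def query(promql: str, base_url: str = PROMETHEUS_URL,
          timeout: float = 5.0) -> dict[tuple[str, str], float]:
    """Run a PromQL instant query against a live Prometheus server."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_instant_vector(json.load(resp))
```

For example, `query("up")` should return `1.0` for every scrape target Prometheus can reach (such as a JMX exporter) and `0.0` for targets that are down, which is a quick way to check the Docker setup before building Grafana dashboards.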
Should we add a database to serve as the on-premises database? As the document states:
- Build an on-premises database populated with a selected dataset that satisfies the following:
  - Some data can be snapshotted and ingested daily.
  - Some data needs to be ingested via a near-real-time mechanism (e.g., every 2-3 hours...).
- Build an ETL pipeline (using Databricks, ADF...) to ingest data from the database into Azure Data Lake Storage Gen2 (ADLS), following the medallion architecture.
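The two ingestion modes above differ mainly in how much data each run pulls from the source. Below is a minimal sketch of both, assuming a hypothetical `orders` table with an `updated_at` column stored as ISO-8601 text; `sqlite3` is only a stand-in for the on-premises database (e.g. PostgreSQL), and the real pipeline would run this logic in Databricks/ADF and land the rows in the bronze layer, which is not shown here.

```python
import sqlite3

# Hypothetical source table standing in for the on-premises database.
SCHEMA = "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"


def daily_snapshot(conn: sqlite3.Connection) -> list[tuple]:
    """Full snapshot: read every row; meant to run once per day."""
    return conn.execute("SELECT id, amount, updated_at FROM orders ORDER BY id").fetchall()


def incremental_extract(conn: sqlite3.Connection,
                        last_watermark: str) -> tuple[list[tuple], str]:
    """Near-real-time pull (e.g. every 2-3 hours): only rows changed after the watermark.

    Assumes updated_at is ISO-8601 text in one fixed format, so string
    comparison orders timestamps correctly.
    """
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at, id",
        (last_watermark,),
    ).fetchall()
    # Advance the watermark to the newest change seen; keep the old one if nothing changed.
    new_watermark = max((row[2] for row in rows), default=last_watermark)
    return rows, new_watermark


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 10.0, "2024-07-01T00:00:00"),
         (2, 25.5, "2024-07-01T02:00:00"),
         (3, 7.25, "2024-07-01T04:00:00")],
    )
    print(len(daily_snapshot(conn)))
    batch, watermark = incremental_extract(conn, "2024-07-01T01:00:00")
    print([row[0] for row in batch], watermark)
```

The watermark has to be persisted between runs (for example in a small control table). A strict `>` comparison can miss rows committed later with the same timestamp, so a real job might re-read a small overlap window and deduplicate in the silver layer.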
@TranBinhLuatUIT @IAMTOIR Do we need an on-premises database like PostgreSQL or any other database? Right now, we have PostgreSQL installed with Airflow.
Our plan:
Password: !HelloPenis123321
[x] Jupyter Notebook + PySpark connected to the Spark cluster -> #4 -> Use code-server instead; see this comment: https://github.com/VinhDevNguyen/end2end_datapipeline_project/issues/4#issuecomment-2227387009
#3
#7

Additional: