BLEND360 / innovation_lab

cool stuff for people to work on

New Automated Process for Acxiom Data Installs #15

Open charlesdublend360 opened 2 years ago

charlesdublend360 commented 2 years ago
  1. Set up a scheduler so that these jobs can run automatically (recommend Airflow on AWS, AWS Glue, or the Databricks scheduler).
  2. Use an ETL orchestrator to create a script that decrypts files in parallel.
  3. Use a Spark job (AWS Glue/EMR/Databricks) to create smaller Parquet files for Snowflake ingestion.
  4. Run a Snowflake job to load the new files into tables.
  5. Archive install files into cold storage.
  6. Build automated data quality checks to catch bad incoming data.
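
As a rough illustration of steps 2 and 6 above, here is a minimal Python sketch using only the standard library. Everything named in it is an assumption made for illustration, not part of the actual pipeline: `decrypt_file` is a hypothetical placeholder that only strips an assumed `.enc` suffix instead of calling a real decryption tool, and `check_rows` applies two example rules (a minimum row count and required non-empty columns) that would need to be replaced with the real expectations for Acxiom installs.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def decrypt_file(path: Path) -> Path:
    """Hypothetical placeholder for the real decryption call."""
    # Assumption: encrypted files carry a trailing ".enc" suffix.
    # A real version would invoke the decryption tool and write the output file;
    # this sketch only computes the name of the decrypted file.
    return path.with_suffix("")


def decrypt_all(paths, max_workers=8):
    """Step 2: fan decryption out over a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(decrypt_file, paths))


def check_rows(rows, required_columns, min_rows=1):
    """Step 6: return a list of problems; an empty list means the batch passed."""
    problems = []
    if len(rows) < min_rows:
        problems.append(f"expected at least {min_rows} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        # Treat both absent keys and empty strings as missing values.
        missing = [c for c in required_columns if row.get(c) in (None, "")]
        if missing:
            problems.append(f"row {i}: missing {missing}")
    return problems
```

In a scheduler such as Airflow, each function would likely become its own task, with a non-empty result from `check_rows` failing the run before the load step.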

charlesdublend360 commented 2 years ago

@moonziyue will look into Airflow/Glue first. Loop in Abhiram and Sam.

charlesdublend360 commented 2 years ago

@moonziyue is this complete yet?

moonziyue commented 2 years ago

Yes. I still want to improve the quality of the code when I'm free, but we have a working pipeline.