Data pipeline
Data pipeline using Apache Beam, BigQuery and Datalab
Overview
- Batch
Data source (database, S3, HDFS) ---> Dataflow ---> ETL ---> Data sink (e.g. BigQuery, database, GCS, S3)
- Streaming
Data sources (mobile, web, etc.) ---> Pub/Sub ---> Dataflow ---> ETL ---> Data sink (e.g. BigQuery, database, GCS, S3)
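To make the ETL stage in both flows concrete, here is a minimal sketch of the extract and transform steps as plain Python functions. The column names in `FIELDS` and the sample input lines are assumptions for illustration only; this README does not specify a schema. Loading into a sink is deliberately left out.

```python
import csv
import io

# Hypothetical column names; the project does not define a schema here.
FIELDS = ["user_id", "event", "amount"]


def parse_record(line):
    """Extract: split one CSV line into a dict keyed by FIELDS."""
    values = next(csv.reader(io.StringIO(line)))
    return dict(zip(FIELDS, values))


def transform(record):
    """Transform: trim whitespace and cast types so rows fit a typed sink."""
    return {
        "user_id": record["user_id"].strip(),
        "event": record["event"].strip().lower(),
        "amount": float(record["amount"]),
    }


def run_batch(lines):
    """Apply extract and transform to an in-memory batch of lines."""
    return [transform(parse_record(line)) for line in lines]


# Illustrative input only, not data from this project.
rows = run_batch(["u1, Purchase ,9.50", "u2,VIEW,0"])
```

In an Apache Beam pipeline, functions like these would typically be applied per element with `beam.Map(parse_record)` and `beam.Map(transform)`, with the source and sink supplied by Beam I/O connectors. Keeping them as plain functions lets the same logic serve the batch and streaming flows and makes it easy to test without a runner.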
Prerequisites
- Google Cloud project with billing enabled