ankurchavda / streamify

A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more!

Do we need two separate VMs for this project? #2

Open: stephenllh opened this issue 2 years ago

stephenllh commented 2 years ago

From setup.md, it seems that you have set up two VMs. Just curious: is that necessary?

ankurchavda commented 2 years ago

Hey @stephenllh, it is not a necessity. But if you plan to send a good volume of data from Eventsim, it is going to take up a lot of memory. I had two Eventsim instances writing to Kafka on the VM, and only around 300 MB of its 16 GB of memory was left. Adding Airflow on top of that would be a nightmare. I'd suggest keeping a separate VM for Airflow, but choosing a smaller one.

stephenllh commented 2 years ago

I see. Thanks for telling me. I am actually using this repo as a kind of tutorial for my own learning. You did a really good job, especially with the documentation.

ankurchavda commented 2 years ago

Thank you! I hope you enjoy setting up this project and building over it :D