GloballogicPractices / ECS-Kafka

Guide to run a highly available kafka cluster on ECS backed by Netflix's exhibitor for Zookeeper management
MIT License

I can't get this running #2

Open jonathanmv opened 6 years ago

jonathanmv commented 6 years ago

Hi,

Thanks for sharing this ambitious project. I've been trying to deploy a Kafka cluster on ECS using the Terraform scripts provided here, but I've run into many issues, which I describe below. (The files referenced below are in terraform/environments/development.)

  1. I made sure the requirements were installed. These are the versions on my system:
    Terraform v0.11.8
    ansible 2.7.0
    Python 2.7.9
    boto3
    botocore<1.13.0>
  2. Updated secrets.tfvars with the path to my keys
  3. Updated terraform.tfvars with the appropriate values for my use case.
  4. Ran terraform plan -var-file=secrets.tfvars and got
    Error: Error loading modules: module vpc: not found, may need to run 'terraform init'

    So after running terraform init I get

Initializing modules...
- module.vpc
  Getting source "../../modules/vpc"
- module.glp-private-zone
  Getting source "../../modules/route53-hosted-zone"
- module.bastion
  Getting source "../../modules/bastion"
- module.ecs-kafka-cluster
  Getting source "../../modules/ecs-kafka-zk-cluster"
- module.efs-private-subnet
  Getting source "../../modules/efs"
- module.aws-log-group
  Getting source "../../modules/cloudwatch-log-groups"
- module.ansible-ecs-setup
  Getting source "../../modules/ansible-ecs"

Error: module 'ecs-kafka-cluster': unknown variable referenced: 'kafka_instance_type'; define it with a 'variable' block
Error: module 'ecs-kafka-cluster': unknown variable referenced: 'kafka_asg_max_size'; define it with a 'variable' block
Error: module 'ecs-kafka-cluster': unknown variable referenced: 'efs_kafka_data_dir'; define it with a 'variable' block
Error: module 'ecs-kafka-cluster': unknown variable referenced: 'kafka_asg_min_size'; define it with a 'variable' block
Error: module 'ecs-kafka-cluster': unknown variable referenced: 'kafka_asg_desired_size'; define it with a 'variable' block
Error: module 'glp-private-zone': unknown module referenced: glp-vpc
Error: module "ansible-ecs-setup": "route53_private_domain" is not a valid argument
Error: module "ansible-ecs-setup": missing required argument "log_group_name"
Error: module 'glp-private-zone': reference to undefined module "glp-vpc"

To resolve the unknown variable referenced errors, I added the variables they mention to the variables.tf file, as shown below:

// EFS
variable "efs_kafka_data_dir" {}

// ECS Kafka cluster
variable "kafka_asg_max_size" {}
variable "kafka_asg_min_size" {}
variable "kafka_asg_desired_size" {}
variable "kafka_instance_type" {}
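
Stub declarations like the ones above could also be generated from the error output rather than typed by hand. This is only a minimal sketch, assuming the error wording stays exactly as shown in the terraform init output earlier in this comment; `stub_variables` is a hypothetical helper name, and the blocks it prints are empty placeholders, not the definitions the module authors intended.

```shell
# Hypothetical helper: read terraform output on stdin and print an empty
# `variable` block for every "unknown variable referenced" error it contains.
# Lines that do not match (e.g. "unknown module referenced") are ignored.
stub_variables() {
  sed -n "s/.*unknown variable referenced: '\([^']*\)'.*/variable \"\1\" {}/p" | sort -u
}

# Usage (assumes terraform is on PATH; 2>&1 merges stderr into the pipe):
#   terraform init 2>&1 | stub_variables >> variables.tf
```

The stubs only silence the "unknown variable" errors; each variable still needs a value from a .tfvars file or a default.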

Then, to resolve the Error: module 'glp-private-zone': unknown module referenced: glp-vpc, I changed line 53 of main.tf to reference the vpc module instead of the glp-vpc module:

   vpc_id            = "${module.vpc.vpc_id}"

To get rid of the Error: module "ansible-ecs-setup": "route53_private_domain" is not a valid argument, I commented out line 121 of main.tf, because it says "This is only used for Couchbase server", so leaving it out is OK for now.

Finally, to get rid of the Error: module "ansible-ecs-setup": missing required argument "log_group_name", I added the following line to the ansible-ecs-setup module:

   log_group_name                 = "/ecs/${var.environment}-logs"

Only after making these changes was I able to run terraform init successfully. But then, when I run terraform plan, it prompts for a lot of variables, even though "There should be no manual intervention required."

These are the first three variables it asks for:

var.ami_name_regex
  Enter a value: asd

var.ami_owner_name
  Enter a value: asd

var.aws_key_name
  Enter a value: 

I find it really hard to run this script. Is there something I should be doing differently?
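
For context on the prompts above: terraform plan asks interactively for any declared variable that has neither a default nor a value from a .tfvars file. A rough way to see which ones are affected is sketched below. `unset_variables` is a hypothetical helper, and it is deliberately naive: it only matches `variable "name"` lines and `name =` assignments, and it does not look at defaults, so variables that do have a default are reported too.

```shell
# Hypothetical helper: print every variable declared in a .tf file ($1)
# that has no `name = ...` assignment in a .tfvars file ($2).
unset_variables() {
  tf_file=$1
  tfvars_file=$2
  # Extract the name from each `variable "name"` declaration line.
  sed -n 's/^[[:space:]]*variable[[:space:]]*"\([^"]*\)".*/\1/p' "$tf_file" |
  while read -r name; do
    # Report the name if the tfvars file never assigns it.
    grep -Eq "^[[:space:]]*${name}[[:space:]]*=" "$tfvars_file" || echo "$name"
  done
}

# Usage, from terraform/environments/development:
#   unset_variables variables.tf terraform.tfvars
```

Running the plan with -input=false turns the prompts into errors, which is easier to capture than an interactive session.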

liafizan commented 6 years ago

OK, that is not how it is supposed to work. Unfortunately I haven't had time, but I will try to look into this over the coming weekend.

jonathanmv commented 6 years ago

Great @faizan82. Thank you very much

patrykk2252 commented 5 years ago

Any update on this? I was able to run the Terraform script after making a couple of changes, but in general, when running

 terraform plan -input=false -var-file=terraform.tfvars

I was getting a lot of errors:

=> terraform plan -input=false -var-file=terraform.tfvars

Error: Required variable not set: cass_ebs_vol_type
Error: Required variable not set: cass_ebs_vol_size
Error: Required variable not set: cassandra_asg_max_size
Error: Required variable not set: cassandra_asg_desired_size
Error: Required variable not set: kong_asg_min_size
Error: Required variable not set: ami_name_regex
Error: Required variable not set: cass_ebs_dev_name
Error: Required variable not set: cassandra_asg_min_size
Error: Required variable not set: ingress_instance_type
Error: Required variable not set: kong_asg_max_size
Error: Required variable not set: ami_owner_name
Error: Required variable not set: cass_data_dir
Error: Required variable not set: kong_asg_desired_size
Error: Required variable not set: cassandra_instance_type

To fix the above I added the missing variables.

For the Ansible part you need to change the file permissions as follows:

chmod +x ec2.py
chmod +x ansible_call_deploy.sh

I have a couple of questions:
1) Why is Cassandra required?
2) Why did you use the ECS-optimized image for the bastion?
3) What is Kong for in this configuration?
4) How do the containers communicate with each other without load balancers? Is that what Kong is for?

patrykk2252 commented 5 years ago

A lot of changes are needed in Ansible to make it work. I was able to deploy the cluster, but ZooKeeper fails to form an ensemble because of a communication issue between the ZooKeeper nodes. If you configure the task definition's network mode as host, then the ZooKeeper nodes are not able to communicate with themselves on 0.0.0.0:2888:3888 for some reason. If you go with bridge, then the containers cannot communicate with each other. To make that work I would need an ELB per port per container (too expensive, too much to manage).

Another issue is binding an IP to a container, since you need to specify the IP addresses/DNS names of the other ZooKeeper members. Containers are placed dynamically on the ECS EC2 instances. We could use placement constraints, but there is no way (at least I haven't found one yet) to tag the EC2 instances in a predictable way, as they are in an auto-scaling group.
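
To make the addressing requirement concrete, here is a rough sketch of the peer list each ZooKeeper node needs, in the standard zoo.cfg `server.N=host:peerPort:electionPort` form. `zk_server_lines` and the hostnames in the usage line are hypothetical, and the sketch assumes every member already has a stable DNS name, which is exactly the part that is unsolved here. Writing the node's own entry as 0.0.0.0 mirrors the 0.0.0.0:2888:3888 binding mentioned above; I am not claiming it fixes the communication failure.

```shell
# Hypothetical helper: print zoo.cfg `server.N` lines for a static ensemble.
# $1 = this node's id (1-based); remaining arguments = hostnames in id order.
zk_server_lines() {
  my_id=$1
  shift
  id=1
  for host in "$@"; do
    # Assumption: the node's own entry is bound to 0.0.0.0 (all interfaces).
    if [ "$id" -eq "$my_id" ]; then
      host=0.0.0.0
    fi
    echo "server.$id=$host:2888:3888"
    id=$((id + 1))
  done
}

# Usage on node 2 of a three-node ensemble (hostnames are placeholders):
#   zk_server_lines 2 zk1.example.internal zk2.example.internal zk3.example.internal
```

Every node has to know the full hostname list and its own position in it up front, which is why dynamic placement inside an auto-scaling group is a problem.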

Those are the issues that stopped me from going further with the project.