Mongey / terraform-provider-kafka

Terraform provider for managing Apache Kafka Topics + ACLs

Error: kafka: client has run out of available brokers to talk to (Is your cluster reachable?) #185

Open ivialex-mcd opened 3 years ago

ivialex-mcd commented 3 years ago

Hi guys,

I was trying to create topics in an AWS MSK cluster, but I get the error below:

Error: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)

  on main.tf line 33, in resource "kafka_topic" "rfmkafkatopic":
  33: resource "kafka_topic" "rfmkafkatopic" {

Note: I was running terraform plan from GitHub Actions. Do I need to set up a security group, for example?

I was using this module to create the MSK cluster:

module "kafka" {
  source  = "cloudposse/msk-apache-kafka-cluster/aws"
  version = "0.5.2"

  name                      = random_id.msk_cluster_id.hex
  vpc_id                    = data.aws_vpc.my_vpc.id
  security_groups           = [module.eks.worker_security_group_id, var.bastion_host_security_group_id, var.gh_runner_sg]
  subnet_ids                = data.aws_subnet_ids.my_private_subnets.ids
  kafka_version             = var.kafka_version
  number_of_broker_nodes    = var.number_of_broker_nodes
  broker_instance_type      = var.broker_instance_type
  client_broker             = "TLS_PLAINTEXT"
  client_tls_auth_enabled   = false
  client_sasl_scram_enabled = false
  cloudwatch_logs_log_group = aws_cloudwatch_log_group.msk.name
  cloudwatch_logs_enabled   = true
  jmx_exporter_enabled      = true
  node_exporter_enabled     = true

  context = module.this.context

}

My provider.tf file:

provider "kafka" {
  bootstrap_servers = [data.aws_msk_cluster.cluster.bootstrap_brokers_tls]
  skip_tls_verify   = var.msk_skip_tls_verify
  tls_enabled       = var.msk_tls_enabled
}
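
One thing worth double-checking in the provider block above (a guess, not necessarily the cause here): bootstrap_brokers_tls from the aws_msk_cluster data source is a single comma-separated string, so wrapping it in brackets passes one combined element rather than a list of brokers. A sketch with the string split first:

provider "kafka" {
  # bootstrap_brokers_tls is a comma-separated string of host:port pairs,
  # so split it into a proper list before handing it to the provider.
  bootstrap_servers = split(",", data.aws_msk_cluster.cluster.bootstrap_brokers_tls)
  skip_tls_verify   = var.msk_skip_tls_verify
  tls_enabled       = var.msk_tls_enabled
}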

My versions.tf file:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.36"
    }

    http = {
      source  = "terraform-aws-modules/http"
      version = "2.4.1"
    }

    kafka = {
      source  = "Mongey/kafka"
      version = "0.3.3"
    }

    local    = ">= 1.4"
    null     = ">= 2.1"
    template = ">= 2.1"
    random   = ">= 2.1"
  }
}

My data.tf file:

data "aws_msk_cluster" "cluster" {
  cluster_name = local.msk_name
}
Brianbrifri commented 3 years ago

I have this error as well when trying to connect to a Confluent Cloud instance of Kafka. My provider is simple:

provider "kafka" {
  bootstrap_servers = ["CCLOUD_CLUSTER_URL"]
  sasl_username  = "CCLOUD_API_KEY"
  sasl_password  = "CCLOUD_API_SECRET"
  sasl_mechanism = "plain"
}

I'm curious what could be wrong.

haidaraM commented 3 years ago

Looks like MSK can't be publicly exposed, so GitHub Actions can't reach it. From the FAQ:

... the only way data can be produced and consumed from an Amazon MSK cluster is over a private connection between your clients in your VPC and the Amazon MSK cluster. Amazon MSK does not support public endpoints.

Check this Stack Overflow thread for some possible solutions.

Brianbrifri commented 3 years ago

Following up on my earlier comment about connecting to Confluent Cloud: it turns out you can't use the Cloud API keys; you need a regular cluster API key with appropriate access.
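
A minimal sketch of what that ends up looking like, assuming a cluster-scoped API key/secret and the cluster's SASL_SSL bootstrap endpoint (the placeholder values are illustrative):

provider "kafka" {
  # Cluster bootstrap endpoint, e.g. "pkc-xxxxx.<region>.aws.confluent.cloud:9092" (placeholder).
  bootstrap_servers = ["CCLOUD_BOOTSTRAP_SERVER:9092"]

  # A cluster (resource-scoped) API key/secret, not a Cloud/organization API key.
  sasl_username  = "CCLOUD_CLUSTER_API_KEY"
  sasl_password  = "CCLOUD_CLUSTER_API_SECRET"
  sasl_mechanism = "plain"

  # Confluent Cloud brokers require TLS.
  tls_enabled = true
}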

Constantin07 commented 2 years ago

Looks like public access for an MSK cluster can be enabled if you are connecting from outside the VPC where it was originally deployed: https://docs.aws.amazon.com/msk/latest/developerguide/public-access.html
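
For clusters managed directly with the aws_msk_cluster resource (rather than the Cloud Posse module), public access is toggled through the connectivity_info block; a partial sketch, assuming the prerequisites from the linked docs are already met (e.g. IAM or SASL/SCRAM auth enabled and unauthenticated access turned off), with the variable names being placeholders:

resource "aws_msk_cluster" "this" {
  # ... cluster_name, kafka_version, number_of_broker_nodes, etc. omitted for brevity.

  broker_node_group_info {
    instance_type   = "kafka.m5.large"
    client_subnets  = var.private_subnet_ids
    security_groups = [var.msk_security_group_id]

    connectivity_info {
      public_access {
        # "DISABLED" (default) or "SERVICE_PROVIDED_EIPS" to expose brokers publicly.
        type = "SERVICE_PROVIDED_EIPS"
      }
    }
  }
}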

vivere-dally commented 2 years ago

Regarding the earlier comment that MSK can't be publicly exposed and that GitHub Actions therefore can't reach it: this is no longer the case, see this.

arinhouck commented 1 year ago

Does anyone have good information on how this works with a publicly accessible MSK cluster? The docs are a bit confusing to me.

First, I tried using the public endpoint with SCRAM and then realized I needed to be within the VPC (it gave a permission error).

Second, I tried using the ZooKeeper endpoint provided by MSK, both with the SCRAM auth I have configured and without SCRAM auth (TLS enabled). After reading the docs, I was under the impression that ACLs are managed via ZooKeeper if you don't use IAM auth.

Third, I tried against the private endpoint with SCRAM, but it gives me a permission error.

I have a CodeBuild project set up within a private subnet of the VPC, pulling from a GitHub repo to run the following Terraform script:

terraform {
  required_providers {
    kafka = {
      source = "Mongey/kafka"
    }
  }
}

provider "kafka" {
  bootstrap_servers = split(",", var.servers)
  tls_enabled       = true
  # also applied SCRAM config at one point
  # sasl_username     = var.scram_username
  # sasl_password     = var.scram_password
  # sasl_mechanism    = "scram-sha512"
}

resource "kafka_acl" "main" {
  resource_name       = "*"
  resource_type       = "Any"
  acl_principal       = "User:*.broker-url.amazonaws.com" # this is the public url from the bootstrap servers according to docs
  acl_host            = "*"
  acl_operation       = "Any"
  acl_permission_type = "Any"
}

resource "kafka_topic" "agent_index" {
  name               = "agent_index"
  replication_factor = 2
  partitions         = 2

  depends_on = [
    kafka_acl.main
  ]
}

Sources: https://docs.amazonaws.cn/en_us/msk/latest/developerguide/msk-acls.html and https://docs.aws.amazon.com/msk/latest/developerguide/public-access.html

I keep getting this error when using private or public endpoints with SCRAM:

failed to create one or more ACL rules: kafka server: The client is not authorized to send this request type

Then I get the same error as in this issue when accessing the ZooKeeper TLS endpoint (I'm a bit confused about what is required for connecting via TLS; nowhere in the Amazon docs does it mention how to use TLS with ZooKeeper or where to get the info from KMS).

client has run out of available brokers to talk to

I really underestimated how complicated it would be to get Amazon MSK set up for public access with a streamlined way to manage permissions (coming from the Heroku provider). So any hints would be helpful.

arinhouck commented 1 year ago

So I finally got it to work using "brute force" via a Session Manager CodeBuild breakpoint (I used the plaintext ZooKeeper config and ran kafka-acls.sh manually). I'll have a follow-up sometime next week with details on how to set up a public-access MSK cluster with Terraform, for anyone else who may find it valuable.

chriselion commented 1 year ago

@arinhouck Did you ever write up how you got this working? I'm hitting the same problem with SCRAM on MSK and trying to do everything from within Terraform.

mbuotidem commented 1 year ago

@arinhouck any chance you could share the working config? @chriselion were you able to figure this out?

arinhouck commented 1 year ago

Hey guys, apologies, I meant to get back to this. Thanks for the second ping.

I ended up using IAM auth, which allows you to bypass ZooKeeper/ACLs. I used Kafka GitOps and created a CodeBuild project triggered by a GitHub merge to main. I treated the IAM auth as the "admin" for the pipeline, then used the kafka-gitops YAML config file to create ACLs for my SCRAM users and to manage topics.

version: 0.2

env:
  variables:
    KAFKA_SASL_MECHANISM: "AWS_MSK_IAM"
    KAFKA_SECURITY_PROTOCOL: "SASL_SSL"
    KAFKA_SASL_JAAS_CONFIG: "software.amazon.msk.auth.iam.IAMLoginModule required;"
    KAFKA_SASL_CLIENT_CALLBACK_HANDLER_CLASS: "software.amazon.msk.auth.iam.IAMClientCallbackHandler"

phases:
  build:
    commands: 
    - "java -cp ./bin/aws-msk-iam-auth-1.1.6-all.jar:./bin/kafka-gitops com.devshawn.kafka.gitops.MainCommand plan"
    - "java -cp ./bin/aws-msk-iam-auth-1.1.6-all.jar:./bin/kafka-gitops com.devshawn.kafka.gitops.MainCommand apply"
artifacts:
  files:
    - '**/*'
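
For anyone who wants to stay entirely within this provider instead of running a separate kafka-gitops pipeline: newer releases of Mongey/kafka also document an AWS IAM SASL mechanism, so something like the sketch below may work (this assumes a provider version with aws-iam support, the cluster's IAM bootstrap brokers, and AWS credentials/region available from the environment, e.g. the CodeBuild role):

provider "kafka" {
  # MSK's IAM bootstrap brokers (port 9098 for in-VPC IAM auth).
  bootstrap_servers = split(",", data.aws_msk_cluster.cluster.bootstrap_brokers_sasl_iam)
  tls_enabled       = true

  # Authenticate with the caller's AWS IAM credentials instead of SCRAM or mTLS.
  sasl_mechanism = "aws-iam"
}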