confluentinc / demo-scene

👾Scripts and samples to support Confluent Demos and Talks. ⚠️Might be rough around the edges ;-) 👉For automated tutorials and QA'd code, see https://github.com/confluentinc/examples/
https://developer.confluent.io
Apache License 2.0

Ansible Tower demo #202

Closed domenicbove closed 3 years ago

domenicbove commented 3 years ago

Adds a new directory for a demo that will accompany a blog post

domenicbove commented 3 years ago

@rohit2b @amitkgupta Please review the accompanying demo for the blog

amitkgupta commented 3 years ago

I got an error:

Error: Invalid function argument

  on variables.tf line 30, in resource "aws_key_pair" "default":
  30:   public_key = file(var.ssh_key_public_path)
    |----------------
    | var.ssh_key_public_path is "~/.ssh/id_rsa.pub"

Invalid value for "path" parameter: no file exists at
/Users/agupta/.ssh/id_rsa.pub; this function works only with files that are
distributed as part of the configuration source code, so if this file will be
created by a resource in this configuration you must instead obtain this
result from an attribute of that resource.

There's a missing instruction here.

I set export TF_VAR_ssh_key_public_path="~/.ssh/<my-key>.pub" and made progress.

domenicbove commented 3 years ago

Yeah, good point. Most people have ~/.ssh/id_rsa.pub, but rather than assuming that, it's better to just create a new one. That SSH key also needs to be provided to Tower in the machine credentials step, so it's important to tie everything together.
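For example, a dedicated keypair for the demo could be generated and wired into Terraform along these lines. This is just a sketch, not a script from the repo; the key name id_rsa_confluent matches what this thread uses, and KEY_DIR defaults to a scratch directory so the sketch is side-effect free (in the real demo you would point it at ~/.ssh):

```shell
# Sketch: generate a dedicated keypair for the demo instead of assuming
# ~/.ssh/id_rsa.pub exists. Key name and location are illustrative only.
KEY_DIR="${KEY_DIR:-$(mktemp -d)}"
KEY_PATH="$KEY_DIR/id_rsa_confluent"

# -N '' = no passphrase, -q = quiet
ssh-keygen -t rsa -b 4096 -N '' -q -f "$KEY_PATH"

# Terraform's file(var.ssh_key_public_path) will read this public key
export TF_VAR_ssh_key_public_path="$KEY_PATH.pub"
echo "public key: $TF_VAR_ssh_key_public_path"
```

The same private key path would then be the one handed to Tower as the machine credential.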

amitkgupta commented 3 years ago

I'm now getting a lot of other issues:

Error: Error launching source instance: InvalidParameter: Security group sg-0dcc889a4ddf50f30 and subnet subnet-0788ed210d1ce22d5 belong to different networks.
    status code: 400, request id: 41a463fa-cbb9-4258-bb53-1c8fa6859076

  on main.tf line 53, in resource "aws_instance" "schema_registry":
  53: resource "aws_instance" "schema_registry" {

Error: Error launching source instance: OptInRequired: In order to use this AWS Marketplace product you need to accept terms and subscribe. To do so please visit https://aws.amazon.com/marketplace/pp?sku=aw0evgkw8e5c1q413zgy5pjce
    status code: 401, request id: 604f3df0-6825-44cb-95df-725c08410532

  on main.tf line 77, in resource "aws_instance" "connect":
  77: resource "aws_instance" "connect" {

Error: Error launching source instance: OptInRequired: In order to use this AWS Marketplace product you need to accept terms and subscribe. To do so please visit https://aws.amazon.com/marketplace/pp?sku=aw0evgkw8e5c1q413zgy5pjce
    status code: 401, request id: b97bed34-05aa-4be6-b105-29fe5845ddda

  on main.tf line 101, in resource "aws_instance" "rest_proxy":
 101: resource "aws_instance" "rest_proxy" {

Error: Error launching source instance: OptInRequired: In order to use this AWS Marketplace product you need to accept terms and subscribe. To do so please visit https://aws.amazon.com/marketplace/pp?sku=aw0evgkw8e5c1q413zgy5pjce
    status code: 401, request id: 0683381b-fa62-46b4-8d14-5e759bdc4513

  on main.tf line 125, in resource "aws_instance" "ksql":
 125: resource "aws_instance" "ksql" {

Error: Error launching source instance: OptInRequired: In order to use this AWS Marketplace product you need to accept terms and subscribe. To do so please visit https://aws.amazon.com/marketplace/pp?sku=aw0evgkw8e5c1q413zgy5pjce
    status code: 401, request id: d384b97e-2e1d-43d7-a059-6404f60d2bca

  on main.tf line 149, in resource "aws_instance" "control_center":
 149: resource "aws_instance" "control_center" {

Error: Error authorizing security group ingress rules: InvalidGroup.NotFound: You have specified two resources that belong to different networks.
    status code: 400, request id: aea1ae7e-3c21-405c-a9c7-f53bcf3e6c28

  on security_groups.tf line 85, in resource "aws_security_group" "kafka":
  85: resource "aws_security_group" "kafka" {

Error: Error authorizing security group ingress rules: InvalidGroup.NotFound: You have specified two resources that belong to different networks.
    status code: 400, request id: 5fe8be1f-4643-4076-a8f5-9d07aec76a16

  on security_groups.tf line 132, in resource "aws_security_group" "mds":
 132: resource "aws_security_group" "mds" {

Error: Error authorizing security group rule type ingress: InvalidGroup.NotFound: You have specified two resources that belong to different networks.
    status code: 400, request id: a002225d-60e9-4428-91a3-d9da7c84aaaa

  on security_groups.tf line 258, in resource "aws_security_group_rule" "schema_registry-ksql":
 258: resource "aws_security_group_rule" "schema_registry-ksql" {

Error: Error authorizing security group rule type ingress: InvalidGroup.NotFound: You have specified two resources that belong to different networks.
    status code: 400, request id: 79b3f7eb-76ab-415b-92aa-5b8e9b3775d4

  on security_groups.tf line 267, in resource "aws_security_group_rule" "schema_registry-rest_proxy":
 267: resource "aws_security_group_rule" "schema_registry-rest_proxy" {

Error: Error authorizing security group rule type ingress: InvalidGroup.NotFound: You have specified two resources that belong to different networks.
    status code: 400, request id: 3e1c324c-b3d2-40de-9fe9-fcd276ae136d

  on security_groups.tf line 276, in resource "aws_security_group_rule" "schema_registry-control_center":
 276: resource "aws_security_group_rule" "schema_registry-control_center" {
domenicbove commented 3 years ago

Fun! This is why we shouldn't use Terraform; with containers there's a much higher chance of success. Did you change the subnet? @amitkgupta

amitkgupta commented 3 years ago

Half the errors were due to the RHEL subscription, but the error message included a link to https://aws.amazon.com/marketplace/pp?sku=aw0evgkw8e5c1q413zgy5pjce. I went there and subscribed, so those errors no longer show up. But I think the demo should use an OS that doesn't require a subscription. Could it just use Ubuntu?

amitkgupta commented 3 years ago

Regarding the other errors, I have:

$ env | grep TF_VAR
TF_VAR_vpc_id=vpc-0d8e044df62b1969c
TF_VAR_subnet=subnet-0788ed210d1ce22d5
TF_VAR_ssh_key_public_path=~/.ssh/id_rsa_confluent.pub

And as you can see here, this should all be right:

[screenshot: 2021-04-08 3.40.50 PM]

This looks fishy:

[screenshot: 2021-04-08 3.42.54 PM]
domenicbove commented 3 years ago

I don't see TF_VAR_ami in those env vars. If you are switching up the VPC and subnet, then the AMI must change in step.

Also, it looks like you are running into a naming collision. I would terraform destroy, then set those unique identifiers and re-apply.

Regarding CentOS vs. Ubuntu, the CentOS AMI I use is usually fine. I suspect certain Ubuntu, Debian, or RHEL AMIs could run into the same subscription issue; I don't know the best way around it, because I can't control which AMI a user chooses. WDYT?

amitkgupta commented 3 years ago

I don't see TF_VAR_ami in those env vars. If you are switching up the VPC and subnet, then the AMI must change in step.

In the README, it only says:

If subnet outside of us-west-2 set below ami variable to centos image within your region

I'm in us-west-2, so I didn't set that AMI. Everything worked once I subscribed. Isn't there just some vanilla Ubuntu/Debian/maybe-CentOS AMI that can be used? I've never had to subscribe before when using Terraform, but I've also never used RHEL/CentOS via Terraform. Note that Red Hat recently made some interesting changes around CentOS licensing, which might be part of why.

amitkgupta commented 3 years ago

Ran the Tower job; it failed because all the hosts are unreachable.

PLAY RECAP *********************************************************************
ip-10-0-0-101.us-west-2.compute.internal : ok=1    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   
ip-10-0-0-12.us-west-2.compute.internal : ok=1    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   
ip-10-0-0-16.us-west-2.compute.internal : ok=1    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   
ip-10-0-0-251.us-west-2.compute.internal : ok=1    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   
ip-10-0-0-27.us-west-2.compute.internal : ok=1    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   
ip-10-0-0-28.us-west-2.compute.internal : ok=5    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   
ip-10-0-0-29.us-west-2.compute.internal : ok=1    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   
ip-10-0-0-49.us-west-2.compute.internal : ok=1    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   
ip-10-0-0-62.us-west-2.compute.internal : ok=1    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   
ip-10-0-0-86.us-west-2.compute.internal : ok=1    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   
ip-10-0-0-9.us-west-2.compute.internal : ok=1    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   

Looks like the earlier steps generated a hosts.yml with the internal addresses of the EC2 instances, so it's definitely not going to work with Tower running in Docker on my laptop.

Here's the generated hosts YAML:

all:
  vars:
    ansible_become: true

    ssl_enabled: true

    kafka_broker_custom_listeners:
      external:
        name: EXTERNAL
        port: 9093

zookeeper:
  hosts:
    ip-10-0-0-86.us-west-2.compute.internal:
      ansible_host:
    ip-10-0-0-28.us-west-2.compute.internal:
      ansible_host:
    ip-10-0-0-29.us-west-2.compute.internal:
      ansible_host:

kafka_broker:
  hosts:
    ip-10-0-0-251.us-west-2.compute.internal:
      ansible_host:
      mds_advertised_listener_hostname:
      kafka_broker_custom_listeners:
        external:
          hostname:
    ip-10-0-0-62.us-west-2.compute.internal:
      ansible_host:
      mds_advertised_listener_hostname:
      kafka_broker_custom_listeners:
        external:
          hostname:
    ip-10-0-0-27.us-west-2.compute.internal:
      ansible_host:
      mds_advertised_listener_hostname:
      kafka_broker_custom_listeners:
        external:
          hostname:

schema_registry:
  hosts:
    ip-10-0-0-12.us-west-2.compute.internal:
      ansible_host:

kafka_rest:
  hosts:
    ip-10-0-0-49.us-west-2.compute.internal:
      ansible_host:

kafka_connect:
  hosts:
    ip-10-0-0-101.us-west-2.compute.internal:
      ansible_host:

ksql:
  hosts:
    ip-10-0-0-9.us-west-2.compute.internal:
      ansible_host:

control_center:
  hosts:
    ip-10-0-0-16.us-west-2.compute.internal:
      ansible_host:

Looks like there's no public DNS address for any of the hosts. I can confirm that's the case here:

[screenshot: 2021-04-08 4.28.11 PM]
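A quick grep over the generated inventory would have flagged this before the Tower job even ran. This is a hypothetical sanity check, not something shipped with the demo; it writes a two-host sample inventory so the sketch is self-contained:

```shell
# Hypothetical sanity check (not part of the demo): flag hosts.yml entries
# whose ansible_host value is empty, which is exactly what the Tower job
# tripped over. Sample inventory: one empty entry, one filled in.
inv=$(mktemp)
cat > "$inv" <<'EOF'
zookeeper:
  hosts:
    ip-10-0-0-86.us-west-2.compute.internal:
      ansible_host:
    ip-10-0-0-28.us-west-2.compute.internal:
      ansible_host: ec2-54-0-0-1.us-west-2.compute.amazonaws.com
EOF

# Count "ansible_host:" lines with nothing after the colon
empty=$(grep -cE 'ansible_host:[[:space:]]*$' "$inv")
echo "empty ansible_host entries: $empty"
```

Run against the full inventory above it would report 11 empty entries; any non-zero count means the Terraform outputs (here, the public DNS names) never made it into the template.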
domenicbove commented 3 years ago

@amitkgupta I'm confused as to why there is no public DNS for your AWS hosts; I thought that was standard. You are correct that this whole thing depends on the public DNS. The inventory file needs it. Here's the template:

zookeeper:
  hosts:%{ for host in aws_instance.zookeeper.* }
    ${host.private_dns}:
      ansible_host: ${host.public_dns}%{ endfor }

Could this help: https://stackoverflow.com/questions/20941704/ec2-instance-has-no-public-dns

All this AWS/Terraform struggling makes me question having this whole demo depend on AWS. The point of the demo isn't even AWS; it's Tower...

amitkgupta commented 3 years ago

The weird thing was, when I launched the VPC through the wizard in the console, I chose a public VPC with one public subnet. But none of the instances have public IPs or DNS addresses.

My VPC does say:

DNS hostnames
Enabled

Maybe Terraform has to explicitly set associate_public_ip_address to true: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/instance#associate_public_ip_address
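For what it's worth, the subnet-level counterpart of that instance setting is the subnet's auto-assign public IPv4 flag, which the AWS CLI exposes as aws ec2 modify-subnet-attribute with --map-public-ip-on-launch. Whether that alone would have fixed it here is an open question; the sketch below (the APPLY guard and env vars are illustrative, not from the demo) only prints the command unless APPLY=1, so it is safe to run without credentials:

```shell
# Sketch: enable auto-assign public IPv4 on the demo subnet. The subnet id
# default is the one from this thread and is illustrative only. Dry run by
# default; set APPLY=1 to actually call the AWS CLI.
SUBNET_ID="${SUBNET_ID:-subnet-0788ed210d1ce22d5}"
CMD="aws ec2 modify-subnet-attribute --subnet-id $SUBNET_ID --map-public-ip-on-launch"

if [ "${APPLY:-0}" = "1" ]; then
  $CMD  # needs AWS credentials with ec2:ModifySubnetAttribute
else
  echo "dry run: $CMD"
fi
```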

domenicbove commented 3 years ago

@amitkgupta Good thinking! I just added associate_public_ip_address = true to all the EC2 resources. On my side I still get public hostnames. Can you:

make destroy-infra
git pull origin tower-demo
make create-infra

and then check hosts.yml?

Also, now that you are overriding the SSH key path, you will need to update this section of the ansible-tower/awx/create_tower_job.sh file:

echo "________Create Machine Credential from SSH Key________"
awx credentials create --credential_type 'Machine' \
    --name 'AWS Key' --organization Default \
    --inputs '{"username": "centos", "ssh_key_data": "@'${HOME}'/.ssh/id_rsa_confluent"}'

That's assuming the private key is at ~/.ssh/id_rsa_confluent.

I plan to update the scripts to create SSH keys for users, so this is just for the time being.

amitkgupta commented 3 years ago

@domenicbove could the Terraform create the VPC as well? It would eliminate one more step for the user and one more thing that could go wrong. If not, then we probably need to be more prescriptive about the VPC, e.g. we have to tell people it has to support public addressing, etc.

domenicbove commented 3 years ago

@amitkgupta Regarding having Terraform create a VPC: that would be a great way to make sure everything is created correctly. Unfortunately, my personal account with Confluent doesn't have VPC creation capabilities, or at least it did not when I joined Confluent... I think it hits a VPC limit error.

Anyhow, I thought it was pretty common practice to expect someone to already have a VPC and subnet.

domenicbove commented 3 years ago

@rmoff Great suggestions on the README; I just updated both READMEs.

@amitkgupta - I just updated the demo code to create an SSH key for you, so you won't need to set that SSH key path variable. Also, there's another demo in this repo, ccloud-cube-demo, which sets up the VPC and all the networking bits. Do you think I should match that? I'm leaning toward yes.

amitkgupta commented 3 years ago

theres another demo in this repo- ccloud-cube-demo which sets up the vpc and all networking bits. Do you think I should match that? I'm leaning toward yes.

@domenicbove good question. I think we definitely shouldn't leave the reader to guess what a correct VPC setup is. IMO we should either automate it in the TF, or be more explicit about what needs to be true about the VPC (in case the user is bringing their own), or both -- where "both" could mean that the TF does it by default but optionally supports the reader passing in their VPC info for the pre-existing VPC.

I prefer having the TF just create the VPC personally, but I wouldn't be surprised to find people in similar situations where they're not allowed to do things that create new VPCs and have to use something already given to them. I'm curious what feedback we've gotten on ccloud-cube-demo -- have lots of readers complained that it creates a VPC for them? If not, then yeah, it probably makes sense to follow whatever precedent is set there.

I also think it's okay to pick whatever option you find reasonable for now; the demo can evolve later if it needs to. I was able to successfully run through end to end, so I think most of the major kinks are gone, especially if you're handling the SSH key stuff now.

domenicbove commented 3 years ago

@amitkgupta I'm glad to hear you got it working! What did you have to do to get your VPC to allow public DNS names? Was it the associate_public_ip_address change that I made, or something you did?

amitkgupta commented 3 years ago

Your associate_public_ip_address change definitely seemed necessary, though I'm not sure it was sufficient. Maybe that change plus something I did was what made it work. I created a pretty vanilla VPC; it's possible that if I had picked different options/configurations it wouldn't have worked, even with your associate_public_ip_address change. Here's what I did:

[screenshots: 2021-04-15 10.11.50 AM, 10.12.00 AM, 10.12.19 AM]
domenicbove commented 3 years ago

@amitkgupta So I consulted with @nerdynick, and her opinion is that lots of orgs have their own ways of creating VPCs, so writing code that can deploy into them is a good idea. So I think I won't add the networking Terraform bits, but I've updated the README to say:

## Prerequisites:
- An AWS VPC created with `Enable DNS hostnames` set to true
- An AWS subnet created, preferably in us-west-2

I think the Enable DNS hostnames setting plus associate_public_ip_address should cover your DNS name issue (hopefully). And yeah, the scripts now create SSH keys for you, so there's also no need to even think about keys.