HighOps / ansible_ec2_vpc_nat_asg

Ansible repo for creating a multi-az ec2 VPC with a NAT auto scaling group
MIT License

'get a list of public subnet-id,route-id maps' fails due to missing variable. #9

Open djcross opened 9 years ago

djcross commented 9 years ago

Hi, I've picked this up after intending for a while to try Ansible for repeatable AWS VPC setups. Very nice work. I am hitting a few issues, however, and here is one.

I successfully ran the playbook and created the VPC. When I then made a small change and re-ran the playbook, I hit this:

$ ansible-playbook -v plays/operation/bootstrap_vpc.yml --extra-vars "env=rea_prod"
<snip>

TASK [create the private route tables] *****************************************
failed: [localhost] => (item={u'routes': [{u'dest': u'0.0.0.0/0', u'gateway_id': u'igw'}], u'resource_tags': {u'environment': u'production', u'Name': u'rea_prod/private_rtable_a'}, u'subnets': [u'rea_prod/application_subnet_a']}) => {"failed": true, "item": {"resource_tags": {"Name": "rea_prod/private_rtable_a", "environment": "production"}, "routes": [{"dest": "0.0.0.0/0", "gateway_id": "igw"}], "subnets": ["rea_prod/application_subnet_a"]}, "msg": "Unable to ensure routes for route table RouteTable:rtb-7a6ad41f, error: EC2ResponseError: 400 Bad Request\n<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Response><Errors><Error><Code>RouteAlreadyExists</Code><Message>The route identified by 0.0.0.0/0 already exists.</Message></Error></Errors><RequestID>c1234fc0-bf4a-4409-899f-2d24cd7e8620</RequestID></Response>"}
failed: [localhost] => (item={u'routes': [{u'dest': u'0.0.0.0/0', u'gateway_id': u'igw'}], u'resource_tags': {u'environment': u'production', u'Name': u'rea_prod/private_rtable_b'}, u'subnets': [u'rea_prod/application_subnet_b']}) => {"failed": true, "item": {"resource_tags": {"Name": "rea_prod/private_rtable_b", "environment": "production"}, "routes": [{"dest": "0.0.0.0/0", "gateway_id": "igw"}], "subnets": ["rea_prod/application_subnet_b"]}, "msg": "Unable to ensure routes for route table RouteTable:rtb-456ad420, error: EC2ResponseError: 400 Bad Request\n<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Response><Errors><Error><Code>RouteAlreadyExists</Code><Message>The route identified by 0.0.0.0/0 already exists.</Message></Error></Errors><RequestID>e3e85e52-fe02-424b-8cf1-db5afbdba147</RequestID></Response>"}
...ignoring

TASK [process security groups] *************************************************
ok: [localhost] => (item={u'rules': [{u'cidr_ip': u'10.49.21.0/24', u'proto': u'all'}], u'rules_egress': [{u'cidr_ip': u'0.0.0.0/0', u'proto': u'all'}], u'name': u'rea_prod/nat_security_group', u'description': u'allow outbound nat'}) => {"changed": false, "group_id": "sg-6885f20d", "item": {"description": "allow outbound nat", "name": "rea_prod/nat_security_group", "rules": [{"cidr_ip": "10.49.21.0/24", "proto": "all"}], "rules_egress": [{"cidr_ip": "0.0.0.0/0", "proto": "all"}]}}
ok: [localhost] => (item={u'rules': [{u'to_port': 22, u'from_port': 22, u'cidr_ip': u'59.101.127.161/32', u'proto': u'tcp'}, {u'to_port': -1, u'from_port': -1, u'cidr_ip': u'0.0.0.0/0', u'proto': u'icmp'}], u'rules_egress': [{u'cidr_ip': u'0.0.0.0/0', u'proto': u'all'}], u'name': u'rea_prod/bastion_security_group', u'description': u'access bastion, allow outbound nat'}) => {"changed": false, "group_id": "sg-6b85f20e", "item": {"description": "access bastion, allow outbound nat", "name": "rea_prod/bastion_security_group", "rules": [{"cidr_ip": "59.101.127.161/32", "from_port": 22, "proto": "tcp", "to_port": 22}, {"cidr_ip": "0.0.0.0/0", "from_port": -1, "proto": "icmp", "to_port": -1}], "rules_egress": [{"cidr_ip": "0.0.0.0/0", "proto": "all"}]}}

TASK [get a list of public subnet-id,route-id maps] ****************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: KeyError: 'route_table_id'
fatal: [localhost]: FAILED! => {"failed": true, "stdout": ""}

PLAY RECAP *********************************************************************
localhost                  : ok=8    changed=1    unreachable=0    failed=1   

It seems that this fails because the registered variable 'ec2_vpc_route_table_private_out' is not set: the private route tables already exist, so the task is skipped.

      - name: create the private route tables
        ec2_vpc_route_table:
          region: "{{ region }}"
          resource_tags: "{{ item.resource_tags }}"
          routes: "{{ item.routes }}"
          subnets: "{{ item.subnets }}"
          vpc_id: "{{ ec2_vpc_net_out.vpc.id }}"
        with_items: vpc.route_tables.private
        register: ec2_vpc_route_table_private_out
        ignore_errors: yes

<snip>

      - name: get a list of public subnet-id,route-id maps
        set_fact:
          subnet_route_map: "{{ ec2_vpc_subnet_out.results | get_subnet_route_map(ec2_vpc_route_table_private_out.results) }}"
      - name: merge the eip allocated list with the subnet-id,route-id map list
        set_fact:
          subnet_route_map: "{{ nat_eipalloc_list | get_zip(subnet_route_map) }}"
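
The failed items in the output above carry no 'route_table_id' key, which matches the KeyError raised by the next task. A minimal sketch of one way to guard against that, assuming the map filter reads 'route_table_id' from each registered result: a small Jinja filter that drops failed results before they reach `get_subnet_route_map`. The filter name `successful_route_tables` is hypothetical and is not part of the repo.

```python
def successful_route_table_results(results):
    """Keep only loop results that did not fail and that carry 'route_table_id'."""
    return [
        result
        for result in results
        if not result.get("failed") and "route_table_id" in result
    ]


class FilterModule(object):
    """Ansible filter plugin entry point (hypothetical filter name)."""

    def filters(self):
        return {"successful_route_tables": successful_route_table_results}
```

This only avoids the traceback. If every item failed, the filtered list is empty, so the route table IDs would still need to come from somewhere else.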

I'm new to these VPC modules and to registered variables, so I'm wondering whether I'm missing something here. I'm very curious how Ansible tells a change from an addition in this context: I also tried altering a tag key, only to have a duplicate resource created, and I often find myself manually destroying the instances, VPC, etc. in the console so that I can run through the playbook again. Perhaps there's a way for an inventory to be built during the VPC bootstrap?

halberom commented 9 years ago

Hi, yes, this is a known issue that I've not had time to work around or resolve.

The route table module forces a destination (e.g. 0.0.0.0/0 via igw), but the NAT monitor script that runs on the instances changes it, so a re-run fails on that step and doesn't return the IDs. There's now a route table facts module (merged in https://github.com/ansible/ansible-modules-extras/pull/778), which should allow handling that a bit more cleanly, although I'm not sure it returns the subnet information, so a bit of associative magic would be needed.

I asked for the module to be modified to not force a destination, but I'm not sure that's been applied.

djcross commented 9 years ago

Thanks for the reply. I've just tried the 'ec2_vpc_route_table_facts' module, but I keep running into issues when trying to iterate through the filters.

I guess what's essentially needed is the lookup with ignore_errors, followed by the private route table creation only if ec2_vpc_route_table_private_out was not already set by the lookup. Now, to get that lookup working.
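
The lookup-then-create idea can be sketched in plain Python, independent of any Ansible module. This is an illustration only: the shape of each route table entry (a dict with 'id' and 'tags' keys) is an assumption and may not match what the facts module actually returns, and `create` is a hypothetical callable standing in for the creation task.

```python
def find_route_table_id(route_tables, name):
    """Return the id of the first route table whose Name tag equals `name`, else None."""
    for table in route_tables:
        if table.get("tags", {}).get("Name") == name:
            return table.get("id")
    return None


def lookup_or_create(route_tables, name, create):
    """Return (route_table_id, created), calling `create` only when the lookup finds nothing."""
    existing_id = find_route_table_id(route_tables, name)
    if existing_id is not None:
        return existing_id, False
    return create(name), True
```

Keying the lookup on the Name tag means a renamed tag looks like a new resource, which would be consistent with the duplicate-resource behaviour described earlier in the thread.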

Are you running this in any prod environments? Also, are there any other gotchas or known issues? It certainly is a lot nicer to work with than CloudFormation templates, but it seems the Ansible AWS modules have a way to go before this could support a production environment.

halberom commented 9 years ago

I have used it to set up several production environments, but as it's a bootstrap, I don't re-run it. Subsequent actions are performed elsewhere.

There are some additional things that could be added to clean it up, such as using block/rescue, and a bit more verification of variables.

But the main issue is the route table forcing a gateway on the destination. Taking another quick look, it seems the route table module doesn't actually force routes to be passed (the doc section says they're required, but there are conditionals and the param isn't marked required). In that case, the solution would be to not set any private routes and to extend the NAT monitor script to check for and create the route itself, which should be pretty easy (and is on my todo list anyway).
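
As a rough sketch of that check-and-create step, the logic could look like the following. It is not the repo's NAT monitor script: it is written in Python for illustration and assumes `ec2` is a boto3-style EC2 client passed in by the caller, with `route_table_id` and `instance_id` supplied by whatever the monitor already knows.

```python
def ensure_default_route(ec2, route_table_id, instance_id, cidr="0.0.0.0/0"):
    """Make `cidr` in the route table point at `instance_id`.

    Returns "created", "replaced", or "unchanged".
    """
    response = ec2.describe_route_tables(RouteTableIds=[route_table_id])
    routes = response["RouteTables"][0].get("Routes", [])
    existing = next(
        (route for route in routes if route.get("DestinationCidrBlock") == cidr),
        None,
    )

    if existing is None:
        # No default route yet, e.g. the playbook created the table without routes.
        ec2.create_route(
            RouteTableId=route_table_id,
            DestinationCidrBlock=cidr,
            InstanceId=instance_id,
        )
        return "created"

    if existing.get("InstanceId") == instance_id:
        return "unchanged"

    # The route exists but targets something else, so take it over.
    ec2.replace_route(
        RouteTableId=route_table_id,
        DestinationCidrBlock=cidr,
        InstanceId=instance_id,
    )
    return "replaced"
```

With the route owned by the monitor, the playbook would no longer need to declare a 0.0.0.0/0 route on the private tables, which is the step that fails with RouteAlreadyExists on a re-run.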

ste-bah commented 8 years ago

Hi, I was just wondering whether you have managed to make any updates on this?

halberom commented 8 years ago

Hello @ste-bah, I'm no longer with HighOps and have been busy with other things. I would suggest investigating Terraform for more complex AWS infrastructure setups, as managing them through Ansible requires a lot more investment, especially if you want the ability to tear down your environments.