cisagov / cool-assessment-terraform

Terraform to deploy an assessment environment to the COOL
Creative Commons Zero v1.0 Universal

Guacamole instance inaccessible with no Kali and no allowed inbound ports #83

Open dav3r opened 3 years ago

dav3r commented 3 years ago

🐛 Bug Report

When an assessment environment is created without a Kali instance and with both operations_subnet_inbound_tcp_ports_allowed and operations_subnet_inbound_udp_ports_allowed empty, the resulting Guacamole instance starts up in a broken state and is not accessible via SSH.
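
For reference, a tfvars fragment matching this description might look like the sketch below. The variable names are the ones used in this repository, but the values are assumptions (in particular, expressing "no ports" as empty lists, and the debiandesktop count); this is not a copy of the failing configuration.

```hcl
# Assumed reproduction settings; not copied from the failing environment.
operations_instance_counts = {
  "debiandesktop" : 1, # assumption: some non-Kali instance type is present
  "kali" : 0           # no Kali instance
}

# Assumption: "no allowed ports" is expressed as empty lists.
operations_subnet_inbound_tcp_ports_allowed = []
operations_subnet_inbound_udp_ports_allowed = []
```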

To Reproduce

Steps to reproduce the behavior:

Expected behavior

The Guacamole instance should start up correctly and be accessible by SSH (via SSM).

Any helpful log output

Here is the section of the log that shows that the Docker bridge fails to come up, which prevents the Guacamole composition from starting successfully:

[   73.135967] cloud-init[579]: Cloud-init v. 20.2 running 'modules:config' at Tue, 03 Nov 2020 21:34:11 +0000. Up 71.06 seconds.
[   79.777026] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[   79.809832] Bridge firewalling registered
[   80.781409] Initializing XFRM netlink socket
[   80.872609] IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
[   83.996722] IPv6: ADDRCONF(NETDEV_UP): br-4516774645af: link is not ready
[  100.326197] IPv6: ADDRCONF(NETDEV_UP): br-f26879dd3117: link is not ready
[  116.666879] IPv6: ADDRCONF(NETDEV_UP): br-fee1ac643615: link is not ready
[  132.850902] IPv6: ADDRCONF(NETDEV_UP): br-1375b6d8322d: link is not ready
[  149.434985] IPv6: ADDRCONF(NETDEV_UP): br-c351ac01eb7e: link is not ready
[  165.578743] IPv6: ADDRCONF(NETDEV_UP): br-724ec8dcc51f: link is not ready
[  182.214864] IPv6: ADDRCONF(NETDEV_UP): br-16a24fb24d6a: link is not ready
[  198.574740] IPv6: ADDRCONF(NETDEV_UP): br-a70d667a1eb2: link is not ready
[  214.986790] IPv6: ADDRCONF(NETDEV_UP): br-e21ddb74d51b: link is not ready
[  231.342529] IPv6: ADDRCONF(NETDEV_UP): br-f1cc3b89432f: link is not ready
[  247.735274] IPv6: ADDRCONF(NETDEV_UP): br-a95cdf69980f: link is not ready

It is unclear why the absence of a Kali instance or of allowed inbound ports would cause this. Our current suspicion is that it has something to do with the cloud-init scripts that generate the Guacamole connections:

We think that those scripts may be failing in such a way that cloud-init gets hung up and cannot continue, but more research is needed.

It's also worth noting that this message in the log (bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.) sounds bad, but we have confirmed that it shows up on instances where Guacamole starts up successfully, so we don't believe it is related to this issue.

jsf9k commented 3 years ago

> It's also worth noting that this message in the log (bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.) sounds bad, but we have confirmed that it shows up on instances where Guacamole starts up successfully, so we don't believe it is related to this issue.

I think this is just a helpful message that warns you that you need to load the br_netfilter kernel module in order to do any packet filtering on a bridge interface. See, for example, here.

dav3r commented 3 years ago

Observation: When a Kali instance exists in the tfvars file and is removed (e.g. "kali" : 1 --> "kali" : 0), running terraform apply will remove the Kali instance and related infrastructure, but it will not touch the Guacamole instance. However, another terraform apply (after the previous one completes, and with no other changes to the tfvars file) will cause the Guacamole instance (and related infrastructure) to be replaced.

During testing of this case, a functioning Guacamole instance was replaced by another functioning instance, though in this test case the inbound TCP/UDP allowed ports lists were not empty.
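
As a sketch, the tfvars change described in this observation amounts to the edit below. Other entries in the map, and the non-empty port lists used in that test, are not recorded here and are omitted rather than guessed.

```hcl
# Before (a Kali instance exists):
# operations_instance_counts = { "kali" : 1 }

# After: the first apply removes the Kali instance; a second apply, with no
# further changes to this file, then replaces the Guacamole instance.
operations_instance_counts = {
  "kali" : 0
}
```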

jsf9k commented 3 years ago

> When a Kali instance exists in the tfvars file and is removed (e.g. "kali" : 1 --> "kali" : 0), running terraform apply will remove the Kali instance and related infrastructure, but it will not touch the Guacamole instance. However, another terraform apply (after the previous one completes, and with no other changes to the tfvars file) will cause the Guacamole instance (and related infrastructure) to be replaced.

I have seen this sort of hysteresis in Terraform before, most recently in cisagov/openvpn-server-tf-module#45.

dav3r commented 3 years ago

Further testing has revealed that port 443 must be included in operations_subnet_inbound_tcp_ports_allowed; otherwise, the Guacamole instance fails to start up correctly.

This was verified by deleting all resources in an environment (env1-staging) and running a terraform apply with the following Terraform variables:

operations_instance_counts = {
  "debiandesktop" : 1,
  "kali" : 0
}
operations_subnet_inbound_tcp_ports_allowed = ["443"]
operations_subnet_inbound_udp_ports_allowed = []

After the apply finished, I confirmed that the Guacamole instance had started up correctly and was accessible via SSH.

Next task: Figure out why exactly port 443 matters here. This is very strange because the Guacamole instance does not live in the Operations subnet, so I wouldn't expect the inbound Operations ports to make any difference to Guacamole.