coreos / bugs

Issue tracker for CoreOS Container Linux
https://coreos.com/os/eol/

oem-ami: problem hitting metadata service in openstack #524

Closed crawford closed 9 years ago

crawford commented 9 years ago

Issue by cbuben Wednesday Aug 28, 2013 at 21:45 GMT Originally opened as https://github.com/coreos/coreos-overlay/issues/184


When using the official production 54.0.0 OpenStack image, I'm unable to log into the instance with pubkey auth, because the SSH key retrieval logic in oem-ami can't reach the metadata service.

Environment: OpenStack Grizzly, Quantum networking with Open vSwitch, provider network (i.e. bridged to a real VLAN on the network), "all-in-one" install with the compute, control, and network bits on the same box.

Upon further examination: our setup might be broken, but instances can't reach the metadata service by routing to the default gateway. Our instances must have a zeroconf route to the link-local network (169.254.0.0/16).

When I cracked open the image and added a zeroconf route to the oem-ami run script, things worked properly.

I'm not sure there is any action to be taken; this serves more as a problem report.

crawford commented 9 years ago

Comment by polvi Thursday Aug 29, 2013 at 02:05 GMT


Chris, what was the line you added to your script? Thank you.


crawford commented 9 years ago

Comment by cbuben Thursday Aug 29, 2013 at 12:23 GMT


@polvi Alex, it's in cbuben/coreos-overlay@4684ee4. However, I'm not certain this is generally applicable, check out the caveats in the commit. Thanks!

crawford commented 9 years ago

Comment by philips Sunday Apr 06, 2014 at 23:14 GMT


@cbuben Our images have changed significantly since this bug was filed. Have you figured out a solution that fixes this? How do other VMs in your environment deal with it?

crawford commented 9 years ago

Comment by cbuben Friday May 09, 2014 at 20:08 GMT


@philips Sorry for the long delay.

I haven't looked at this specific issue since it was opened, and I haven't attempted to follow the relevant changes in CoreOS either, but I do have more information and background on interacting with the metadata service in OpenStack under various network configurations.

Here is the basic issue: OpenStack has different network types. Most relevant here are internal (tenant) networks, where the gateway is implemented by the cloud infrastructure, and external/provider networks, where the gateway is typically a real hardware device.

The scenario in which I reported the issue was an instance on a provider network.

When I opened this issue, CoreOS 54.0.0 didn't have a link-local zeroconf route on eth0 (169.254.0.0/16), meaning that traffic destined for the metadata service would hit the default route and get sent to the default gateway. This scenario works fine on an OpenStack internal network, as the gateway is logically part of the cloud infrastructure and can direct the traffic to the backend metadata service implementation. It also works in EC2, for much the same reason (last time I checked, at least). However, it does not work on OpenStack external/provider networks, where the default gateway is usually a hardware device.

The core issue is that, without a link-local zeroconf route on eth0, we are assuming the default gateway will somehow know how to direct traffic destined for 169.254.169.254 to the backend metadata service implementation. That assumption holds for OpenStack internal networks and EC2. But if the gateway isn't part of the cloud infrastructure itself, as with OpenStack provider networks, the assumption is false.

Now let's assume a link-local zeroconf route is present on eth0. In this case, the host will ARP for 169.254.169.254 out eth0. This works in all three scenarios I'm aware of: OpenStack provider networks, OpenStack internal networks, and EC2.
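
For illustration, here's roughly how the route lookup differs in the two cases. This is just a sketch; the interface name eth0 and the gateway address 10.0.0.1 are placeholders, and the ip route get output is trimmed:

# without the zeroconf route, metadata traffic matches the default route
$ ip route get 169.254.169.254
169.254.169.254 via 10.0.0.1 dev eth0 ...

# with the zeroconf route in place, it goes straight out the local segment
$ ip route add 169.254.0.0/16 dev eth0 scope link
$ ip route get 169.254.169.254
169.254.169.254 dev eth0 ...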

What follows here is just opinion: 169.254.0.0/16 is designated by RFC 3927 as reserved for link-local addresses. So it seems like, in any cloud implementation with an EC2-like metadata service on 169.254.169.254, the cloud infrastructure should be obligated to make that address available in a link-local manner on every Ethernet segment that guests are directly attached to. Relying on the gateway to somehow direct traffic to 169.254.169.254 seems like a hack in general, and counter to the intent of RFC 3927.

As far as the best strategy here goes, it seems like the most general and resilient approach may be to perform an arping for 169.254.169.254 out eth0 after acquiring a DHCP lease and, if it succeeds, add a route for 169.254.0.0/16 on eth0.
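
A minimal sketch of that approach as a shell snippet (the interface name eth0 is an assumption, and this relies on the iputils arping being present):

# probe for the metadata address directly on the segment; only add the
# link-local route if something actually answers the ARP request
if arping -c 1 -w 1 -I eth0 169.254.169.254 >/dev/null 2>&1; then
    ip route add 169.254.0.0/16 dev eth0 scope link
fi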

crawford commented 9 years ago

Comment by cbuben Tuesday May 13, 2014 at 17:33 GMT


I had a chance to catch up with CoreOS a bit. Glad to see the focus on cloud-init-format bootstrapping; it looks like a sound, solid, generic design. Also glad to see support for multiple data sources, most notably config-drive.

I spun up the beta AMI in our openstack cluster.

So first, on metadata access, I reconfirmed the previous findings. On a private network, where OpenStack implements the gateway and can redirect to the metadata service, everything works fine. On a provider network, where an instance needs a zeroconf route to reach the metadata service, the instance can't reach it: no userdata gets retrieved, no SSH pubkeys get added, no access, etc.

Now it gets interesting. We have configdrive enabled in our OpenStack deployment, so both datasources are available. And since configdrive is independent of network access, I was hoping it would provide a convenient way around the metadata service access issue. However, there's another catch...

It seems like configdrive access waits for the EC2 metadata retrieval to complete successfully. If I spin up an instance on a provider network (where the metadata service doesn't work) and pass in userdata to add SSH keys, the keys never get added, and I can never log in.

I'm not an expert in the systemd initialization process, and I haven't done any deep digging, so I can't give an accurate summary of what's really happening on the instance.

If I catch the instance on the VNC console and add coreos.autologin to the kernel command line, I can see that /media/configdrive is mounted properly, and /usr/bin/coreos-cloudinit --from-file=/media/configdrive/openstack/latest/user_data works perfectly. So it seems like the cloudinit configdrive unit is never getting started.
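
For anyone else poking at this, a few generic checks from the autologin console (just a hedged sketch; I'm deliberately not naming specific units, since I'm not sure which ones are involved):

$ mount | grep configdrive                     # is the config drive actually mounted?
$ systemctl list-units --all | grep -i cloud   # which cloudinit-related units exist, and in what state?
$ journalctl -b | grep -i cloudinit            # any cloudinit messages from this boot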

Possibly related to coreos/coreos-cloudinit/issues/86?

crawford commented 9 years ago

Comment by cbuben Tuesday May 13, 2014 at 18:03 GMT


https://gist.github.com/cbuben/572b5124b04196051c87

Above: process table for a host on a provider network immediately after first boot (caught at syslinux, boot_kernel coreos.autologin).

So, looking in more detail, I would guess we're bogged down in earlier init that's also directly dependent on the metadata service.

Since there are various clients (not only coreos-cloudinit) that are directly dependent on the metadata service, it seems like you still might want to do the test + route add.

If you can do something conceptually similar to this early on in the system initialization process, you're good:

arping -c 1 -w 1 169.254.169.254 && ip route add 169.254.0.0/16 dev ens3 scope link

BTW, on the wedged instance above, when I executed the above on the console, the init process immediately ran to completion and all was good.
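
If it helps, here is a rough sketch of how that test + route add could be wired in as a systemd oneshot unit early in boot. The unit name, ordering, and interface name (ens3) are all assumptions on my part, not anything CoreOS ships:

# /etc/systemd/system/metadata-link-local.service (hypothetical name)
[Unit]
Description=Add a link-local route if the metadata service answers ARP
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
# probe first so the route is only added on segments where 169.254.169.254
# is actually reachable link-locally; never fail the boot over it
ExecStart=/bin/sh -c 'arping -c 1 -w 1 -I ens3 169.254.169.254 && ip route add 169.254.0.0/16 dev ens3 scope link || true'

[Install]
WantedBy=multi-user.target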

crawford commented 9 years ago

Comment by philips Friday May 16, 2014 at 01:44 GMT


@cbuben It seems there are a couple of issues here.

1) On the configdrive issue:

Can you capture the output of the journal for that boot?

journalctl --boot

and also the status of systemd?

systemctl status

2) What are the Linux systems on your network using to configure the zeroconf addresses? Avahi?

Thanks!

crawford commented 9 years ago

Comment by cbuben Tuesday May 27, 2014 at 15:21 GMT


@philips

1) I'll grab the log info today if I get a chance.

2) No Avahi. In our custom CentOS 6.x OpenStack AMIs, we ensure the zeroconf route is present via a dhclient interface-up hook script.

$ sudo cat /etc/dhcp/dhclient-eth0-up-hooks
#!/bin/bash
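# $interface is supplied by dhclient; the 1000 + ifindex metric just gives
# each interface's zeroconf route its own distinct, low-priority value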
ip route add 169.254.0.0/16 dev $interface metric $((1000 + $(cat /sys/class/net/$interface/ifindex))) scope link

crawford commented 9 years ago

Comment by cbuben Wednesday May 28, 2014 at 21:25 GMT


@philips

https://gist.github.com/cbuben/6b7291bfbc19fce5b5fa

Booted an instance on a provider network (i.e. 169.254.169.254 not reachable via the gateway, only via link-local), caught syslinux on the VNC console, added boot_kernel coreos.autologin, set the core password, SSHed in, and captured the output.

crawford commented 9 years ago

Comment by philips Thursday Jun 19, 2014 at 23:53 GMT


/cc @marineam @eyakubovich

crawford commented 9 years ago

Comment by eyakubovich Sunday Jun 29, 2014 at 21:05 GMT


Added support for setting routes from DHCP server to networkd: http://cgit.freedesktop.org/systemd/systemd/commit/?id=e1ea665edac17d75fce01b72dadfa3211b60df2c

This should make it possible to add the classless-static-route option to dnsmasq, as described by @brianredbeard in https://github.com/coreos/coreos-overlay/issues/491#issuecomment-40015697, to make the metadata service reachable via a specific gateway.
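
For reference, a hedged sketch of what that could look like on the dnsmasq side, using DHCP option 121 (classless static routes); the gateway address 10.0.0.1 is just a placeholder:

# push a host route for the metadata address via a gateway that can actually
# reach the metadata service; per RFC 3442, clients may ignore the plain router
# option when 121 is present, so the default route is repeated here as well
dhcp-option=121,169.254.169.254/32,10.0.0.1,0.0.0.0/0,10.0.0.1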

crawford commented 9 years ago

Comment by marineam Friday Jul 25, 2014 at 02:25 GMT


As long as the DHCP server is providing the right route, this should be fixed in the current alpha, 386.1.0.

But... it isn't clear from this bug whether the issue was that the DHCP server wasn't including the route option or that the option wasn't being processed by networkd, since I think this bug predates networkd. If DHCP isn't providing that route, I don't know what we would do to solve this; it seems odd to me for CoreOS to assume that it should always enable IPv4LL on its interfaces.

We do support config drive these days, so that is also an option for OpenStack environments with tricky networks.

Closing but please reopen if this is still an issue.

crawford commented 9 years ago

Comment by skeenan947 Tuesday Dec 02, 2014 at 22:41 GMT


Re-opening this... I work in the same environment as cbuben. My nodes (running build 509) aren't spinning on boot and are pulling keys from the configdrive, but the zeroconf route to 169.254.169.254 isn't being added on the nodes, which causes $private_ipv4 and $public_ipv4 to be substituted with blank values.