clearlinux / micro-config-drive

An alternative and small cloud-init implementation in C

ucd-aws fails on hosts with no public key #55

Open gmarkey opened 2 years ago

gmarkey commented 2 years ago

On EC2 instances without a public key, ucd-data-fetch aws outputs parse_headers(): Success and exits with RC=1. Looking at the output of strace, it appears that it considers the missing key to be fatal and doesn't query other metadata or userdata endpoints.

The way the SSH user is added is also haphazard: it tries to concatenate this baked-in section of configuration with whatever it finds in userdata, making it impossible to use standard bash scripts rather than cloud-init format.


4224  execve("/usr/bin/ucd-data-fetch", ["ucd-data-fetch", "aws"], 0x7ffc7caae0e0 /* 42 vars */) = 0
4224  brk(NULL)                         = 0x561ff128b000
4224  arch_prctl(0x3001 /* ARCH_??? */, 0x7ffef7588130) = -1 EINVAL (Invalid argument)
4224  access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
4224  openat(AT_FDCWD, "/var/cache/ldconfig/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
4224  newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=92290, ...}, AT_EMPTY_PATH) = 0
4224  mmap(NULL, 92290, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f16c6c4e000
4224  close(3)                          = 0
4224  openat(AT_FDCWD, "/usr/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
4224  read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\337\2\0\0\0\0\0@\0\0\0\0\0\0\0\360\320!\0\0\0\0\0\0\0\0\0@\08\0\16\0@\0@\0?\0\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0\20\3\0\0\0\0\0\0\20\3\0\0\0\0\0\0\10\0\0\0\0\0\0\0\3\0\0\0\4\0\0\0\200\236\36\0\0\0\0\0\200\236\36\0\0\0\0\0\200\236\36\0\0\0\0\0 \0\0\0\0\0\0\0 \0\0\0\0\0\0\0 \0\0\0\0\0\0\0\1\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0000\277\2\0\0\0\0\0000\277\2\0\0\0\0\0\0\20\0\0\0\0\0\0\1\0\0\0\5\0\0\0\0\300\2\0\0\0\0\0\0\300\2\0\0\0\0\0\0\300\2\0\0\0\0\0\344m\31\0\0\0\0\0\344m\31\0\0\0\0\0\0\20\0\0\0\0\0\0\1\0\0\0\4\0\0\0\0000\34\0\0\0\0\0\0000\34\0\0\0\0\0\0000\34\0\0\0\0\00028\5\0\0\0\0\00028\5\0\0\0\0\0\0\20\0\0\0\0\0\0\1\0\0\0\6\0\0\0\200u!\0\0\0\0\0\200\205!\0\0\0\0\0\200\205!\0\0\0\0\0\220O\0\0\0\0\0\0000%\1\0\0\0\0\0\0\20\0\0\0\0\0\0\2\0\0\0\6\0\0\0\340\231!\0\0\0\0\0\340\251!\0\0\0\0\0\340\251!\0\0\0\0\0\340\1\0\0\0\0\0\0\340\1\0\0\0\0\0\0\10\0\0\0\0\0\0\0\4\0\0\0\4\0\0\0P\3\0\0\0\0\0\0P\3\0\0\0\0\0\0P\3\0\0\0\0\0\0P\0\0\0\0\0\0\0P\0\0\0\0\0\0\0\10\0\0\0\0\0\0\0\4\0\0\0\4\0\0\0\240\3\0\0\0\0\0\0\240\3\0\0\0\0\0\0\240\3\0\0\0\0\0\0D\0\0\0\0\0\0\0D\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\7\0\0\0\4\0\0\0\200u!\0\0\0\0\0\200\205!\0\0\0\0\0\200\205!\0\0\0\0\0\20\0\0\0\0\0\0\0\220\0\0\0\0\0\0\0\10\0\0\0\0\0\0\0S\345td\4\0\0\0P\3\0\0\0\0\0\0P\3\0\0\0\0\0\0P\3\0\0\0\0\0\0P\0\0\0\0\0\0\0P\0\0\0\0\0\0\0\10\0\0\0\0\0\0\0P\345td\4\0\0\0\240\236\36\0\0\0\0\0\240\236\36\0\0\0\0\0\240\236\36\0\0\0\0\0\304p\0\0\0\0\0\0\304p\0\0\0\0\0\0\4\0\0\0\0\0\0\0Q\345td\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\20\0\0\0\0\0\0\0R\345td\4\0\0\0\200u!\0\0\0\0\0\200\205!\0\0\0\0\0\200\205!\0\0\0\0\0\200*\0\0\0\0\0\0", 832) = 832
4224  pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0\20\3\0\0\0\0\0\0\20\3\0\0\0\0\0\0\10\0\0\0\0\0\0\0\3\0\0\0\4\0\0\0\200\236\36\0\0\0\0\0\200\236\36\0\0\0\0\0\200\236\36\0\0\0\0\0 \0\0\0\0\0\0\0 \0\0\0\0\0\0\0 \0\0\0\0\0\0\0\1\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0000\277\2\0\0\0\0\0000\277\2\0\0\0\0\0\0\20\0\0\0\0\0\0\1\0\0\0\5\0\0\0\0\300\2\0\0\0\0\0\0\300\2\0\0\0\0\0\0\300\2\0\0\0\0\0\344m\31\0\0\0\0\0\344m\31\0\0\0\0\0\0\20\0\0\0\0\0\0\1\0\0\0\4\0\0\0\0000\34\0\0\0\0\0\0000\34\0\0\0\0\0\0000\34\0\0\0\0\00028\5\0\0\0\0\00028\5\0\0\0\0\0\0\20\0\0\0\0\0\0\1\0\0\0\6\0\0\0\200u!\0\0\0\0\0\200\205!\0\0\0\0\0\200\205!\0\0\0\0\0\220O\0\0\0\0\0\0000%\1\0\0\0\0\0\0\20\0\0\0\0\0\0\2\0\0\0\6\0\0\0\340\231!\0\0\0\0\0\340\251!\0\0\0\0\0\340\251!\0\0\0\0\0\340\1\0\0\0\0\0\0\340\1\0\0\0\0\0\0\10\0\0\0\0\0\0\0\4\0\0\0\4\0\0\0P\3\0\0\0\0\0\0P\3\0\0\0\0\0\0P\3\0\0\0\0\0\0P\0\0\0\0\0\0\0P\0\0\0\0\0\0\0\10\0\0\0\0\0\0\0\4\0\0\0\4\0\0\0\240\3\0\0\0\0\0\0\240\3\0\0\0\0\0\0\240\3\0\0\0\0\0\0D\0\0\0\0\0\0\0D\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\7\0\0\0\4\0\0\0\200u!\0\0\0\0\0\200\205!\0\0\0\0\0\200\205!\0\0\0\0\0\20\0\0\0\0\0\0\0\220\0\0\0\0\0\0\0\10\0\0\0\0\0\0\0S\345td\4\0\0\0P\3\0\0\0\0\0\0P\3\0\0\0\0\0\0P\3\0\0\0\0\0\0P\0\0\0\0\0\0\0P\0\0\0\0\0\0\0\10\0\0\0\0\0\0\0P\345td\4\0\0\0\240\236\36\0\0\0\0\0\240\236\36\0\0\0\0\0\240\236\36\0\0\0\0\0\304p\0\0\0\0\0\0\304p\0\0\0\0\0\0\4\0\0\0\0\0\0\0Q\345td\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\20\0\0\0\0\0\0\0R\345td\4\0\0\0\200u!\0\0\0\0\0\200\205!\0\0\0\0\0\200\205!\0\0\0\0\0\200*\0\0\0\0\0\0\200*\0\0\0\0\0\0\1\0\0\0\0\0\0\0", 784, 64) = 784
4224  pread64(3, "\4\0\0\0@\0\0\0\5\0\0\0GNU\0\2\0\0\300\4\0\0\0\3\0\0\0\0\0\0\0\2\200\0\300\4\0\0\0\1\0\0\0\0\0\0\0\1\0\1\300\4\0\0\0;\10\0\0\0\0\0\0\2\0\1\300\4\0\0\0\17\0\0\0\0\0\0\0", 80, 848) = 80
4224  pread64(3, "\4\0\0\0\24\0\0\0\3\0\0\0GNU\0\334p~m}\237\234Y\336\372p\340\355\21\234,\356\36\305\324\4\0\0\0\20\0\0\0\1\0\0\0GNU\0\0\0\0\0\3\0\0\0\n\0\0\0\0\0\0\0", 68, 928) = 68
4224  newfstatat(3, "", {st_mode=S_IFREG|0755, st_size=2220272, ...}, AT_EMPTY_PATH) = 0
4224  mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f16c6c4c000
4224  pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0\20\3\0\0\0\0\0\0\20\3\0\0\0\0\0\0\10\0\0\0\0\0\0\0\3\0\0\0\4\0\0\0\200\236\36\0\0\0\0\0\200\236\36\0\0\0\0\0\200\236\36\0\0\0\0\0 \0\0\0\0\0\0\0 \0\0\0\0\0\0\0 \0\0\0\0\0\0\0\1\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0000\277\2\0\0\0\0\0000\277\2\0\0\0\0\0\0\20\0\0\0\0\0\0\1\0\0\0\5\0\0\0\0\300\2\0\0\0\0\0\0\300\2\0\0\0\0\0\0\300\2\0\0\0\0\0\344m\31\0\0\0\0\0\344m\31\0\0\0\0\0\0\20\0\0\0\0\0\0\1\0\0\0\4\0\0\0\0000\34\0\0\0\0\0\0000\34\0\0\0\0\0\0000\34\0\0\0\0\00028\5\0\0\0\0\00028\5\0\0\0\0\0\0\20\0\0\0\0\0\0\1\0\0\0\6\0\0\0\200u!\0\0\0\0\0\200\205!\0\0\0\0\0\200\205!\0\0\0\0\0\220O\0\0\0\0\0\0000%\1\0\0\0\0\0\0\20\0\0\0\0\0\0\2\0\0\0\6\0\0\0\340\231!\0\0\0\0\0\340\251!\0\0\0\0\0\340\251!\0\0\0\0\0\340\1\0\0\0\0\0\0\340\1\0\0\0\0\0\0\10\0\0\0\0\0\0\0\4\0\0\0\4\0\0\0P\3\0\0\0\0\0\0P\3\0\0\0\0\0\0P\3\0\0\0\0\0\0P\0\0\0\0\0\0\0P\0\0\0\0\0\0\0\10\0\0\0\0\0\0\0\4\0\0\0\4\0\0\0\240\3\0\0\0\0\0\0\240\3\0\0\0\0\0\0\240\3\0\0\0\0\0\0D\0\0\0\0\0\0\0D\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\7\0\0\0\4\0\0\0\200u!\0\0\0\0\0\200\205!\0\0\0\0\0\200\205!\0\0\0\0\0\20\0\0\0\0\0\0\0\220\0\0\0\0\0\0\0\10\0\0\0\0\0\0\0S\345td\4\0\0\0P\3\0\0\0\0\0\0P\3\0\0\0\0\0\0P\3\0\0\0\0\0\0P\0\0\0\0\0\0\0P\0\0\0\0\0\0\0\10\0\0\0\0\0\0\0P\345td\4\0\0\0\240\236\36\0\0\0\0\0\240\236\36\0\0\0\0\0\240\236\36\0\0\0\0\0\304p\0\0\0\0\0\0\304p\0\0\0\0\0\0\4\0\0\0\0\0\0\0Q\345td\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\20\0\0\0\0\0\0\0R\345td\4\0\0\0\200u!\0\0\0\0\0\200\205!\0\0\0\0\0\200\205!\0\0\0\0\0\200*\0\0\0\0\0\0\200*\0\0\0\0\0\0\1\0\0\0\0\0\0\0", 784, 64) = 784
4224  mmap(NULL, 2271920, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f16c6a21000
4224  mprotect(0x7f16c6a4d000, 2015232, PROT_NONE) = 0
4224  mmap(0x7f16c6a4d000, 1667072, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2c000) = 0x7f16c6a4d000
4224  mmap(0x7f16c6be4000, 344064, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c3000) = 0x7f16c6be4000
4224  mmap(0x7f16c6c39000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x217000) = 0x7f16c6c39000
4224  mmap(0x7f16c6c3f000, 51888, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f16c6c3f000
4224  close(3)                          = 0
4224  mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f16c6a1e000
4224  arch_prctl(ARCH_SET_FS, 0x7f16c6a1e740) = 0
4224  set_tid_address(0x7f16c6a1ea10)   = 4224
4224  set_robust_list(0x7f16c6a1ea20, 24) = 0
4224  rseq(0x7f16c6a1f0e0, 0x20, 0, 0x53053053) = 0
4224  mprotect(0x7f16c6c39000, 12288, PROT_READ) = 0
4224  mprotect(0x561fef45b000, 4096, PROT_READ) = 0
4224  mprotect(0x7f16c6c9d000, 8192, PROT_READ) = 0
4224  prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
4224  munmap(0x7f16c6c4e000, 92290)     = 0
4224  socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
4224  connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("169.254.169.254")}, 16) = 0
4224  getrandom("\xbe\x31\x60\x14\x75\x29\x41\x9a", 8, GRND_NONBLOCK) = 8
4224  brk(NULL)                         = 0x561ff128b000
4224  brk(0x561ff12ac000)               = 0x561ff12ac000
4224  write(3, "GET /latest/meta-data/public-keys/0/openssh-key HTTP/1.1\r\nhost: 169.254.169.254\r\nConnection: keep-alive\r\n\r\n", 107) = 107
4224  fcntl(3, F_GETFL)                 = 0x2 (flags O_RDWR)
4224  newfstatat(3, "", {st_mode=S_IFSOCK|0777, st_size=0, ...}, AT_EMPTY_PATH) = 0
4224  read(3, "HTTP/1.1 404 Not Found\r\nContent-Type: text/html\r\nContent-Length: 339\r\nDate: Thu, 17 Mar 2022 22:41:47 GMT\r\nServer: EC2ws\r\nConnection: close\r\n\r\n<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\n\t\t \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">\n <head>\n  <title>404 - Not Found</title>\n </head>\n <body>\n  <h1>404 - Not Found</h1>\n </body>\n</html>\n", 4096) = 482
4224  close(3)                          = 0
4224  dup(2)                            = 3
4224  fcntl(3, F_GETFL)                 = 0x402 (flags O_RDWR|O_APPEND)
4224  newfstatat(3, "", {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0), ...}, AT_EMPTY_PATH) = 0
4224  write(3, "parse_headers(): Success\n", 25) = 25
4224  close(3)                          = 0
4224  lseek(3, -458, SEEK_CUR)          = -1 EBADF (Bad file descriptor)
4224  exit_group(1)                     = ?
4224  +++ exited with 1 +++

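On the second point above (baked-in SSH configuration being concatenated with userdata), cloud-init conventionally tells the two formats apart by the first line: `#!` marks a script and `#cloud-config` marks cloud-config YAML. A minimal sketch of that check follows. The names `userdata_kind` and `classify_userdata` are hypothetical and not taken from the ucd source, and how ucd actually handles userdata is not shown in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical classification; not taken from the ucd source. */
enum userdata_kind {
    USERDATA_EMPTY,        /* nothing was returned */
    USERDATA_CLOUD_CONFIG, /* first line is exactly "#cloud-config" */
    USERDATA_SCRIPT,       /* starts with "#!" (e.g. a bash script) */
    USERDATA_UNKNOWN       /* anything else */
};

static enum userdata_kind classify_userdata(const char *data, size_t len)
{
    static const char header[] = "#cloud-config";
    const size_t header_len = sizeof(header) - 1;

    if (data == NULL || len == 0)
        return USERDATA_EMPTY;

    /* The header must be the whole first line, not just a prefix of it. */
    if (len >= header_len && memcmp(data, header, header_len) == 0 &&
        (len == header_len || data[header_len] == '\n' ||
         data[header_len] == '\r'))
        return USERDATA_CLOUD_CONFIG;

    if (len >= 2 && data[0] == '#' && data[1] == '!')
        return USERDATA_SCRIPT;

    return USERDATA_UNKNOWN;
}
```

With a check like this, a built-in SSH user section would only be merged when the result is `USERDATA_CLOUD_CONFIG`, and a script would be passed through untouched.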
armenr commented 1 year ago

+1 ^^ This is happening to me as well.

The latest "best practice" in AWS is to avoid SSH keys altogether, and instead attach an IAM role with SSM management permissions to your EC2 instance, and then access it through SSM Agent.

At minimum, this "requirement" of a public key should be documented, if not amended/fixed/patched.

For what it's worth, I lost 4 solid days of work and troubleshooting time, attempting to understand what the issue was.

bwarden commented 1 year ago

This will take some more thoughtful refactoring. It appears the meta-data API is returning HTTP 404 when SSH keys aren't configured. As this is technically an error code, and most likely the same one you'd get if we had a typo in the URI, we need to think more carefully about which errors should be fatal, instead of making a quick fix.
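One way to frame the question of which errors should be fatal is to mark each metadata item as required or optional and let a 404 pass only for optional ones. The sketch below is an illustration under that assumption, not the ucd implementation; `fetch_result`, `parse_status`, and `classify_status` are hypothetical names.

```c
#include <stdio.h>

/* Hypothetical outcome of a single metadata request. */
enum fetch_result {
    FETCH_OK,     /* 2xx: the body is usable */
    FETCH_ABSENT, /* 404 on an optional item: skip it and continue */
    FETCH_FATAL   /* anything else, including 404 on a required item */
};

/* Parse the status code from a line such as "HTTP/1.1 404 Not Found".
 * Returns -1 if the input is not a well-formed HTTP/1.x status line. */
static int parse_status(const char *response)
{
    int minor, code;

    if (sscanf(response, "HTTP/1.%1d %3d", &minor, &code) != 2)
        return -1;
    if (code < 100 || code > 599)
        return -1;
    return code;
}

/* Decide whether a status code should abort the whole run. */
static enum fetch_result classify_status(int code, int optional)
{
    if (code >= 200 && code < 300)
        return FETCH_OK;
    if (code == 404 && optional)
        return FETCH_ABSENT;
    return FETCH_FATAL;
}
```

This does not remove the concern about typos: a misspelled path for an optional item would be skipped silently, so the list of optional paths would need to stay small and well tested.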

armenr commented 1 year ago

Great point! Have you had a chance to consider?

ahkok commented 1 year ago

From the recent issues posted and discussed, it's clear we may want to consider rewriting the fetcher tool (a significant undertaking).

That's not something we are going to do without having any idea whether it helps just a few people who may be off using alternative methods (e.g. creating custom images with other tools instead), or whether it's going to benefit, say, hundreds of people. We lack the data to make this discussion easier right now.

We also need to look at SSM's APIs.

armenr commented 1 year ago

The consideration is much appreciated.

I've stopped relying on ucd altogether and am instead using an IMDSv2-based implementation written in Go, with some custom scripting & systemd units. ucd is definitely faster, but being able to do the same thing in about the same time is - for most cases - good enough.
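For readers unfamiliar with IMDSv2: it is session-based. A client first sends a `PUT` to `/latest/api/token` with an `X-aws-ec2-metadata-token-ttl-seconds` header, then passes the returned token in an `X-aws-ec2-metadata-token` header on each `GET`. The sketch below only builds those two requests, in C to match this project. It is not the Go implementation mentioned above, and the function names are hypothetical.

```c
#include <stdio.h>

#define IMDS_HOST "169.254.169.254"

/* Build the IMDSv2 session-token request.
 * Returns the number of bytes written, or -1 if buf is too small. */
static int imds_token_request(char *buf, size_t size, unsigned ttl_seconds)
{
    int n = snprintf(buf, size,
                     "PUT /latest/api/token HTTP/1.1\r\n"
                     "Host: " IMDS_HOST "\r\n"
                     "X-aws-ec2-metadata-token-ttl-seconds: %u\r\n"
                     "\r\n",
                     ttl_seconds);

    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Build a metadata or userdata GET that carries the session token.
 * Returns the number of bytes written, or -1 if buf is too small. */
static int imds_get_request(char *buf, size_t size,
                            const char *path, const char *token)
{
    int n = snprintf(buf, size,
                     "GET %s HTTP/1.1\r\n"
                     "Host: " IMDS_HOST "\r\n"
                     "X-aws-ec2-metadata-token: %s\r\n"
                     "\r\n",
                     path, token);

    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

The token TTL can be anywhere from 1 to 21600 seconds. The body of the `PUT` response is the token string to pass to `imds_get_request`.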

I also installed and set up cloud-init as part of a custom-baked ClearLinux AMI, just to see how it would perform. It wasn't too bad, either. Definitely slower, but at least it presents the "standard" userdata interface and behavior that everyone knows and expects from nearly every cloud OS on every major cloud.

I worry that you might never get data on how many people this would benefit (or how many people it is a "pain" for). That's because most operators who try out a "new" distro/AMI in AWS - and run into problems like these - will simply revert to something like AL2, Ubuntu, or a RHEL variant, without ever reporting it.

There are many, many aspects of ClearLinux which make it an excellent OS, but things like this are a deal-breaker/non-starter for a broad base of the decision-makers and builders that would be adopters and users of ClearLinux specifically in the Cloud segment, as a host OS (not as a container base OS).

Just some suggestions

I'd recommend at least documenting this caveat somewhere in the ClearLinux docs, so that others - like me and @gmarkey - don't end up losing time and sleep when trying to figure out the issue.

The caveat essentially being that you must configure your EC2 with an explicit SSH key, otherwise UserData breaks in AWS.

I'd also suggest that, while you wait for data to make your decision and contemplate whether the effort is worth it, some more explicit logging be added to ucd, since parse_headers(): Success is completely cryptic and tells the user nothing about what is actually happening (or not working), or why.
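The message `parse_headers(): Success` is consistent with `perror()` being called while `errno` is 0, which is what happens when the failure is at the HTTP level rather than in a system call. A more descriptive message could name the request path and the status code. The helper below is a hypothetical sketch, not code from ucd, and the message wording is only a suggestion.

```c
#include <stdio.h>

/* Format a diagnostic for a failed metadata request into buf.
 * Returns the number of bytes written, or -1 if buf is too small. */
static int format_http_error(char *buf, size_t size,
                             const char *path, int status)
{
    /* A 404 gets an extra hint, since it usually means the item is unset. */
    const char *hint = (status == 404)
        ? " (not found; is this item configured for the instance?)"
        : "";
    int n = snprintf(buf, size,
                     "ucd-data-fetch: GET %s returned HTTP %d%s",
                     path, status, hint);

    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Writing the resulting string to stderr with `fprintf(stderr, "%s\n", buf)` instead of calling `perror()` would avoid the misleading "Success" suffix.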

qasim-nylas commented 1 year ago

Hi,

Just wanted to let you guys know, I spent 3 to 4 days debugging this issue and gave up. I tested all my config just now and found out this is the issue (back to Clear Linux). Can you please mention in the docs that it will fail if there is no key name?

Thanks