clearlinux / micro-config-drive

An alternative and small cloud-init implementation in C

ucd-aws ignores SSH key that has no \r (CR) symbol in the end #39

Closed kzamkov closed 4 years ago

kzamkov commented 5 years ago

If the public SSH key does not end with a carriage return character (which is the case in some environments), it is ignored and the user cannot complete the initial SSH login.

https://github.com/clearlinux/micro-config-drive/blob/d2451d23ecb04ed93a75ee68c109782d82124fae/src/ucd-aws.c#L1

ahkok commented 5 years ago

Sorry for not spotting this bug report earlier!

Yes, this would totally be a bug and we can likely fix it easily. Having a missing \n here is fine, we should just add it in the code. I'll work on a fix.
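
The fix described here (append the missing newline instead of rejecting the key) can be sketched as below. This is a minimal illustration, not the code from the referenced commit: `normalize_key_line` is a hypothetical name, and stripping a trailing `\r` as well as `\n` is an assumption made so that all inputs end up in the same form.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from ucd-aws.c): return a newly
 * allocated copy of `key` that ends in exactly one '\n', whether or
 * not the input carried a trailing "\n" or "\r\n".
 * The caller frees the result. Returns NULL on allocation failure or
 * if the key is empty after trimming.
 */
char *normalize_key_line(const char *key)
{
    size_t len = strlen(key);

    /* Drop any trailing CR/LF so every input is treated the same way. */
    while (len > 0 && (key[len - 1] == '\n' || key[len - 1] == '\r'))
        len--;
    if (len == 0)
        return NULL;

    char *out = malloc(len + 2); /* key + '\n' + '\0' */
    if (!out)
        return NULL;
    memcpy(out, key, len);
    out[len] = '\n';
    out[len + 1] = '\0';
    return out;
}
```

With a helper like this, a key fetched without a line terminator and one fetched with `\r\n` both produce the same line for the authorized_keys entry.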

ahkok commented 5 years ago

I think we corrected this issue in 2c7059fa5d55ee071e8870146f3cbfebaeaede8b and earlier commits, but, we haven't released this yet because it's not yet tested.

If you are capable of testing this, let me know or give it a try. Otherwise, we'll wait until testing is done before releasing.

kzamkov commented 5 years ago

Thanks for the update - need to check if my customer is capable of testing. I will get back to you next week.

kzamkov commented 5 years ago

When is the release planned? I doubt it would be possible for me to test.

ahkok commented 5 years ago

I'm rolling out some testing as we speak. Hopefully we'll have this validated this week.

ahkok commented 5 years ago

Please test v43

bfqrst commented 4 years ago

Assuming the bug is squashed, which I'm not sure has been confirmed yet, is there any chance this will make it into the next AWS marketplace release? I really need this in order to be able to bake those Packer images. That said, I'm happy to help if there's something I can test.

ahkok commented 4 years ago

v43 has this fix, so, this went live a while ago already as far as I know.

bfqrst commented 4 years ago

I just grabbed the vanilla 31640 image, repackaged it and deployed it via the AWS console. I still get the message that the server refused my key. Here's what I ended up doing: during provisioning I created another user and added him to the wheelnopw group in order to be able to inspect the system. I managed to log in via the secondary user and password. It turns out the clear user has the .ssh dir and the authorized_keys file. But the file only contains the temporary SSH key that HashiCorp Packer used at compile time! There's also a switch in Packer to purge the temporary key just before finalizing the AMI. In that case authorized_keys remains empty!

Bottom line: either way, the key one selects as the last step when launching a custom AMI via the AWS console doesn't get populated as the new instance comes up. The question now is: is this related to this bug, or are we talking about something entirely different?

Let me know if I can help!

Cheers Ralph

ahkok commented 4 years ago

@bfqrst that's good info. It does suggest that somewhere in the process a key properly gets provisioned onto the system. Can you include a copy of the files in /var/lib/cloud that you get after booting the instance? This should give us some evidence as to what the fetcher thought of it.

Another thing to try is to manually do an HTTP request, from the new instance, for http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key and http://169.254.169.254/latest/user-data to verify the data being sent to the instance is correct.
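
For reference, the manual check suggested above can also be done without curl. The following is a rough sketch using plain POSIX sockets; it is not how ucd-aws performs its fetch. The function names `build_request` and `fetch_metadata` are hypothetical, and the code assumes the metadata service answers plain, token-less HTTP GET requests, which is what the curl commands in this thread rely on.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#define METADATA_ADDR "169.254.169.254"

/* Format a minimal HTTP/1.0 GET request for `path` into `buf`.
 * Returns the request length, or -1 if it does not fit. */
int build_request(char *buf, size_t size, const char *path)
{
    int n = snprintf(buf, size,
                     "GET %s HTTP/1.0\r\nHost: " METADATA_ADDR "\r\n\r\n",
                     path);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Fetch `path` from the metadata address and copy the raw response
 * (status line, headers and body) into `out` as a C string.
 * Returns the number of bytes read, or -1 on error. */
ssize_t fetch_metadata(const char *path, char *out, size_t size)
{
    char req[512];
    struct sockaddr_in addr;
    ssize_t total = 0, r;
    int len, fd;

    if (size == 0)
        return -1;
    len = build_request(req, sizeof(req), path);
    if (len < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(80);
    if (inet_pton(AF_INET, METADATA_ADDR, &addr.sin_addr) != 1)
        return -1;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        write(fd, req, (size_t)len) != len) {
        close(fd);
        return -1;
    }
    /* HTTP/1.0: the server closes the connection when it is done. */
    while ((size_t)total < size - 1 &&
           (r = read(fd, out + total, size - 1 - (size_t)total)) > 0)
        total += r;
    close(fd);
    out[total] = '\0';
    return total;
}
```

Calling `fetch_metadata("/latest/meta-data/public-keys/0/openssh-key", buf, sizeof(buf))` and `fetch_metadata("/latest/user-data", buf, sizeof(buf))` from the instance and printing `buf` shows the status line as well as the body, which makes a 404 on user-data easy to tell apart from an empty response.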

bfqrst commented 4 years ago

Sorry for being quiet the last couple of days, work caught up with me...

Here we go:

  1. curl-ing http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key gives me the correct key that in theory needs to be written to the authorized_keys file. So this looks right!
  2. curl-ing http://169.254.169.254/latest/user-data gives me a 404 HTML page (that holds true for a non-Packer-off-the-shelf-AWS-Marketplace-Clear instance as well). Appears to be legit, because I didn't pass any user_data.
    <?xml version="1.0" encoding="iso-8859-1"?>

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
     <head>
      <title>404 - Not Found</title>
     </head>
     <body>
      <h1>404 - Not Found</h1>
     </body>
    </html>
  3. /var/lib/cloud/ has a file called aws_user_data containing the Packer temporary key that gets created during Packer AMI compile time
    #cloud-config
    users:
      - name: clear
        groups: wheelnopw
        ssh_authorized_keys:
          - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCggOwOaDi2iyhOre2yY9B21pnxS7xKDBmsKWSf+xe26WiZqe0y8ZXZgeM+lQxnNSPZPWEO9uR51CsDdqTCQ4LNaKzElLhpZaVs+G5M6lErjpuT4RQCLsJm8IkAGFYp2NYLUNdfnMOyvq5Z2Eqjf+D0klPLN7OIWoCYlrXyh81YZawH+upQmBuQ4MI5VInZYGB+xTQPGa83PCTFqAa9kSEtydgMeoaqBHtInryLQQbKFtwiIpdHbejth+U/gy998yZ1MUljmAIhwtjVWHoo2iQeO2xOYiBJ+Oc+z97X4Gdr+2WouTuycjj+tLxi8k24Kcf7ino1OaRw7KdWJpxRYFdn packer_5df4f2c6-4bc9-7f20-b33d-edeeac29cddb

So if I compare that to a non-Packer-off-the-shelf-AWS-Marketplace-Clear instance, the key in the authorized_keys file there is, as expected, the same key that is present in aws_user_data for the user clear.

Now, I can only guess wildly here... Does the presence of the temporary Packer key in aws_user_data confuse ucd?
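
To make the comparison concrete, here is a small sketch of how a cloud-config of the shape found in aws_user_data could be produced from a single key, and how one could check whether a given key is present in it. Both function names are hypothetical, and the YAML indentation is an assumption; only the field names and values (user clear, group wheelnopw, ssh_authorized_keys) come from the file shown above.

```c
#include <stdio.h>
#include <string.h>

/* Render a cloud-config document for user "clear" holding one key.
 * Returns the length written, or -1 if `buf` is too small. */
int render_cloud_config(char *buf, size_t size, const char *key)
{
    int n = snprintf(buf, size,
                     "#cloud-config\n"
                     "users:\n"
                     "  - name: clear\n"
                     "    groups: wheelnopw\n"
                     "    ssh_authorized_keys:\n"
                     "      - %s\n",
                     key);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Return 1 if `key` appears verbatim anywhere in `config`, else 0.
 * This is a plain substring match, not a YAML parse. */
int config_has_key(const char *config, const char *key)
{
    return strstr(config, key) != NULL;
}
```

Feeding the contents of aws_user_data and the key returned by the metadata service into `config_has_key` would show directly whether the stored file was built from the key selected at launch or from some other key.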

EDIT 1: typos and wording

ahkok commented 4 years ago

There's a lot here:

> 1. curl-ing http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key gives me the correct key that in theory needs to be written to the authorized_keys file. So this looks right!

This is fetched by the helper, for sure.

> 2. curl-ing http://169.254.169.254/latest/user-data in turn, gives me a 404 HTML page

This file may be absent in later boot sequences. In other words, it may exist only at first boot.

Did you pass a cloud-config to the AMI instance from the AWS web pages or awscli?

> 3. /var/lib/cloud/ has a file called aws_user_data containing the packer temporary key that gets created during Packer AMI compile time

If this file exists, the helper doesn't run (a condition prevents it). So the helper must have run?
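
The condition mentioned here amounts to a "fetch only once" guard. A minimal sketch of such a guard follows; `should_run_fetcher` is a hypothetical name, and where the actual check lives in micro-config-drive (the helper itself or the unit that starts it) is not stated in this thread.

```c
#include <unistd.h>

/* Hypothetical guard: return 1 if the fetcher should run because the
 * cached user-data file does not exist yet, 0 if the file is already
 * present. Any access() failure is treated as "not present". */
int should_run_fetcher(const char *user_data_path)
{
    return access(user_data_path, F_OK) != 0;
}
```

Under this logic, an image that already ships /var/lib/cloud/aws_user_data from build time would never fetch fresh data on the first boot of a new instance.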

That suggests that the data read from server contained the bad key.

>     #cloud-config
>     users:
>       - name: clear
>         groups: wheelnopw
>         ssh_authorized_keys:
>           - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCggOwOaDi2iyhOre2yY9B21pnxS7xKDBmsKWSf+xe26WiZqe0y8ZXZgeM+lQxnNSPZPWEO9uR51CsDdqTCQ4LNaKzElLhpZaVs+G5M6lErjpuT4RQCLsJm8IkAGFYp2NYLUNdfnMOyvq5Z2Eqjf+D0klPLN7OIWoCYlrXyh81YZawH+upQmBuQ4MI5VInZYGB+xTQPGa83PCTFqAa9kSEtydgMeoaqBHtInryLQQbKFtwiIpdHbejth+U/gy998yZ1MUljmAIhwtjVWHoo2iQeO2xOYiBJ+Oc+z97X4Gdr+2WouTuycjj+tLxi8k24Kcf7ino1OaRw7KdWJpxRYFdn packer_5df4f2c6-4bc9-7f20-b33d-edeeac29cddb
>
> So if I compare that to a non Packer off the shelf Marketplace Clear instance, the key in the authorized_key file is as expected the same key that is present in aws_user_data file for the user clear.
>
> Now, I can only wild guess here... Does the presence of the temporary Packer key in aws_user_data confuse ucd?

I'm so confused, and I have no idea what packer is or what it is doing. Does packer create the aws_user_data file? That would be bad, and the cause of all of this.

ahkok commented 4 years ago

I'm closing this issue. The original issue had nothing to do with hashicorp packer, and is, as far as I can see, resolved.

If hashicorp packer remains an issue, it's most likely due to hashicorp packer putting the wrong SSH keys into the userdata file that is passed in to the VM, since it appears that once the VM boots, it sees the hashicorp packer key, and not the custom one. That's too late for this project to fix anything: all the bad stuff has happened before micro-config-drive even runs. You're welcome to open a new bug for it, but this issue is unrelated and should be closed.