NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

Feature request: ssm-agent installed by default on Amazon Images #222637

Closed dylanmtaylor closed 7 months ago

dylanmtaylor commented 1 year ago

Feature Request

I believe that NixOS should ship its EC2 images with the Amazon Systems Manager Agent (SSM agent) enabled by default. All Amazon-provided AMIs, including Amazon Linux and Ubuntu as well as their Windows images, come with this agent, which allows you to log in to the instance without exposing port 22 or using an SSH key. I would argue that users expect this from an image on AWS and may be surprised that it is not present.

This would make deploying and administering NixOS on AWS easier and more secure for those familiar with this method of connecting, and it could be disabled via a Nix configuration option if it is not desired on the instance.

arianvp commented 1 year ago

I agree. Wanna send a PR to enable it? I can review it.

dylanmtaylor commented 1 year ago

Where can I make the change? Is there a declarative list of packages that get added to the images?

arianvp commented 1 year ago

It's here:

https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/virtualisation/amazon-image.nix

AleXoundOS commented 7 months ago

Btw, does the SSM agent work correctly inside NixOS? At startup it seems to run some tests and a bootstrap step, but fails. Excerpt from /var/log/amazon/ssm/amazon-ssm-agent.log:

"runtimeStatus": {
  "aws:runShellScript": {
    "status": "Failed",
    "code": 127,
    "name": "aws:runShellScript",
    "output": "\n----------ERROR-------\nsh: line 1: /var/lib/amazon/ssm/i-0a19eaa44cdc5988d/document/orchestration/5fd87533-dbde-4f6f-b55e-688a9883eb32/awsrunShellScript/0.awsrunShellScript/_script.sh: cannot execute: required file not found\nfailed to run commands: exit status 127",
    "startDateTime": "2024-03-16T06:50:13.719Z",
    "endDateTime": "2024-03-16T06:50:13.804Z",
    "outputS3BucketName": "imagepipeline-logs",
    "outputS3KeyPrefix": "imagepipelineprefix/nixos-rebuild/1.0.0/1/wf-75bb3cd2-783c-4a85-971e-3eb40187d962/ApplyBuildComponents/5fd87533-dbde-4f6f-b55e-688a9883eb32/i-0a19eaa44cdc5988d/awsrunShellScript",
    "stepName": "0.awsrunShellScript",
    "standardOutput": "",
    "standardError": "sh: line 1: /var/lib/amazon/ssm/i-0a19eaa44cdc5988d/document/orchestration/5fd87533-dbde-4f6f-b55e-688a9883eb32/awsrunShellScript/0.awsrunShellScript/_script.sh: cannot execute: required file not found\nfailed to run commands: exit status 127"
  }
}

Here is the bash script itself, which AWS SSM uploads to the agent: script.sh.txt.

It tries to install AWSTOE, which is required for EC2 Image Builder. It may not be mandatory for the SSM agent to operate in other use cases, but in my case the failure prevents the Image Builder pipeline from running.

arianvp commented 7 months ago

Does the agent crash? Or keep going?

I've had it working on 23.05 to log in to machines as an alternative to ssh at work. But didn't really look at the logs

We should see if we can convince it not to run this step.

I don't think AWS image builder is gonna work well with nixos anyway.

arianvp commented 7 months ago

There's a PR up for this. https://github.com/NixOS/nixpkgs/pull/294493

AleXoundOS commented 7 months ago

> There's a PR up for this: #294493

It's not enough for AWS Image Builder, which utilizes AWS SSM agent.

@arianvp, it crashes on the first line because of #!/bin/bash. OK, we can overcome that with ln -s /run/current-system/sw/bin/bash /bin/bash. Next, it requires the cloud-init executable. I solved that with a combination of services.cloud-init.enable = true and the cloud-init system package (however, cloud-config.service and apply-ec2-data.service definitely overlap). Then it downloads the /tmp/imagebuilder/TaskOrchestratorAndExecutor/bootstrap.sh executable, which exits with an error:

# /tmp/imagebuilder/TaskOrchestratorAndExecutor/awstoe
bash: /tmp/imagebuilder/TaskOrchestratorAndExecutor/awstoe: cannot execute: required file not found

Though, its library dependencies are satisfied:

# ldd /tmp/imagebuilder/TaskOrchestratorAndExecutor/awstoe
        linux-vdso.so.1 (0x00007ffec3559000)
        libresolv.so.2 => /nix/store/1zy01hjzwvvia6h9dq5xar88v77fgh9x-glibc-2.38-44/lib/libresolv.so.2 (0x00007f3e0a473000)
        libpthread.so.0 => /nix/store/1zy01hjzwvvia6h9dq5xar88v77fgh9x-glibc-2.38-44/lib/libpthread.so.0 (0x00007f3e0a46e000)
        libc.so.6 => /nix/store/1zy01hjzwvvia6h9dq5xar88v77fgh9x-glibc-2.38-44/lib/libc.so.6 (0x00007f3e0a285000)
        /lib64/ld-linux-x86-64.so.2 => /nix/store/1zy01hjzwvvia6h9dq5xar88v77fgh9x-glibc-2.38-44/lib64/ld-linux-x86-64.so.2 (0x00007f3e0a486000)

AleXoundOS commented 7 months ago

Forgot to check: obviously it has an incompatible interpreter:

# file /tmp/imagebuilder/TaskOrchestratorAndExecutor/awstoe
/tmp/imagebuilder/TaskOrchestratorAndExecutor/awstoe: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, Go BuildID=a9xxwgAIGQObloxPpOa4/77w2Nk7IBZnprXj7Q5RZ/4sY0LutJK2aXl4ayaFQZ/pduPLBCeO2XUlIRdj-OZ, stripped
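
The same check can be scripted. This is a minimal sketch that assumes `readelf` from binutils is available; the function names `elf_interpreter` and `check_interpreter` are hypothetical. It pulls the requested interpreter out of the program headers and tests whether that path exists on the machine.

```shell
# Extract the requested ELF interpreter from `readelf -l` output on stdin.
elf_interpreter() {
  sed -n 's/.*\[Requesting program interpreter: \(.*\)\].*/\1/p'
}

# Report whether a binary's interpreter exists on this system.
#   $1 = path to the binary (needs readelf from binutils)
check_interpreter() {
  interp=$(readelf -l "$1" | elf_interpreter)
  if [ -z "$interp" ]; then
    echo "$1: no interpreter requested (static binary or not ELF)"
  elif [ -e "$interp" ]; then
    echo "$1: interpreter $interp is present"
  else
    echo "$1: interpreter $interp is MISSING"
  fi
}

# Canned sample of the relevant readelf line, so the parser can be tried
# without readelf installed.
sample='      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]'
parsed=$(printf '%s\n' "$sample" | elf_interpreter)
echo "$parsed"
```

On the instance this would be called as `check_interpreter /tmp/imagebuilder/TaskOrchestratorAndExecutor/awstoe`. Where patchelf is installed, `patchelf --print-interpreter` prints the same path.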

Running autoPatchelf on awstoe fixed this. Now awstoe runs (manually):

# ./awstoe 
AWS Task Orchestrator and Executor (AWSTOE) is an application that allows users to run operations on their respective operating system through TOE documents.

Usage:
  awstoe [flags]
  awstoe [command]

Available Commands:
  completion  Generate the autocompletion script for the specified shell
  help        Help about any command
  run         Execute TOE document.
  validate    Validate TOE document syntax and run sanity check.

Flags:
  -h, --help      help for awstoe
  -v, --version   Displays executable version

Use "awstoe [command] --help" for more information about a command.

It seems we can skip installation of TOE if ${WORKING_DIR}/imagebuilder/TaskOrchestratorAndExecutor/awstoe already exists.

IMAGE_BUILDER_DIR="${WORKING_DIR}/imagebuilder"
TOE_DIR="${IMAGE_BUILDER_DIR}/TaskOrchestratorAndExecutor"
function package_exists() {
    if [ $1 == "TOE" ] ; then
        if [ -f "${TOE_DIR}/awstoe" ]; then return 0
        else return 1
        fi
    else
        $(type "$1" > /dev/null 2>&1 )
        return $?
    fi
}
function bootstrap() {
    mkdir -p ${BOOTSTRAP_DIR}
    if ! package_exists TOE ; then
        install_package TOE
    fi
}

WORKING_DIR can be specified in the parameters of the Image Recipe. But it's not clear what the point would be in our case, because the directory must be writable.
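
To see how that check behaves when the binary is put in place ahead of time, here is a self-contained variant of the excerpt. It is a sketch, not the AWS script: `WORKING_DIR` points at a throwaway temp directory, the empty `awstoe` file is a placeholder for a patched binary, and `command -v` is used in place of the original `type` call.

```shell
# Self-contained variant of the check quoted above. WORKING_DIR is a
# throwaway temp directory here; in Image Builder it comes from the recipe.
WORKING_DIR=$(mktemp -d)
IMAGE_BUILDER_DIR="${WORKING_DIR}/imagebuilder"
TOE_DIR="${IMAGE_BUILDER_DIR}/TaskOrchestratorAndExecutor"

package_exists() {
  if [ "$1" = "TOE" ]; then
    # TOE counts as installed when the awstoe file is present.
    [ -f "${TOE_DIR}/awstoe" ]
  else
    # Anything else counts as installed when it resolves as a command.
    command -v "$1" > /dev/null 2>&1
  fi
}

# Before pre-seeding: the bootstrap would call install_package TOE.
if package_exists TOE; then before=present; else before=absent; fi

# Pre-seed a placeholder where a patched awstoe binary would go.
mkdir -p "$TOE_DIR"
: > "${TOE_DIR}/awstoe"

# After pre-seeding: the bootstrap would skip the download.
if package_exists TOE; then after=present; else after=absent; fi
echo "before=$before after=$after"
```

The check only tests that the file exists, so it would not notice an awstoe binary that is present but cannot run.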

AleXoundOS commented 7 months ago

Assuming Amazon gives instructions on how to install AWSTOE manually, maybe there are other checks before the script above gets uploaded by SSM.

arianvp commented 7 months ago

Amazon Image Builder support seems like a separate issue.

The fact that it uses a bunch of imperative SSM commands to do image builds seems directly in contradiction with how nix stuff works.

I don't plan to support it any time soon. Just build your image with Nix and upload it with nix run github:NixOS/amis#upload-ami

AleXoundOS commented 7 months ago

@arianvp, agreed. AWSTOE downloads and executes arbitrary binaries, which have to be patched for Nix (or run with something like nix-ld on board).

And generally, yes, builders like AWSTOE are at odds with what Nix can do in a more powerful way.

Influenced by your advice to use another method, I've come up with an EC2 user_data bash script, which brings in flake.nix, flake.lock, etc. via bash heredocs (i.e. the EOF stuff) and runs nixos-rebuild at the end. For offline environments, --option substituters "s3://BUCKET_NAME?region=REGION&trusted=true" is used. The binary cache gets built as an S3 asset (with LocalBundling, where nix build is called) and deployed to an S3 bucket using BucketDeployment. It is fully orchestrated with AWS CDK, so the EC2 instance gets updated in place. It works.
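
A rough sketch of what generating such a user_data script could look like. The helper name `render_user_data`, the `/etc/nixos` target path, and the `FLAKE_EOF` delimiter are assumptions for illustration, not the actual script from this setup. Only flake.nix is embedded here; flake.lock would be handled the same way.

```shell
# Hypothetical helper: print a user_data script for one instance.
#   $1 = S3 bucket holding the binary cache
#   $2 = AWS region of that bucket
#   $3 = local flake.nix to embed
render_user_data() {
  cat <<EOF
#!/usr/bin/env bash
set -euo pipefail
mkdir -p /etc/nixos
# The quoted delimiter keeps the shell from expanding the flake text.
cat > /etc/nixos/flake.nix <<'FLAKE_EOF'
$(cat "$3")
FLAKE_EOF
nixos-rebuild switch --flake /etc/nixos \\
  --option substituters "s3://$1?region=$2&trusted=true"
EOF
}

# Example with placeholder values and a dummy flake.
demo_dir=$(mktemp -d)
printf '{ outputs = { self }: { }; }\n' > "$demo_dir/flake.nix"
user_data=$(render_user_data BUCKET_NAME REGION "$demo_dir/flake.nix")
printf '%s\n' "$user_data"
```

The outer heredoc is unquoted so the bucket, region, and flake contents are filled in when the script is generated. The inner one is quoted so nothing in the flake is expanded again on the instance.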

arianvp commented 7 months ago

This is almost the same setup that I use, except that I just pass in a nix store path so that the instance doesn't need to have access to GitHub to evaluate the flake

https://github.com/arianvp/nixos-village/blob/411ebd86eed7cbca31ff23b0aa92b02f311b864d/deploy/asg.tf#L50

AleXoundOS commented 7 months ago

> I just pass in a nix store path so that the instance doesn't need to have access to GitHub to evaluate the flake

I bundle the nixpkgs source store path required for flake evaluation, together with a number of packages needed by nixos-rebuild, into the binary cache, so that further invocations of nixos-rebuild over ssh also work for quick emergency changes in place (without outbound Internet access). However, I was considering your no-eval variant too, but with nix-store --export and nix-store --import to pass a single file over S3 instead of a binary cache. Since each of my deployments contains only a single EC2 instance, there is no binary cache reuse benefit, and deploying a single-file CDK S3 Asset would be quicker than CDK BucketDeployment (which spawns an additional Lambda for zip unpacking, with caveats...). But I stumbled across 2 issues:

  1. nix-store --export cannot run inside a Nix derivation (whereas nix-store --dump can, so binary caches can be built reproducibly with nix build)
  2. currently, there is no AWS CLI binary (awscli2 package) available in amazon-init, so there is no CDK upstream convenience like aws_s3_download_command in user_data

Sorry for going off-topic.

arianvp commented 7 months ago

I will create a Matrix channel where we can discuss AWS + NixOS issues! I will close this topic for now as the PR was merged

arianvp commented 7 months ago

https://matrix.to/#/#aws:nixos.org