Closed dylanmtaylor closed 7 months ago
I agree. Wanna send a PR to enable it? I can review it.
Where can I make the change? Is there a declarative list of packages that get added to the images?
Btw, does SSM agent work correctly inside NixOS?
It seems at startup SSM tries to run some tests and bootstrap, but fails. Excerpt from /var/log/amazon/ssm/amazon-ssm-agent.log:
"runtimeStatus": {
"aws:runShellScript": {
"status": "Failed",
"code": 127,
"name": "aws:runShellScript",
"output": "\n----------ERROR-------\nsh: line 1: /var/lib/amazon/ssm/i-0a19eaa44cdc5988d/document/orchestration/5fd87533-dbde-4f6f-b55e-688a9883eb32/awsrunShellScript/0.awsrunShellScript/_script.sh: cannot execute: required file not found\nfailed to run commands: exit status 127",
"startDateTime": "2024-03-16T06:50:13.719Z",
"endDateTime": "2024-03-16T06:50:13.804Z",
"outputS3BucketName": "imagepipeline-logs",
"outputS3KeyPrefix": "imagepipelineprefix/nixos-rebuild/1.0.0/1/wf-75bb3cd2-783c-4a85-971e-3eb40187d962/ApplyBuildComponents/5fd87533-dbde-4f6f-b55e-688a9883eb32/i-0a19eaa44cdc5988d/awsrunShellScript",
"stepName": "0.awsrunShellScript",
"standardOutput": "",
"standardError": "sh: line 1: /var/lib/amazon/ssm/i-0a19eaa44cdc5988d/document/orchestration/5fd87533-dbde-4f6f-b55e-688a9883eb32/awsrunShellScript/0.awsrunShellScript/_script.sh: cannot execute: required file not found\nfailed to run commands: exit status 127"
}
}
Here is the bash script itself, which AWS SSM uploads to the agent: script.sh.txt.
It tries to install AWSTOE, which is required for EC2 Image Builder. Maybe it's not mandatory for the SSM agent to operate in other use cases. In my case, though, the failure prevents the Image Builder pipeline from running.
Does the agent crash? Or keep going?
I've had it working on 23.05 to log in to machines as an alternative to ssh at work. But didn't really look at the logs
We should see if we can convince it not to run this step.
I don't think AWS image builder is gonna work well with nixos anyway.
There's a PR up for this. https://github.com/NixOS/nixpkgs/pull/294493
It's not enough for AWS Image Builder, which utilizes AWS SSM agent.
@arianvp, it crashes on the first line because of #!/bin/bash. Ok, we can overcome that with ln -s /run/current-system/sw/bin/bash /bin/bash. Next, it requires the cloud-init executable. Ok, I solved that with a combination of services.cloud-init.enable = true and the cloud-init system package (however, cloud-config.service and apply-ec2-data.service definitely overlap). Then it downloads the /tmp/imagebuilder/TaskOrchestratorAndExecutor/bootstrap.sh executable, which exits with an error:
# /tmp/imagebuilder/TaskOrchestratorAndExecutor/awstoe
bash: /tmp/imagebuilder/TaskOrchestratorAndExecutor/awstoe: cannot execute: required file not found
Though, its library dependencies are satisfied:
# ldd /tmp/imagebuilder/TaskOrchestratorAndExecutor/awstoe
linux-vdso.so.1 (0x00007ffec3559000)
libresolv.so.2 => /nix/store/1zy01hjzwvvia6h9dq5xar88v77fgh9x-glibc-2.38-44/lib/libresolv.so.2 (0x00007f3e0a473000)
libpthread.so.0 => /nix/store/1zy01hjzwvvia6h9dq5xar88v77fgh9x-glibc-2.38-44/lib/libpthread.so.0 (0x00007f3e0a46e000)
libc.so.6 => /nix/store/1zy01hjzwvvia6h9dq5xar88v77fgh9x-glibc-2.38-44/lib/libc.so.6 (0x00007f3e0a285000)
/lib64/ld-linux-x86-64.so.2 => /nix/store/1zy01hjzwvvia6h9dq5xar88v77fgh9x-glibc-2.38-44/lib64/ld-linux-x86-64.so.2 (0x00007f3e0a486000)
Forgot to check: obviously, it has an incompatible interpreter:
# file /tmp/imagebuilder/TaskOrchestratorAndExecutor/awstoe
/tmp/imagebuilder/TaskOrchestratorAndExecutor/awstoe: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, Go BuildID=a9xxwgAIGQObloxPpOa4/77w2Nk7IBZnprXj7Q5RZ/4sY0LutJK2aXl4ayaFQZ/pduPLBCeO2XUlIRdj-OZ, stripped
autoPatchelf awstoe fixed this. Now awstoe runs (manually):
# ./awstoe
AWS Task Orchestrator and Executor (AWSTOE) is an application that allows users to run operations on their respective operating system through TOE documents.
Usage:
awstoe [flags]
awstoe [command]
Available Commands:
completion Generate the autocompletion script for the specified shell
help Help about any command
run Execute TOE document.
validate Validate TOE document syntax and run sanity check.
Flags:
-h, --help help for awstoe
-v, --version Displays executable version
Use "awstoe [command] --help" for more information about a command.
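Both "cannot execute: required file not found" failures above (the #!/bin/bash script and the awstoe binary) come down to the file asking for an interpreter path that does not exist on the machine. A minimal sketch of a check for that, assuming POSIX sh; the function names requested_interpreter and check_interpreter are made up for illustration, and the ELF branch only works if patchelf happens to be installed:

```bash
#!/bin/sh
# Print the interpreter a file asks the kernel for: the "#!" line of a script,
# or (only if patchelf is installed) the interpreter recorded in an ELF binary.
requested_interpreter() {
  file_path=$1
  # Read just the first two bytes to see whether this is a "#!" script.
  magic=$(dd if="$file_path" bs=2 count=1 2>/dev/null)
  if [ "$magic" = '#!' ]; then
    # Strip "#!", leading blanks, and any arguments after the interpreter path.
    head -n 1 "$file_path" | sed -e 's/^#![[:space:]]*//' -e 's/[[:space:]].*$//'
  elif command -v patchelf >/dev/null 2>&1; then
    patchelf --print-interpreter "$file_path" 2>/dev/null
  fi
}

# Report whether the requested interpreter exists on this system.
# Returns 0 if it exists, 1 if it is missing, 2 if it could not be determined.
check_interpreter() {
  interp=$(requested_interpreter "$1")
  if [ -z "$interp" ]; then
    echo "$1: could not determine interpreter"
    return 2
  elif [ -e "$interp" ]; then
    echo "$1: interpreter $interp exists"
  else
    echo "$1: interpreter $interp is missing"
    return 1
  fi
}
```

For example, check_interpreter /tmp/imagebuilder/TaskOrchestratorAndExecutor/awstoe would report on the path that the file output above shows.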
It seems we can skip installation of TOE if ${WORKING_DIR}/imagebuilder/TaskOrchestratorAndExecutor/awstoe already exists.
IMAGE_BUILDER_DIR="${WORKING_DIR}/imagebuilder"
TOE_DIR="${IMAGE_BUILDER_DIR}/TaskOrchestratorAndExecutor"

function package_exists() {
    if [ $1 == "TOE" ] ; then
        if [ -f "${TOE_DIR}/awstoe" ]; then return 0
        else return 1
        fi
    else
        $(type "$1" > /dev/null 2>&1 )
        return $?
    fi
}

function bootstrap() {
    mkdir -p ${BOOTSTRAP_DIR}
    if ! package_exists TOE ; then
        install_package TOE
    fi
}
WORKING_DIR can be specified in the parameters of the Image Recipe, but it's not clear what the point would be in our case, because it must be writable.
Assuming Amazon gives instructions on how to install AWSTOE manually, maybe there are other checks before the script above gets uploaded by SSM.
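To illustrate the skip path, here is a small sketch that reuses the package_exists logic from the excerpt (rewritten in POSIX form) and pre-seeds the awstoe file. Assumptions: WORKING_DIR is a throwaway temp directory here rather than the Image Recipe value, the empty file is only a stand-in for an already patched awstoe binary, and toe_action is a made-up helper that prints what bootstrap would do:

```bash
#!/bin/sh
# Assumption: a temporary directory stands in for the recipe's WORKING_DIR.
WORKING_DIR=$(mktemp -d)
IMAGE_BUILDER_DIR="${WORKING_DIR}/imagebuilder"
TOE_DIR="${IMAGE_BUILDER_DIR}/TaskOrchestratorAndExecutor"

# Same decision as in the excerpt: TOE counts as installed when the file exists;
# any other name is looked up as a command.
package_exists() {
  if [ "$1" = "TOE" ]; then
    [ -f "${TOE_DIR}/awstoe" ]
  else
    type "$1" >/dev/null 2>&1
  fi
}

# Hypothetical helper: print the branch that bootstrap would take.
toe_action() {
  if package_exists TOE; then
    echo "skip install"
  else
    echo "install TOE"
  fi
}

toe_action                    # prints "install TOE": nothing is pre-seeded yet
mkdir -p "$TOE_DIR"
: > "${TOE_DIR}/awstoe"       # stand-in for a patched awstoe binary
toe_action                    # prints "skip install"
```

Whether the real pipeline performs other checks before this script runs is not something the sketch can show.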
Amazon Image Builder support seems like a separate issue.
The fact that it uses a bunch of imperative SSM commands to do image builds seems directly in contradiction with how nix stuff works.
I don't think I will plan to support it any time soon. Just build your image with Nix and upload it with nix run github:NixOS/amis#upload-ami
@arianvp, agreed. AWSTOE downloads and executes arbitrary binaries, which have to be patched for Nix (or need some sort of nix-ld onboard).
And generally yes, builders like AWSTOE are really at odds with what Nix can do more powerfully.
Influenced by your advice to use another method, I've come up with an EC2 user_data bash script, which brings flake.nix, flake.lock, etc. via bash heredoc (i.e. EOF stuff) and runs nixos-rebuild at the end. For an offline environment, --option substituters "s3://BUCKET_NAME?region=REGION&trusted=true" is used. The binary cache gets built as an S3 asset (with LocalBundling, where nix build is called) and deployed to an S3 bucket using BucketDeployment. Fully orchestrated with AWS CDK. So, the EC2 instance gets updated in-place. It works.
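A rough sketch of what such a user_data script could look like, not the actual script from this setup. Assumptions: the flake content, the configuration name example, the nixos-23.11 input, and the function names write_flake and rebuild_from_cache are all illustrative; BUCKET_NAME and REGION are the placeholders from the comment above:

```bash
#!/bin/sh
# Write the flake into a directory via a heredoc. The quoted 'EOF' keeps the
# shell from expanding anything inside the Nix code.
write_flake() {
  config_dir=$1
  mkdir -p "$config_dir"
  cat > "$config_dir/flake.nix" <<'EOF'
{
  # Hypothetical minimal flake; a real one would come with its flake.lock.
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-23.11";
  outputs = { self, nixpkgs }: {
    nixosConfigurations.example = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [ ./configuration.nix ];
    };
  };
}
EOF
  # flake.lock and configuration.nix would be written the same way.
}

# Switch to the configuration, substituting from the S3 binary cache.
rebuild_from_cache() {
  config_dir=$1
  nixos-rebuild switch --flake "$config_dir#example" \
    --option substituters "s3://BUCKET_NAME?region=REGION&trusted=true"
}

# In user_data this would end with something like:
#   write_flake /etc/nixos && rebuild_from_cache /etc/nixos
```

The functions are only defined here, not called, so the sketch can be sourced and inspected without touching the system.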
This is almost the same setup that I use, except that I just pass in a Nix store path so that the instance doesn't need access to GitHub to evaluate the flake.
I bundle the nixpkgs source Nix store path required for flake evaluation, along with a number of packages needed for nixos-rebuild, into the binary cache, so that further invocations of nixos-rebuild over ssh work too for quick emergency changes in-place (without outbound Internet access). However, I was considering your no-eval variant too, except using nix-store --export & nix-store --import to pass a single file over S3 instead of a binary cache. Since each of my deployments contains only a single EC2 instance, there is no binary cache reuse benefit, and deploying a single-file CDK S3 Asset would be quicker than CDK BucketDeployment (which spawns an additional Lambda for zip unpacking, with caveats...). But I stumbled across 2 issues:
1. nix-store --export cannot run inside a Nix derivation (whereas nix-store --dump can, so it is possible to build binary caches reproducibly with nix build)
2. There is no AWS CLI (awscli2 package) available in amazon-init, so no CDK upstream convenience like aws_s3_download_command in user_data
I'm sorry for the off-topic.
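For reference, the single-file variant could be sketched as below, run outside of a derivation because of the first issue. The nix-store --export, --import, and --query --requisites invocations are the standard ones; the function names, the returned status 127 when nix-store is absent, and the file names in the usage lines are illustrative assumptions:

```bash
#!/bin/sh
# Serialise a store path together with its whole closure into one file.
export_closure() {
  store_path=$1
  out_file=$2
  command -v nix-store >/dev/null 2>&1 || { echo "nix-store not found" >&2; return 127; }
  # Word splitting is intended: each requisite becomes its own argument.
  nix-store --export $(nix-store --query --requisites "$store_path") > "$out_file"
}

# Load a previously exported closure into the local store.
import_closure() {
  in_file=$1
  command -v nix-store >/dev/null 2>&1 || { echo "nix-store not found" >&2; return 127; }
  nix-store --import < "$in_file"
}

# Hypothetical usage:
#   on the build machine:  export_closure "$(readlink -f ./result)" system.closure
#   on the instance:       import_closure system.closure
```

How the single file reaches the instance without the AWS CLI (the second issue) is left open here.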
I will create a Matrix channel where we can discuss AWS + NixOS issues! I will close this topic for now as the PR was merged
Feature Request
I believe that Nix should ship its EC2 images with Amazon Systems Manager Agent out of the box by default. All Amazon-provided AMIs including Amazon Linux and Ubuntu as well as their Windows images come with this agent which allows you to log into the instance without exposing port 22 or using an SSH key. I would argue that this is something that users might expect in AWS from an image and be surprised that it is not present.
This would make deploying and administering Nix on AWS easier and more secure for those familiar with this method of connecting, and it could be disabled via a Nix configuration option if it is not desired on the instance.
Relevant Links