cisagov / kali-packer

This project can be used to create AMIs based on Kali Linux, a penetration testing distribution.
Creative Commons Zero v1.0 Universal

Kali instances inaccessible #127

Closed by jsf9k 1 year ago

jsf9k commented 1 year ago

🐛 Summary

The Kali instances launched from the 0.5.8 version of the Kali AMI (both staging and production) are inaccessible via SSM Session Manager or Guacamole.

To reproduce

  1. Launch an EC2 instance from the most recent Kali AMI.
  2. Verify that it is inaccessible via SSM Session Manager or Guacamole.

Expected behavior

I would expect the Kali instances to be accessible via both Guacamole and SSM Session Manager.

Any helpful log output or screenshots

❯ AWS_SHARED_CREDENTIALS_FILE=~/.aws/staging_credentials AWS_DEFAULT_REGION=us-east-1 aws --profile cool-env6-startstopssmsession ssm start-session --target=i-0cc652fa8d12c9f66 --document=SSM-SessionManagerRunShell

An error occurred (TargetNotConnected) when calling the StartSession operation: i-0cc652fa8d12c9f66 is not connected.
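The TargetNotConnected error indicates that Systems Manager does not currently see the instance's agent as connected. A minimal sketch for checking the agent's registration directly, assuming the same credentials and region as the command above (the function name ssm_ping_status is hypothetical):

```shell
# Print the SSM agent ping status (for example Online or ConnectionLost)
# that Systems Manager reports for one instance.
ssm_ping_status() {
    aws ssm describe-instance-information \
        --filters "Key=InstanceIds,Values=$1" \
        --query 'InstanceInformationList[0].PingStatus' \
        --output text
}

# Example (requires AWS credentials):
#   ssm_ping_status i-0cc652fa8d12c9f66
```

An instance whose agent never registered has no entry in the list, so the query should print None instead of a status.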

jsf9k commented 1 year ago

I verified that the latest staging AMI has the same issue. I tried building a new staging AMI in case an upstream change that has since been fixed was the cause, but the new AMI has the same issue.

Another thing to try is to see whether increasing the size of the AMI root disk has any effect. It is possible that there is enough space to create the AMI, but the few bytes that are left over are not sufficient for cloud-init to get to the point where it increases the size of the root disk.

jsf9k commented 1 year ago

I built a staging AMI with a larger root disk, as suggested in my previous comment, and found that it fails in the same way.

jsf9k commented 1 year ago

I tried a few other desperate things in #128 without success.

Looking at the console logs from a Kali instance that failed to launch, I was finally able to see that the boot was failing while attempting to resize the root file system, because the flock executable was missing from the initramfs. After more detective work I found this bug report, which led me to manually patch the update-initramfs hook script and regenerate the initramfs when building the Kali AMI. I am building a new AMI with those changes now.
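A minimal sketch of what such a patch could look like, not necessarily the exact change made in this repository. It assumes the hook sources the initramfs-tools hook-functions (which provide copy_exec), that flock lives at /usr/bin/flock, and that the hook does not exit before its last line; the function name add_flock_to_hook and the hook path in the comments are assumptions:

```shell
# Append a copy_exec line for flock to an initramfs-tools hook script,
# unless the hook already copies flock. The hook path is an argument so
# the function can be tried on a copy of the file first.
add_flock_to_hook() {
    hook_file=$1
    if grep -q 'copy_exec .*flock' "$hook_file"; then
        return 0
    fi
    printf '%s\n' 'copy_exec /usr/bin/flock /bin' >> "$hook_file"
}

# Intended use during the AMI build, as root. The hook path is an
# assumption about where the growroot hook is installed:
#   add_flock_to_hook /usr/share/initramfs-tools/hooks/growroot
#   update-initramfs -u -k all
#   lsinitramfs /boot/initrd.img-"$(uname -r)" | grep flock
```

The final lsinitramfs command is only a check that flock ended up inside the regenerated image.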

Here is the output from the console log that clued me in to the issue:

[    3.509276] EXT4-fs (nvme0n1p1): mounted filesystem with ordered data mode. Quota mode: none.
done.
Begin: Running /scripts/local-bottom ...
[    4.085835] EXT4-fs (nvme0n1p1): unmounting filesystem.
GROWROOT: WARNING: resize failed: failed [flock:127] flock -x 9
/sbin/growpart: line 714: flock: not found
FAILED: Error while obtaining exclusive lock on /dev/nvme0n1
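For anyone checking another instance for the same failure, here is a rough sketch that scans console output for the two signatures shown above. The function name console_has_flock_failure and the INSTANCE_ID variable are hypothetical:

```shell
# Succeed if the console log read from stdin contains the growroot/flock
# failure signature seen in this issue.
console_has_flock_failure() {
    grep -q -e 'flock: not found' -e 'GROWROOT: WARNING: resize failed'
}

# Example (requires AWS credentials; INSTANCE_ID is a placeholder):
#   aws ec2 get-console-output --instance-id "$INSTANCE_ID" --latest \
#       --query Output --output text | console_has_flock_failure \
#       && echo "growroot could not find flock"
```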

jsf9k commented 1 year ago

The flock changes fixed one problem, but now we are hitting a different bug:

[   23.516084] EXT4-fs (nvme0n1p1): resized filesystem to 33521659
[   23.567118] EXT4-fs (nvme0n1p1): Invalid checksum for backup superblock 32768
[   23.567118] 
[   23.573529] EXT4-fs error (device nvme0n1p1) in ext4_update_backup_sb:174: Filesystem failed CRC
[   23.580063] Aborting journal on device nvme0n1p1-8.
[   23.583549] EXT4-fs (nvme0n1p1): failed to convert unwritten extents to written extents -- potential data loss!  (inode 131660, error -30)
[   23.584871] EXT4-fs (nvme0n1p1): Remounting filesystem read-only
[   23.596101] EXT4-fs error (device nvme0n1p1) in ext4_update_backup_sb:174: Journal has aborted
FAILED Failed to start Grow File System on /
See 'systemctl status systemd-growfs@-.service' for details.

I believe this is due to a kernel bug. See here and here.

jsf9k commented 1 year ago

I tried holding the Linux kernel package so that it would not be upgraded, without success.

jsf9k commented 1 year ago

It turns out that dpkg/apt and aptitude use different methods of holding packages, and they do not respect each other. After some modifications I was able to successfully hold the Linux kernel, and I verified that the resulting AMI is accessible via SSM Session Manager and Guacamole.
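A minimal sketch of holding a package through both front ends and then confirming the dpkg-side hold. It is not necessarily the exact change made in this repository; the function names are hypothetical, and linux-image-amd64 is an assumed package name that may differ from the one actually held:

```shell
# Hold a package with both tools so that neither one upgrades it.
hold_everywhere() {
    pkg=$1
    apt-mark hold "$pkg"   # hold recorded in dpkg's selection state
    aptitude hold "$pkg"   # hold recorded in aptitude's own state
}

# Succeed if `dpkg --get-selections` output on stdin lists the package
# with the "hold" state.
is_held_in_dpkg() {
    awk -v pkg="$1" '$1 == pkg && $2 == "hold" { found = 1 } END { exit !found }'
}

# Example (as root; the package name is an assumption):
#   hold_everywhere linux-image-amd64
#   dpkg --get-selections | is_held_in_dpkg linux-image-amd64 && echo held
#   apt-mark showhold
```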