awslabs / landing-zone-accelerator-on-aws

Deploy a multi-account cloud foundation to support highly-regulated workloads and complex compliance requirements.
https://aws.amazon.com/solutions/implementations/landing-zone-accelerator-on-aws/
Apache License 2.0

Palo Alto NGFW EC2 Firewall customizations-config.yml associated key pair fails when authenticating via SSH #90

Closed josh-romme closed 1 year ago

josh-romme commented 1 year ago

Describe the bug

When a Palo Alto EC2 instance is deployed with an associated key pair defined in customizations-config.yml, the instance does not accept the SSH key after deployment.

To Reproduce

  1. Use the customizations-config.yml file to deploy a Palo Alto NGFW EC2 instance, specifying an already created key pair.
  2. Attempt to log in via SSH using the key file after the Palo Alto NGFW EC2 instance is deployed.

Expected behavior

The Palo Alto NGFW EC2 instance is deployed successfully, but the first attempt to log in via SSH fails because the SSH key is not accepted. The key pair is shown correctly in the console, yet SSH login is not possible.

homeRegion: &HOME_REGION us-east-1

firewalls:
  instances:
    - name: palo-alto1-vm
      launchTemplate:
        name: firewall-paloalto1-lt
        blockDeviceMappings:
          - deviceName: /dev/xvda
            ebs:
              deleteOnTermination: true
              encrypted: true
              volumeSize: 60
        securityGroups: []
        enforceImdsv2: true
        keyPair: paloalto1
        iamInstanceProfile: PaloAltoUsers
        imageId: ami-0593adddd233d41bc
        instanceType: c5.2xlarge

Additional context

I deployed the Palo Alto NGFWs multiple times and recreated the key pairs multiple times, always with the same result. When I manually deployed a Palo Alto NGFW EC2 instance with the same key pair that failed under the LZA deployment, the key worked. I also deployed the Palo Alto NGFW using Terraform with the same key pair and was able to authenticate via SSH.

Associating the key pair in the customizations-config.yml file seems very straightforward, but for some reason the key pair does not work after the EC2 firewall is deployed.

Additional Note: The screenshot example for the launch template shows 'keyName:' which is incorrect. The documentation states it is 'keyPair:' which is the correct parameter, so the screenshot is incorrect.

https://awslabs.github.io/landing-zone-accelerator-on-aws/classes/_aws_accelerator_config.LaunchTemplateConfig.html

firegrass commented 1 year ago

I appear to be seeing this same issue with a Palo Alto firewall. I have also tried recreating keys.

firegrass commented 1 year ago

I have overcome this issue. It is not related to LZA. You must ensure that the device bootstrapping process does not fail. You must also use an RSA key pair.

awsclemj commented 1 year ago

Hello @josh-romme and @firegrass, and thank you for reporting this issue. I can confirm that I'm able to replicate this with a BYOL version of PA-VM PanOS v 10.x. To clarify the behavior I'm seeing -- I can reach a login prompt of the LZA-deployed instance but it is requesting password authentication. Launching an instance directly from the management console yielded the same result initially, but after waiting 15 minutes or so it eventually allowed me to log in with public key auth.

I am still investigating the root cause of the issue. The only real difference in LZA deployment I am seeing currently is that we create an EC2 launch template and deploy the EC2 instance using that template rather than directly using the instance properties. I don't believe this in itself should lead to different launch behavior -- I am looking into differences in settings between my test instances now.

awsclemj commented 1 year ago

@firegrass thank you for the follow-up. Can you confirm the version of PanOS you're using as well as any troubleshooting steps you took? How did you identify if the firewall bootstrap failed? I am testing with a base AMI and am not applying any userdata in my tests.

awsclemj commented 1 year ago

I managed to work around the issue with the following steps:

  1. Launch the marketplace AMI from the management console
  2. Allow that instance to complete bootstrapping (i.e. you are able to log in via public key auth)
  3. Create a custom AMI of that instance after it has completed bootstrapping/initial configuration
  4. Launch the custom AMI via LZA

I am working internally to determine if there are avenues to launching directly from the marketplace AMI, but in any case I hope these steps help unblock you, @josh-romme. Thanks again for reporting this issue!
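
For step 4 of the workaround above, the launch template in customizations-config.yml would point at the custom AMI instead of the marketplace AMI. The fragment below is only a sketch: the AMI ID is a placeholder rather than a real image, the names are taken from the configuration in the original report, the nesting follows the LZA configuration posted elsewhere in this thread, and the remaining launch template fields are left out.

```yaml
firewalls:
  instances:
    - name: palo-alto1-vm
      launchTemplate:
        name: firewall-paloalto1-lt
        keyPair: paloalto1
        # Placeholder: replace with the ID of the custom AMI created in step 3.
        imageId: ami-0123456789abcdef0
        instanceType: c5.2xlarge
        # Other launchTemplate fields (blockDeviceMappings, securityGroups, ...) omitted here.
```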

josh-romme commented 1 year ago

@awsclemj - we are not using bootstrapping in our deployment and it is using the pay as you go AMI. I waited over 24 hrs after deploying and still was not able to log in via SSH. This is with PAN-OS v10.1.9

Your workaround seems to make sense, but it adds a number of manual steps. Have you been able to determine what is causing the issue? For now we have had to fall back on Terraform to deploy the Palo Alto NGFWs so that we could keep moving forward.

firegrass commented 1 year ago

> @firegrass thank you for the follow-up. Can you confirm the version of PanOS you're using as well as any troubleshooting steps you took? How did you identify if the firewall bootstrap failed? I am testing with a base AMI and am not applying any userdata in my tests.

Sorry for the delay replying.

My LZA config looks like this:

firewalls:
  instances:
    - name: pa-firewall-a-10-2-3-rev6
      launchTemplate:
        name: pa-firewall-lt-a-10-2-3-rev6
        keyPair: firewall
        securityGroups: []
        blockDeviceMappings:
          - deviceName: /dev/xvda
            ebs:
              deleteOnTermination: true
              encrypted: true
              volumeSize: 60
        enforceImdsv2: false
        # TODO iamInstanceProfile: firewall-profile
        imageId: ${ACCEL_LOOKUP::ImageId:/aws/service/marketplace/prod-hhtxhxwx3jg6k/pan-os-10.2.3}
        instanceType: m5.xlarge
        networkInterfaces:
          - deleteOnTermination: true
            description: Data interface
            deviceIndex: 0
            groups:
              - firewall-data
            subnetId: Network-Inspection-A
          - deleteOnTermination: true
            description: Management interface
            deviceIndex: 1
            groups:
              - firewall-management
            subnetId: Network-Inspection-Management-A
            associateElasticIp: true
        userData: launch-templates/firewall-userdata.sh
      vpc: Network-Inspection

And the userdata:

type=dhcp-client
dhcp-accept-server-hostname=yes
dhcp-accept-server-domain=yes
op-command-modes=mgmt-interface-swap
plugin-op-commands=aws-gwlb-inspect:enable

awsclemj commented 1 year ago

Hello @josh-romme

After some additional testing, the key here is configuring enforceImdsv2: false under the launchTemplate. The PAN instance does not seem to ever complete bootstrapping if IMDSv2 is enabled. Note that we enable this by default as a best practice, so the explicit false value must be there.
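
As a sketch of that change, here is the launch template from the original report with only the enforceImdsv2 value flipped. The nesting is assumed to match the configuration @firegrass posted above, and all names and IDs are the ones from the original report.

```yaml
firewalls:
  instances:
    - name: palo-alto1-vm
      launchTemplate:
        name: firewall-paloalto1-lt
        blockDeviceMappings:
          - deviceName: /dev/xvda
            ebs:
              deleteOnTermination: true
              encrypted: true
              volumeSize: 60
        securityGroups: []
        # Must be set explicitly: LZA enforces IMDSv2 by default when this is omitted.
        enforceImdsv2: false
        keyPair: paloalto1
        iamInstanceProfile: PaloAltoUsers
        imageId: ami-0593adddd233d41bc
        instanceType: c5.2xlarge
```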

Thank you @firegrass for the added detail in your specific configuration. This helped me to determine the issue!

Please let us know if there is anything else we can do to support you on this issue. Otherwise we will close it if there is no additional response after some time. Thank you!