aws-quickstart / quickstart-ibm-liberty-eks

AWS Quick Start Team
Apache License 2.0
3 stars 6 forks source link

QS initial code #6

Closed git4rk closed 2 years ago

git4rk commented 2 years ago

Description of changes: Initial code drop for Liberty for new EKS (on new VPC) Main template: templates/ibm-liberty-new-eks.template.yaml

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

vsnyc commented 2 years ago

@git4rk - Only CloudFormation template files are allowed in templates directory. Could you move the other yaml files in templates directory to scripts/templates/ and change the access paths accordingly?

vsnyc commented 2 years ago

@git4rk - I'm looking at the failure reason for the status checks.

git4rk commented 2 years ago

@vsnyc Is there anything else I need to fix for all the checks to pass?

vsnyc commented 2 years ago

@git4rk - sorry for the delay, our CI is currently broken at the moment due to a conflict with a dependency update. I'll have an answer for you as soon as possible. Meanwhile, I'll also do a manual review tomorrow and share any feedback.

pbaity commented 2 years ago

@vsnyc We're still getting one linter finding here:

          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - eks:ListClusters
                  - eks:DescribeAddonVersions
                  - eks:RegisterCluster
                  - eks:CreateCluster
                Resource: '*'

The report says:

CFN Lint: * on Resource property is a bad idea

IAM policy should not allow * resource; This method in this in this policy support granular permissions

But it seems to me that the four methods/actions in the policy don't support granular permissions. When testing it out in the AWS IAM policy visual editor, I can't select granular resources:

image

Even if I try to narrow down the actions to EKS resources in the same account (I tried arn:aws:eks:*:<account_id>:* and arn:aws:eks:*:<account_id>:*/*), I get a warning that this is invalid and won't apply to any resources:

image

Is there a correct way to define granular permissions for these actions or is this in invalid finding?

git4rk commented 2 years ago

@vsnyc Can you please review Paul's comment (https://github.com/aws-quickstart/quickstart-ibm-liberty-eks/pull/6#issuecomment-1280850232) and the PR and let me know if you have any comments before merging. (@nathsen FYI).

vsnyc commented 2 years ago

@vsnyc Can you please review Paul's comment (#6 (comment)) and the PR and let me know if you have any comments before merging. (@nathsen FYI).

We found an issue with our custom lint rules (https://github.com/aws-quickstart/qs-cfn-lint-rules/issues/61) which was missing the eks:RegisterCluster method, thus causing this error. I've added an exception in commit to address this.

pbaity commented 2 years ago

@vsnyc There are a few linter findings in ibm-liberty-new-eks.template.yaml we have some questions about. For each question, I've added below a section showing the YAML in question, what the linter reported, and our comments/questions.


    ParameterLabels:
      AcceptLicense:
        default: Accept* (https://ibm.biz/was-license)
W9006 Parameter Label is not sentence case: {'biz/was-license)'}
templates/ibm-liberty-new-eks.template.yaml:38:9

W9006 Parameter Label contains spelling error(s): {'https'}
templates/ibm-liberty-new-eks.template.yaml:38:9

This is a URL for our license. We would like to keep it in the label to make the URL prominent, otherwise it seems to get lost in the description. Can we get an exception here?


  AppNamespace:
    default: EKS namespace to deploy the app

  ...

  AppNamespace:
    Type: String
    Default: "default"
    Description: Namespace in the EKS cluster where app will be deployed. This namespace is created if it does not exist.
W9006 Parameter Label contains spelling error(s): {'namespace'}
templates/ibm-liberty-new-eks.template.yaml:44:9

W9006 Parameter AppNamespace contains spelling error(s): {'namespace'}
templates/ibm-liberty-new-eks.template.yaml:89:5

This is an invalid finding. "Namespace" (one word) is the correct spelling. See https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/


      AdditionalEKSAdminArns:
        default: Additional EKS administrator ARNs (IAM users or roles)
W9006 Parameter Label is not sentence case: {'ARNs'}
templates/ibm-liberty-new-eks.template.yaml:62:9

This is an invalid finding. "ARN" is an abbreviation and is correctly capitalized, as seen is AWS's doc: https://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html


  AppContainerImageURL:
    Type: String
    Description: >-
      Custom application container image. For example, icr.io/appcafe/open-liberty/samples/getting-started.
      This value is only required when "Deploy application"="Custom" is selected.
W9006 Parameter AppContainerImageURL Description is not sentence case: {'Custom', 'Deploy', 'io/appcafe/open-liberty/samples/getting-started'}
templates/ibm-liberty-new-eks.template.yaml:77:5

This description is referencing a container image location and another parameter. We'd like to keep it as-is.


  AppReplicas:
    Type: Number
    MinValue: 1
    Default: 2
    Description: Number of app replicas (minimum 1). Ignore this parameter if "Deploy application"=None.
  DeployApp:
    Type: String
    AllowedValues: [ "Sample", "Custom", None ]
    Default: "Sample"
    Description: Type of application to deploy. If "None", deploy no application. If "Sample" or "Custom", also see the "License" parameters.

  ...

  LicenseProductEntitlementSource:
    Type: String
    AllowedValues: ["Standalone", "IBM WebSphere Hybrid Edition", "IBM Cloud Pak for Applications", "IBM WebSphere Application Server Family Edition"]
    Description: Entitlement source for the product. Select "Standalone" if the entitlement source is a product that you selected in License "Edition".
W9006 Parameter AppReplicas Description is not sentence case: {'Deploy', 'None'}
templates/ibm-liberty-new-eks.template.yaml:94:5

W9006 Parameter DeployApp Description is not sentence case: {'Custom', 'License', 'Sample', 'None'}
templates/ibm-liberty-new-eks.template.yaml:99:5

W9006 Parameter LicenseProductEntitlementSource Description is not sentence case: {'Edition', 'License', 'Standalone'}
templates/ibm-liberty-new-eks.template.yaml:114:5

These are references to other parameters. We'd like to keep these as-is.


  AdditionalEKSAdminArns:
    Default: ""
    Description: >-
      (Optional) A comma-delimited list of IAM user or role Amazon Resource Names (ARNs)
      to be granted administrative access to the Amazon EKS cluster. For example, grant
      access to two users with their ARNs: "arn:aws:iam::012345678910:user/user1@example.com,
      arn:aws:iam::012345678910:user/user2@example.com".
W9006 Parameter AdditionalEKSAdminArns Description is not sentence case: {'Names', 'com,', 'com"', 'Resource', 'ARNs'}
templates/ibm-liberty-new-eks.template.yaml:160:5

W9006 Parameter AdditionalEKSAdminArns contains spelling error(s): {'aws', 'arn'}
templates/ibm-liberty-new-eks.template.yaml:160:5

These are invalid findings. The terms here are the exact same that AWS documentation uses: https://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html

git4rk commented 2 years ago

@vsnyc We have updated the code based on your suggestions. The Linter is passing now. Please see Paul's comment for the some lint issues we could not resolve: https://github.com/aws-quickstart/quickstart-ibm-liberty-eks/pull/6#issuecomment-1284379840

If you are ok, can you please merge this PR. (@nathsen FYI)

git4rk commented 2 years ago

@vsnyc Thanks for the sample code. We have updated the code to create boot node and EKS in same VPC.

Please note that we now have two main templates:

We also have few questions:

  1. As per your sample code, the only allowed value for NumberOfAZs is 3. Why do we need to expose this as parameter?
    NumberOfAZs:
    AllowedValues:
      - '3'
    Default: '3'
  2. We're currently not validating that the customer selects 3 AZs for AvailabilityZones parameter. Should that be validated? How?
  3. Even though VPC has multiple availability zones, the boot node is only created in one AZ, right?

Please also take a look at https://github.com/aws-quickstart/quickstart-ibm-liberty-eks/pull/6#issuecomment-1285803286 for lint related comments.

Please review and merge the PR. If you are ok, we can address other comments in a different PR. (@nathsen FYI).

vsnyc commented 2 years ago

@git4rk - good questions. There is no way to validate/enforce 2 - that customer has selected at least 3 AZs and that's why we have typically exposed NumberOfAZs parameter to give visual cues to users assuming they read the descriptions to say that they must choose three AZs. If you prefer, we can eliminate NumberOfAZs parameters and have a description on AvailabilityZones parameter to state that they must choose at least three Availability Zones. Selecting more than 3 will not cause any errors but the additional AZs won't be used for deployment.

Yes, the boot node will be created in the public subnet 1.

pbaity commented 2 years ago

@vsnyc Is it possible to use the AWS::LanguageExentsions transform to "import" the Fn::Length function, so we can check that the length of the selections is exactly 3? Something like:

Transform: 'AWS::LanguageExtensions'
...
Rules:
  Require3AZs:
    Assertions:
      - Assert: !Equals 
          - Fn::Length:
              !Ref NumberOfAZs
          - 3
        AssertDescription: You must select exactly 3 availability zones.

My syntax might not be perfect there, but hopefully it conveys the idea.

vsnyc commented 2 years ago

@pbaity - I definitely learned something new, thanks for the suggestion. I did a quick try and couldn't get it to work, I believe it's because the "functions and attributes provided by the AWS::LanguageExtensions transform are only supported in the Resources, Conditions, and Outputs sections of a template." (see docs link). There are other ways of achieving this, e.g. a custom CloudFormation hook, a custom macro, custom resource implementation to do a pre-requisite check, etc. Developing a hook that does this check is probably the best suggestion, however ideally that is external to this solution. I would skip it for simplicity.

vsnyc commented 2 years ago

Also, replying generally to a prior comment on pending lint suggestion for language text, it's okay to not change code, some of it can be handled as exception added in metadata. I'll look at any obvious ones and fix it.

pbaity commented 2 years ago

Thanks @vsnyc ! We were also looking at how we can restrict some of the permissions of the IAM role we create for the EC2 instance profile. Our PR currently has this:

ManagedPolicyArns:
  - !Sub arn:${AWS::Partition}:iam::aws:policy/AmazonEC2FullAccess
  - !Sub arn:${AWS::Partition}:iam::aws:policy/AWSCloudFormationFullAccess
  - !Sub arn:${AWS::Partition}:iam::aws:policy/AmazonSSMManagedInstanceCore
  - !Sub arn:${AWS::Partition}:iam::aws:policy/AmazonS3FullAccess
  - !Sub arn:${AWS::Partition}:iam::aws:policy/CloudWatchFullAccess

We did some testing and were able to use different policies for S3 and CloudWatch with fewer permissions, so those last two policies have changed to:

  - !Sub arn:${AWS::Partition}:iam::aws:policy/AmazonS3ReadOnlyAccess
  - !Sub arn:${AWS::Partition}:iam::aws:policy/CloudWatchAgentServerPolicy

But the first two policies - AmazonEC2FullAccess and AWSCloudFormationFullAccess - we added because their listed in eksctl's official docs as the minimum IAM permissions needed for creating and managing EKS clusters. This not only allows us to create the cluster, but also permits the customer to use the EC2 instance as a jump box to manage their cluster with eksctl.

Is it okay to keep those two IAM policies?

vsnyc commented 2 years ago

Yes, this looks good.

vsnyc commented 2 years ago

/do-e2e-tests

aws-ia-ci[bot] commented 2 years ago

I'm creating a backend CI pipeline for this project. This is a one-time procedure but will add a delay of approximately 5 minutes.

git4rk commented 2 years ago

@vsnyc For AvailabilityZones params user needs to select 3 zones but some regions only display two zones and the template fails to launch. What should we do in that situation?

image

vsnyc commented 2 years ago

@git4rk - you can relax the number of AZs to be at least 2, that way it will work in all the Regions. You can change the default value to 2.

vsnyc commented 2 years ago

@git4rk - Just wanted to add that if you choose to allow two AZ deployments, you'll need to update the bootnode template to make private subnet 3 ID, public subnet 3 ID and Availability Zone 3 parameters optional.

git4rk commented 2 years ago

@vsnyc If we make 3rd AZ optional in the bootnode template, from the main template how can we check that customer has selected 3 and pass all three. Do you have a sample code for that?

From main template (templates/ibm-liberty-new-eks-with-app.template.yaml):

IBMLibertyBootNodeStack:
    Type: 'AWS::CloudFormation::Stack'
    Properties:
      TemplateURL: !Sub
        - https://${S3Bucket}.s3.${S3Region}.${AWS::URLSuffix}/${QSS3KeyPrefix}templates/ibm-liberty-ec2-bootnode.template.yaml
        - S3Bucket: !If [UsingDefaultBucket, !Sub 'aws-quickstart-${AWS::Region}', !Ref QSS3BucketName]
          S3Region: !If [UsingDefaultBucket, !Ref AWS::Region, !Ref QSS3BucketRegion]
      Parameters:
        RoleName: !GetAtt IBMLibertyIAMRoleStack.Outputs.RoleName
        BootNodeName: !Join ["-", [!Ref EKSClusterName, "bootnode"]]
        EKSClusterName: !Ref EKSClusterName
        PublicSubnet1ID: !GetAtt 'IBMLibertyVPCStack.Outputs.PublicSubnet1ID'
        PublicSubnet2ID: !GetAtt 'IBMLibertyVPCStack.Outputs.PublicSubnet2ID'
        PublicSubnet3ID: !GetAtt 'IBMLibertyVPCStack.Outputs.PublicSubnet3ID'
        AvailabilityZone1: !Select [0, !Ref AvailabilityZones]
        AvailabilityZone2: !Select [1, !Ref AvailabilityZones]
        AvailabilityZone3: !Select [2, !Ref AvailabilityZones]
        PrivateSubnet1ID: !GetAtt 'IBMLibertyVPCStack.Outputs.PrivateSubnet1AID'
        PrivateSubnet2ID: !GetAtt 'IBMLibertyVPCStack.Outputs.PrivateSubnet3AID'
        PrivateSubnet3ID: !GetAtt 'IBMLibertyVPCStack.Outputs.PrivateSubnet3AID'
vsnyc commented 2 years ago

@git4rk - you can use the NumberOfAZs to decide this. If the user chooses this number as 2, then you use only 2 AZs for deployment (ignoring how many AZs they've actually selected). See an example here:

Condition: https://github.com/aws-quickstart/quickstart-mongodb/blob/main/templates/mongodb-new-vpc.template.yaml#L351-L353 Usage: https://github.com/aws-quickstart/quickstart-mongodb/blob/main/templates/mongodb-new-vpc.template.yaml#L440-L443

Then in bootnode template, you may have to create two config sets, one for 2AZ and one for 3AZ, and then choose the right config set in the user data. e.g. something like:

Config set changes:

        configSets:
          Required2AZ:
            - StackProperties2AZFile
          Required3AZ:
            - StackProperties3AZFile 

User data changes:

      UserData:
        Fn::Base64:
          !Sub 
          -  |
              #!/bin/bash
              /opt/aws/bin/cfn-init -v --stack ${AWS::StackName} --resource BootNode --configsets ${ConfigSetName} --region ${AWS::Region}
             /opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackName} --resource BootNode --region ${AWS::Region}
          -  ConfigSetName: !If
            - 3AZCondition
            - Required3AZ
            - Required2AZ
vsnyc commented 2 years ago

/do-e2e-tests

vsnyc commented 2 years ago

/do-e2e-tests

git4rk commented 2 years ago

@vsnyc Earlier, we were using eksctl default options which created the env with two AZs. We are thinking of doing the same:

Please let us know if you are ok with this approach. Thanks.

vsnyc commented 2 years ago

@git4rk - yes, supporting a 2-AZ architecture is acceptable.

git4rk commented 2 years ago

@vsnyc I have accepted your changes. Please let me know if anything else is needed before you can merge it.

vsnyc commented 2 years ago

@git4rk - I am merging it to develop and then will create a new PR to merge to main. The only requirement for merging in main is that the tests pass - i.e. both creation and deletion is successful. We may have to work on an exception strategy, I'll keep you posted.