aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0
11.66k stars 3.92k forks source link

(aws-iam): (Simplify OpenIdConnectProvider by using CloudFormation resource instead of custom resource lambda) #21197

Open Gtofig opened 2 years ago

Gtofig commented 2 years ago

Describe the feature

OpenIdConnectProvider construct currently creates custom resource lambda and associated resources to create OIDC provider.

However, CloudFormation now supports it out of the box: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-iam-oidcprovider.html

The construct can be significantly simplified by moving to use direct CloudFormation resource.

Use Case

Custom resource lambdas are more complex, harder to understand, and reduce visibility into what's being created. Using CloudFormation resource directly would reduce complexity.

Proposed Solution

Switch to AWS::IAM::OIDCProvider CloudFormation resource

Other Information

No response

Acknowledgements

CDK version used

2.32.1

Environment details (OS name and version, etc.)

Ubuntu 18

kadrach commented 2 years ago

Related: https://github.com/aws/aws-cdk/issues/20460 and https://github.com/aws/aws-cdk/issues/8607

PeterBaker0 commented 2 years ago

We received a notification that NodeJS 12 runtime is no longer being supported. This custom Lambda resource uses this runtime so could become an issue in coming months.

"AWS Lambda end of support for Node.js 12"

Is this relevant in prioritising this issue?

Cheers.

diranged commented 2 years ago

So I came across this issue after learning that the AWS-CDK's OpenIdConnectProvider construct is actually a custom construct and not leveraging the CfnOIDCProvider resource. I was surprised to find that this construct had a pretty old bug that we ran into years ago on our own custom code to do the same thing.

I have a solution here that has worked for us for a while now, and I've just ported it into a CDK-based deploy. I'd be happy to see this code baked into the CDK:

assets/thumbprint-layer/Dockerfile

FROM public.ecr.aws/sam/build-python3.7
COPY requirements.txt ./
RUN python -m pip install -r requirements.txt -t /opt/python
WORKDIR /
ENTRYPOINT [ "/bin/bash" ]

assets/thumbprint-layer/requirements.txt

boto3
pyOpenSSL
awsretry
certifi
cfn_resource_provider

assets/thumbprint-layer/thumprint.py

import certifi, sys, json, logging, os, urllib, ssl, socket, requests, traceback
from OpenSSL import crypto, SSL
from uuid import uuid4
from cfn_resource_provider import ResourceProvider

logging.basicConfig()
logger = logging.getLogger()
logger.setLevel(os.getenv('LOG_LEVEL', 'DEBUG'))

request_schema = {
  '$schema': 'http://json-schema.org/draft-04/schema#',
  'type': 'object',
  'required': ['URL'],
  'properties': {
    'URL': {'type': 'string', 'Description': 'The URL to hit'},
  }
}

class Thumbprint(ResourceProvider):
  def __init__(self):
    super(Thumbprint, self).__init__()
    self.request_schema = request_schema

  @property
  def url(self):
    return self.get('URL')

  @property
  def openid_config(self):
    return requests.get(self.url + '/.well-known/openid-configuration').json()

  @property
  def jwks_tuple(self):
    url = urllib.parse.urlparse(self.openid_config['jwks_uri'])
    return (url.hostname, 443)

  @property
  def root_cert(self):
    host, port = self.jwks_tuple
    context = SSL.Context(method=SSL.TLSv1_2_METHOD)
    context.load_verify_locations(cafile=certifi.where())
    conn = SSL.Connection(context, socket=socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    conn.settimeout(5)
    conn.connect((host, port))
    conn.setblocking(1)
    conn.do_handshake()
    conn.set_tlsext_host_name(host.encode())
    return conn.get_peer_cert_chain()[-1]

  @property
  def thumbprint(self):
    return self.root_cert.digest('sha1').decode('utf-8').replace(':','')

  def create(self):
    try:
      self.physical_resource_id = self.thumbprint
    except Exception as e:
      traceback.print_exc()
      self.physical_resource_id = 'could-not-execute'
      self.fail('Failed to execute: %s' % e)

  def update(self):
    # Updates are not possible. Must delete and re-create. :(
    self.delete()
    self.create()

  def delete(self):
    self.success('no-op')

# This is the lambda handler call... but when we operate locally, see the
# __main__ below.
def handler(request, context):
  return Thumbprint().handle(request, context)

# When testing locally we just gather some data, but make no changes. The point
# of this test (right now) is to verify that we can hit the SSL endpoint, get
# the root certificate information and return it.
if __name__ == '__main__':
  print('Beginning test...  validating command-line usage first...')
  url = sys.argv[1]
  if not url:
      print(f'Usage: {sys.argv[0]} <EKS Cluster OIDC URL>')
      sys.exit(1)

  print(f'Will run test against {url} cluster...')
  request = {
    "RequestType": 'CREATE',
    "ResponseURL": "https://httpbin.org/put",
    "StackId": "arn:aws:cloudformation:us-west-2:EXAMPLE/stack-name/guid",
    "RequestId": "request-%s" % uuid4(),
    "ResourceType": "Custom::Resource",
    "LogicalResourceId": "MyCustomResource",
    "ResourceProperties": {
        "URL": url
    }
  }

  print('Creating Thumbprint() object...')
  thumbprint = Thumbprint()
  thumbprint.set_request(request, {})
  print(f'TEST: root_cert thumbprint = {thumbprint.thumbprint}')

AWS-CDK Resources

    const codeDirectory = Code.fromAsset('assets/thumbprint-layer');
    const thumbprintLayer: LayerVersion = new LayerVersion(this, 'ThumbprintLayer', {
      removalPolicy: RemovalPolicy.DESTROY,
      layerVersionName: scope.stackName,
      code: Code.fromDockerBuild(codeDirectory.path, {
        file: 'Dockerfile',
        platform: Architecture.X86_64.dockerPlatform,
        imagePath: '/opt'
      }),
      compatibleArchitectures: [Architecture.X86_64]
    });
    const handler = new Function(this, 'ThumbprintHandler', {
      runtime: Runtime.PYTHON_3_7,
      code: codeDirectory,
      layers: [thumbprintLayer],
      handler: 'thumbprint.handler'
    });
    handler.node.addDependency(thumbprintLayer);
    const thumbprint = new CustomResource(this, 'Thumbprint', {
      removalPolicy: RemovalPolicy.RETAIN,
      resourceType: 'Custom::Thumbprint',
      serviceToken: handler.functionArn,
      properties: {
        URL: props.cluster.clusterOpenIdConnectIssuerUrl
      }
    });
    thumbprint.node.addDependency(handler);
    new CfnOIDCProvider(this, 'OidcProvider', {
      thumbprintList: [thumbprint.ref],
      clientIdList: ['sts.amazonaws.com'],
      url: props.cluster.clusterOpenIdConnectIssuerUrl
    });
ramesh82 commented 1 year ago

So I came across this issue after learning that the AWS-CDK's OpenIdConnectProvider construct is actually a custom construct and not leveraging the CfnOIDCProvider resource. I was surprised to find that this construct had a pretty old bug that we ran into years ago on our own custom code to do the same thing.

I have a solution here that has worked for us for a while now, and I've just ported it into a CDK-based deploy. I'd be happy to see this code baked into the CDK:

assets/thumbprint-layer/Dockerfile

FROM public.ecr.aws/sam/build-python3.7
COPY requirements.txt ./
RUN python -m pip install -r requirements.txt -t /opt/python
WORKDIR /
ENTRYPOINT [ "/bin/bash" ]

assets/thumbprint-layer/requirements.txt

boto3
pyOpenSSL
awsretry
certifi
cfn_resource_provider

assets/thumbprint-layer/thumprint.py

import certifi, sys, json, logging, os, urllib, ssl, socket, requests, traceback
from OpenSSL import crypto, SSL
from uuid import uuid4
from cfn_resource_provider import ResourceProvider

logging.basicConfig()
logger = logging.getLogger()
logger.setLevel(os.getenv('LOG_LEVEL', 'DEBUG'))

request_schema = {
  '$schema': 'http://json-schema.org/draft-04/schema#',
  'type': 'object',
  'required': ['URL'],
  'properties': {
    'URL': {'type': 'string', 'Description': 'The URL to hit'},
  }
}

class Thumbprint(ResourceProvider):
  def __init__(self):
    super(Thumbprint, self).__init__()
    self.request_schema = request_schema

  @property
  def url(self):
    return self.get('URL')

  @property
  def openid_config(self):
    return requests.get(self.url + '/.well-known/openid-configuration').json()

  @property
  def jwks_tuple(self):
    url = urllib.parse.urlparse(self.openid_config['jwks_uri'])
    return (url.hostname, 443)

  @property
  def root_cert(self):
    host, port = self.jwks_tuple
    context = SSL.Context(method=SSL.TLSv1_2_METHOD)
    context.load_verify_locations(cafile=certifi.where())
    conn = SSL.Connection(context, socket=socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    conn.settimeout(5)
    conn.connect((host, port))
    conn.setblocking(1)
    conn.do_handshake()
    conn.set_tlsext_host_name(host.encode())
    return conn.get_peer_cert_chain()[-1]

  @property
  def thumbprint(self):
    return self.root_cert.digest('sha1').decode('utf-8').replace(':','')

  def create(self):
    try:
      self.physical_resource_id = self.thumbprint
    except Exception as e:
      traceback.print_exc()
      self.physical_resource_id = 'could-not-execute'
      self.fail('Failed to execute: %s' % e)

  def update(self):
    # Updates are not possible. Must delete and re-create. :(
    self.delete()
    self.create()

  def delete(self):
    self.success('no-op')

# This is the lambda handler call... but when we operate locally, see the
# __main__ below.
def handler(request, context):
  return Thumbprint().handle(request, context)

# When testing locally we just gather some data, but make no changes. The point
# of this test (right now) is to verify that we can hit the SSL endpoint, get
# the root certificate information and return it.
if __name__ == '__main__':
  print('Beginning test...  validating command-line usage first...')
  url = sys.argv[1]
  if not url:
      print(f'Usage: {sys.argv[0]} <EKS Cluster OIDC URL>')
      sys.exit(1)

  print(f'Will run test against {url} cluster...')
  request = {
    "RequestType": 'CREATE',
    "ResponseURL": "https://httpbin.org/put",
    "StackId": "arn:aws:cloudformation:us-west-2:EXAMPLE/stack-name/guid",
    "RequestId": "request-%s" % uuid4(),
    "ResourceType": "Custom::Resource",
    "LogicalResourceId": "MyCustomResource",
    "ResourceProperties": {
        "URL": url
    }
  }

  print('Creating Thumbprint() object...')
  thumbprint = Thumbprint()
  thumbprint.set_request(request, {})
  print(f'TEST: root_cert thumbprint = {thumbprint.thumbprint}')

AWS-CDK Resources

    const codeDirectory = Code.fromAsset('assets/thumbprint-layer');
    const thumbprintLayer: LayerVersion = new LayerVersion(this, 'ThumbprintLayer', {
      removalPolicy: RemovalPolicy.DESTROY,
      layerVersionName: scope.stackName,
      code: Code.fromDockerBuild(codeDirectory.path, {
        file: 'Dockerfile',
        platform: Architecture.X86_64.dockerPlatform,
        imagePath: '/opt'
      }),
      compatibleArchitectures: [Architecture.X86_64]
    });
    const handler = new Function(this, 'ThumbprintHandler', {
      runtime: Runtime.PYTHON_3_7,
      code: codeDirectory,
      layers: [thumbprintLayer],
      handler: 'thumbprint.handler'
    });
    handler.node.addDependency(thumbprintLayer);
    const thumbprint = new CustomResource(this, 'Thumbprint', {
      removalPolicy: RemovalPolicy.RETAIN,
      resourceType: 'Custom::Thumbprint',
      serviceToken: handler.functionArn,
      properties: {
        URL: props.cluster.clusterOpenIdConnectIssuerUrl
      }
    });
    thumbprint.node.addDependency(handler);
    new CfnOIDCProvider(this, 'OidcProvider', {
      thumbprintList: [thumbprint.ref],
      clientIdList: ['sts.amazonaws.com'],
      url: props.cluster.clusterOpenIdConnectIssuerUrl
    });

How do you import an existing provider arn using this approach?

douglasnaphas commented 1 year ago

If anyone is aware of past situations where a construct moved from being supported via a custom resource, to being supported via a CloudFormation resource, these would be helpful in working on this change.

I am wondering, could the change be made so that apps already using this construct could upgrade without getting their provider resource deleted? Is it a requirement that existing providers not be deleted and re-created? If they are going to be deleted, does this change need to go in an alpha package?

Update: It seems from https://github.com/aws/aws-cdk/pull/22573 and https://github.com/aws/aws-cdk/pull/16014 that the upgrade could be handled with a feature flag.

douglasnaphas commented 1 year ago

@diranged You said the custom resource contains a pretty old bug that you encountered previously. I see that you have posted code that contains a solution, but what was the bug?

douglasnaphas commented 1 year ago

The Custom Resource handler code implements the logic that lets you omit the thumbprints when instantiating your Provider.

The handler code downloads the thumbprint and does some validation on it when a thumbprint is not provided.

I'm not sure how to retain this feature, meaning the ability to omit the thumbprints and have the construct get a thumbprint at deploy time, without leaving the Provider in a Custom Resource.

If we add a feature flag that causes the OIDC Provider to be created via the Cfn resource, for example, IAM_OIDC_PROVIDER_CFN, and someone enables that feature flag, then I think the thumbprint prop would no longer be optional, because we would have no way to fetch the thumbprint. The thumbprintList is a required prop for the Cfn resource.

diranged commented 1 year ago

@douglasnaphas I believe the bug we ran into was that the custom CFN provider was picking the wrong CA in the CA bundle to generate the thumbprint from. Instead of picking the root, it picks a SubCA. AWS then changed the SubCA cert on us a year or two ago, and our authentication broke at that point.

The code I have here I think is “better” because we’re only using a custom provider to get the thumbprint ID - and nothing else. The rest is native CFN, which means that as the CfnOIDCProvider adds more features (and maybe some day adds the thumbprint picking), we can leverage it and eventually retire the custom CFN code.

douglasnaphas commented 1 year ago

@douglasnaphas I believe the bug we ran into was that the custom CFN provider was picking the wrong CA in the CA bundle to generate the thumbprint from. Instead of picking the root, it picks a SubCA. AWS then changed the SubCA cert on us a year or two ago, and our authentication broke at that point.

The code I have here I think is “better” because we’re only using a custom provider to get the thumbprint ID - and nothing else. The rest is native CFN, which means that as the CfnOIDCProvider adds more features (and maybe some day adds the thumbprint picking), we can leverage it and eventually retire the custom CFN code.

I think it would make sense to break the thumbprint logic out into its own Thumbprint Construct. I have proposed this in the CDK RFC repo.

I don't think the OIDC Provider L1 Cfn resource will ever support thumbprint picking, since the API for OIDC Providers requires the thumbprint to be specified, and L1 Constructs don't have any additional logic. I think it would be better for the thumbprint picking logic to live in the proposed Thumbprint Construct, and for the OIDC Provider L2 Construct to accept a Thumbprint Construct.

@diranged Did https://github.com/aws/aws-cdk/pull/22802 fix the wrong-certificate bug?

dreamorosi commented 1 year ago

Since about a month ago pinning certificate thumbprints with GitHub as OIDC provider is no longer required.

See GitHub's announcement here.

This is what shows up on AWS's side:

image
GentileFulvio commented 1 year ago

Any updates on this ? We had to switch to regular cloudformation templates because of this issue

AlessandroVol23 commented 10 months ago

We received a notification that NodeJS 12 runtime is no longer being supported. This custom Lambda resource uses this runtime so could become an issue in coming months.

"AWS Lambda end of support for Node.js 12"

Is this relevant in prioritising this issue?

Cheers.

I'm facing a similar issue. The custom resource Lambda is on node 16 and I'd like to update to node 18 but I'm not sure how to do that since the Lambda is provisioned automatically. Any idea?

github-actions[bot] commented 10 months ago

This issue has received a significant amount of attention so we are automatically upgrading its priority. A member of the community will see the re-prioritization and provide an update on the issue.

WarFox commented 9 months ago

I have PR ready for review here https://github.com/aws/aws-cdk/pull/28634

It introduces a new OpenIdConnecProvider that uses the CloudFormation resource so that there is no breaking change for the existing construct that uses custom resources.

Looking for suggestions and comments on this. It is my first contribution to aws-cdk. Thank you 🙏