crossplane-contrib / provider-upjet-aws

Official AWS Provider for Crossplane by Upbound.
https://marketplace.upbound.io/providers/upbound/provider-aws
Apache License 2.0

Count external API calls #1241

Closed mergenci closed 3 months ago

mergenci commented 3 months ago

Description of your changes

This PR introduces three AWS API call counters:

  1. An AWS SDK v2 middleware to count authentication calls,
  2. An AWS SDK v2 middleware to count resource API calls,
  3. An AWS SDK v1 session handler to count resource API calls.

API calls that couldn't be completed because of a connection error are not counted. API calls that return API errors (or no errors) are counted. There is no comprehensive AWS documentation on request rate limits, but here are two resources on the topic, for reference:

  1. Request throttling for the Amazon EC2 API
  2. Managing and monitoring API throttling in your workloads

This PR also removes unsafe pointer operations, as described in https://github.com/upbound/terraform-provider-aws/pull/196.

Alternatives considered

During the design phase, we investigated whether implementing an http.RoundTripper would be a good solution. Ideally, we would have a common implementation for AWS SDK v1 and v2, since both use an http.Client under the hood. A RoundTripper implementation proved infeasible for the following reasons:

  1. Plugging a RoundTripper into the client returned by AWSClient.HTTPClient() worked for AWS SDK v1 calls, but not for AWS SDK v2 calls.
  2. AWS SDK v1 doesn't store service ID (EC2, IAM, etc.) and operation name (DescribeVPCs, etc.) in the request context, like AWS SDK v2 does. Therefore, we wouldn't be able to label v1 calls by service ID and operation name.

Checklist

I have:

I couldn't run make reviewable, because my local terraform setup is broken.

How has this code been tested

I tested the code manually using the resource configuration below, which contains resources that use AWS SDK v1 and v2 as of this writing. Because Upjet comes with a Prometheus client, Upjet-based providers serve their metrics at :8080/metrics by default. Here's a sample excerpt after applying the resource configuration:

# HELP upjet_resource_external_api_calls The number of external API calls.
# TYPE upjet_resource_external_api_calls counter
upjet_resource_external_api_calls{service="EC2",service_operation="AuthorizeSecurityGroupIngress"} 1
upjet_resource_external_api_calls{service="EC2",service_operation="CreateSecurityGroup"} 1
upjet_resource_external_api_calls{service="EC2",service_operation="CreateTags"} 1
upjet_resource_external_api_calls{service="EC2",service_operation="CreateVpc"} 1
upjet_resource_external_api_calls{service="EC2",service_operation="DescribeNetworkAcls"} 3
upjet_resource_external_api_calls{service="EC2",service_operation="DescribeRouteTables"} 3
upjet_resource_external_api_calls{service="EC2",service_operation="DescribeSecurityGroupRules"} 5
upjet_resource_external_api_calls{service="EC2",service_operation="DescribeSecurityGroups"} 11
upjet_resource_external_api_calls{service="EC2",service_operation="DescribeVpcAttribute"} 9
upjet_resource_external_api_calls{service="EC2",service_operation="DescribeVpcs"} 4
upjet_resource_external_api_calls{service="EC2",service_operation="RevokeSecurityGroupEgress"} 2
upjet_resource_external_api_calls{service="STS",service_operation="GetCallerIdentity"} 1

I manually cross-checked the reported counts against the calls shown in CloudTrail Event History. Note that CloudTrail Event History may take up to a few minutes to show the latest calls.
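
A cross-check like this can be made a little easier by totaling the counter per service from the metrics text. The sketch below is a hand-rolled parser for illustration only: sumByService and the embedded sample are hypothetical, it assumes sample lines carry no trailing timestamp, and it handles only the upjet_resource_external_api_calls metric shown in the excerpt.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

const metricName = "upjet_resource_external_api_calls"

// serviceLabel extracts the value of the service label (not service_operation).
var serviceLabel = regexp.MustCompile(`[{,]service="([^"]*)"`)

// sumByService reads Prometheus text exposition output and totals the
// counter per service label. Lines for other metrics are ignored.
func sumByService(metrics string) (map[string]float64, error) {
	totals := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(metrics))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, metricName+"{") {
			continue // comments, blank lines, and other metrics
		}
		m := serviceLabel.FindStringSubmatch(line)
		idx := strings.LastIndex(line, " ")
		if m == nil || idx < 0 {
			return nil, fmt.Errorf("unexpected sample line: %q", line)
		}
		// The value is the last space-separated field (no timestamp assumed).
		v, err := strconv.ParseFloat(line[idx+1:], 64)
		if err != nil {
			return nil, fmt.Errorf("bad value in %q: %w", line, err)
		}
		totals[m[1]] += v
	}
	return totals, sc.Err()
}

// sample is a shortened copy of the excerpt above.
const sample = `# HELP upjet_resource_external_api_calls The number of external API calls.
# TYPE upjet_resource_external_api_calls counter
upjet_resource_external_api_calls{service="EC2",service_operation="CreateVpc"} 1
upjet_resource_external_api_calls{service="EC2",service_operation="DescribeVpcs"} 4
upjet_resource_external_api_calls{service="STS",service_operation="GetCallerIdentity"} 1
`

func main() {
	totals, err := sumByService(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println("EC2:", totals["EC2"], "STS:", totals["STS"])
}
```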

To test connection errors, I put breakpoints in the code, shut down my Internet connection upon hitting the breakpoint, and then resumed execution. To test API errors, I tried to delete a VPC that has a Security Group configured.

Resource Configuration

apiVersion: ec2.aws.upbound.io/v1beta1
kind: VPC
metadata:
  annotations:
    meta.upbound.io/example-id: ec2/v1beta1/securitygroupingressrule
  name: test-pr-1241-vpc
  labels:
    testing.upbound.io/example-name: test-pr-1241-vpc
spec:
  forProvider:
    region: us-west-1
    cidrBlock: 172.16.0.0/16
    tags:
      Name: TestPr1241VPC

---
apiVersion: ec2.aws.upbound.io/v1beta1
kind: SecurityGroup
metadata:
  annotations:
    meta.upbound.io/example-id: ec2/v1beta1/securitygroupingressrule
  name: test-pr-1241-securitygroup
  labels:
    testing.upbound.io/example-name: test-pr-1241-securitygroup
spec:
  forProvider:
    region: us-west-1
    vpcIdSelector:
      matchLabels:
        testing.upbound.io/example-name: test-pr-1241-vpc

---
apiVersion: ec2.aws.upbound.io/v1beta1
kind: SecurityGroupIngressRule
metadata:
  name: test-pr-1241-securitygroupingressrule
spec:
  forProvider:
    cidrIpv4: 10.0.0.0/8
    fromPort: 8080
    ipProtocol: tcp
    region: us-west-1
    securityGroupIdRef:
      name: test-pr-1241-securitygroup
    toPort: 8081

ulucinar commented 3 months ago

/test-examples="examples/iam/v1beta1/role.yaml"