DataDog / datadog-cloudformation-resources

Apache License 2.0
51 stars 35 forks source link

Internal error with empty message when deploying #202

Open alexcouret opened 2 years ago

alexcouret commented 2 years ago

Describe the bug

I am currently trying to deploy monitors (Datadog::Monitors::Monitor) and am getting an Internal Error with an empty message (I've tried with a dashboard and get the same issue):

Resource handler returned message: "" (RequestToken: 46b52e64-453a-a516-2be4-547868e0bc6f, HandlerErrorCode: InternalFailure)

Here's the template I'm trying to deploy: ```yaml Resources: ThroughputMonitor: Type: Datadog::Monitors::Monitor Properties: Type: query alert Name: Service xxx has an abnormal change in throughput on env:prod Query: avg(last_4h):anomalies(sum:trace.express.request.hits{env:prod,service:xxx}, 'agile', 3, direction='both', interval=60, alert_window='last_15m', seasonality='weekly', timezone='utc', count_default_zero='true') >= 1 Message: "`xxx` throughput deviated too much from its usual value." Tags: - service:xxx - env:prod Priority: 2 Modified: "2022-03-29T13:15:48.646Z" Options: Thresholds: Critical: 1 CriticalRecovery: 0 NotifyAudit: false RequireFullWindow: false NotifyNoData: false RenotifyInterval: 0 ThresholdWindows: TriggerWindow: last_15m RecoveryWindow: last_15m IncludeTags: true EnableLogsSample: true Metadata: aws:cdk:path: DatadogMonitorsStack/ThroughputMonitor CDKMetadata: Type: AWS::CDK::Metadata Properties: Analytics: v2:deflate64:H4sIAAAAAAAA/zPUMzQx1jNQdEgsL9ZNTsnWT84vStWrDi5JTM7WcU7LC0otzi8tSk7Vcc7PKy4pKk0uAYkCOSmZJZn5ebU6efkpqXpZxfplhmZAo4AmZRVnZuoWleaVZOam6gVBaABHq3UgZgAAAA== Metadata: aws:cdk:path: DatadogMonitorsStack/CDKMetadata/Default Condition: CDKMetadataAvailable Conditions: CDKMetadataAvailable: Fn::Or: - Fn::Or: - Fn::Equals: - Ref: AWS::Region - af-south-1 - Fn::Equals: - Ref: AWS::Region - ap-east-1 - Fn::Equals: - Ref: AWS::Region - ap-northeast-1 - Fn::Equals: - Ref: AWS::Region - ap-northeast-2 - Fn::Equals: - Ref: AWS::Region - ap-south-1 - Fn::Equals: - Ref: AWS::Region - ap-southeast-1 - Fn::Equals: - Ref: AWS::Region - ap-southeast-2 - Fn::Equals: - Ref: AWS::Region - ca-central-1 - Fn::Equals: - Ref: AWS::Region - cn-north-1 - Fn::Equals: - Ref: AWS::Region - cn-northwest-1 - Fn::Or: - Fn::Equals: - Ref: AWS::Region - eu-central-1 - Fn::Equals: - Ref: AWS::Region - eu-north-1 - Fn::Equals: - Ref: AWS::Region - eu-south-1 - Fn::Equals: - Ref: AWS::Region - eu-west-1 - Fn::Equals: - Ref: AWS::Region - eu-west-2 - Fn::Equals: - Ref: AWS::Region - eu-west-3 - Fn::Equals: - Ref: AWS::Region - me-south-1 - Fn::Equals: - Ref: AWS::Region - sa-east-1 - Fn::Equals: - Ref: AWS::Region - us-east-1 - Fn::Equals: - Ref: AWS::Region - us-east-2 - Fn::Or: - Fn::Equals: - Ref: AWS::Region - us-west-1 - Fn::Equals: - Ref: AWS::Region - us-west-2 ```

To Reproduce Unfortunately, we're having this issue only with a specific AWS account. The same template deploys just fine on other accounts. The strange thing is, it is not different than any other account in the sense that all the necessary components and accesses are setup the same way. This is controlled by a single account and Control Tower and is applied to all the other accounts.

Logs Our problem is actually the lack of logs. The error is empty. We receive the correct errors if we make a syntax or semantic error, but for a valid template like the one linked, the error message is not provided.

Expected behavior A descriptive error message is provided to help pinpoint issues in the setup.

Environment and Versions (please complete the following information):

github-actions[bot] commented 2 years ago

Thanks for your contribution!

This issue has been automatically marked as stale because it has not had activity in the last 30 days. Note that the issue will not be automatically closed, but this notification will remind us to investigate why there's been inactivity. Thank you for participating in the Datadog open source community.

If you would like this issue to remain open:

  1. Verify that you can still reproduce the issue in the latest version of this project.

  2. Comment that the issue is still reproducible and include updated details requested in the issue template.

covertbert commented 2 years ago

@OzoTek did you figure this out? Think I'm experiencing the same issue.

alexcouret commented 2 years ago

@OzoTek did you figure this out? Think I'm experiencing the same issue.

@covertbert I did not! I just deployed the monitor from my dev account instead of prod, since the prod was the one with the issue. I still think errors should bubble up properly though.

skarimo commented 2 years ago

Hi it's hard to tell where the issue might be since the AWS logs are quite obfuscated. I would suggest double checking your TypeConfiguration since I have seen these arise due to malformed TypeConfiguration setup(whether its bad json or invalid secret resolver).

dannyburke1 commented 2 years ago

@OzoTek, @covertbert I got around this. It turns out, in our case, that the Datadog External ID secret already existed, the error is returned from CloudTrail and the resource doesn't know how to handle it, so throws the empty string. Can you check your CloudTrail logs, and any cloudformation events with an error code?