elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats
Other
12.15k stars 4.91k forks source link

AWS billing module failing with `period` lower than 24h #32627

Closed aspacca closed 7 months ago

aspacca commented 2 years ago

See: https://discuss.elastic.co/t/fleet-aws-billing-integration/311138/6

it's still present in with the aws sdk v2 upgrade this the error returned by GetCostAndUsageRequest call: Start date (and hour) should be before end date (and hour)

when we generate the start and end time we remove from now the period setting duration, and finally truncate the time part. anything less than 24h will result in the same date (https://github.com/elastic/beats/blob/7a469fd6096ac1a4adfd45682c381daa2e0eb139/x-pack/metricbeat/module/aws/billing/billing.go#L375-L381, https://github.com/elastic/beats/blob/7a469fd6096ac1a4adfd45682c381daa2e0eb139/x-pack/metricbeat/module/aws/billing/billing.go#L46)

period can accept any valid time.Duration

Tasks:

tommyers-elastic commented 2 years ago

couple of questions before we can properly solve this:

IMO 5m seems like a good answer to 1. for 2, this looks like it should be where this is documented, but i can't find any mention of it in the top-level AWS module, or billing data stream docs.

cc @ravikesarwani

aspacca commented 2 years ago
  1. What should our minimum period be? And does this need to be consistent with other AWS data streams?

The is usually a minimum interval that AWS metrics API calls return data for: this would be the "technical" minimum period we cannot go lower than. But still is up to @ravikesarwani in my opinion decide what's meaningful from the user point of view

  1. Where is the period documented? whatever granularity/minimum value we decide on should be clear to the customer.

configuration params to my knowledge are document on beats docs, and not integration docs period is not documented in the specific metricset (https://www.elastic.co/guide/en/beats/metricbeat/8.3/metricbeat-metricset-aws-billing.html) but it is mentioned in the module (https://www.elastic.co/guide/en/beats/metricbeat/8.3/metricbeat-module-aws.html#_billing)

  1. Is there a way to surface the error earlier if a smaller value than the minimum we support is used (instead of the error bubbling up from AWS SDK)?

that would be caught at validation time of the config, that happens before indeed starting collecting metrics, and will produce metricbeat to exits with an error

tommyers-elastic commented 2 years ago

thanks @aspacca.

configuration params to my knowledge are document on beats docs, and not integration docs period is not documented in the specific metricset (https://www.elastic.co/guide/en/beats/metricbeat/8.3/metricbeat-metricset-aws-billing.html) but it is mentioned in the module (https://www.elastic.co/guide/en/beats/metricbeat/8.3/metricbeat-module-aws.html#_billing)

i see period in the sample configs on these pages but not actually documented. i will update the issue description to ensure that we address it.

tommyers-elastic commented 2 years ago

I have done a little more reading of the API docs.

The period we can specify to the cost explorer API endpoint is a date range (i.e. granularity = 24hr). The result can have more granularity - up to a maximum of 1hr. See here https://pkg.go.dev/github.com/aws/aws-sdk-go-v2/service/costexplorer#GetCostAndUsageInput

So in theory we could set a minimum period of 1 hour, but the implementation would require a little more work than simply fixing the getStartDateEndDate function.

I think for right now we should simply validate a minimum allowed configuration value of 24hr, and write some defensive code to ensure we never get the 'start date cannot equal end date' error from the SDK. Updating the issue description to reflect this.

kaiyan-sheng commented 2 years ago

Thanks @aspacca and @tommyers-elastic ! +1 on validating period configuration to at least 24h for now.

The AWS billing metricset collects metrics from cost explorer API and CloudWatch. For metrics like EstimatedCharges from CloudWatch, it's reported at least once a day, most days several times a day. But since some days it only reports once a day, we put 24h as the default period for billing to save cost on API calls.

Screen Shot 2022-08-08 at 10 36 23 AM

note: Documentation on billing metricset is wrong though with 12h default period.

ravikesarwani commented 2 years ago

@tommyers-elastic @aspacca @kaiyan-sheng My take would be to target providing the default “period” value based on interval it’s available in CloudWatch so that we can reduce latency and provide more real time information for infrastructure monitoring use cases.

So basically the default should be tailored for each data stream and is dependent on the minimum interval when the data is available in CloudWatch.

Example: For EC2 by default metrics (basic) are available in 5 minute interval so it makes sense for us to set the default period value as “5 minutes”. Detailed monitoring metrics is available at 1 minute interval so it makes sense to allow the users to set the value to a minimum of 1 minute interval. Allowing anything lower than 1 minute is not something that a user should be allowed to set for EC2 as it will not help them.

botelastic[bot] commented 1 year ago

Hi! We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1. Thank you for your contribution!