elastic / integrations

Elastic Integrations
https://www.elastic.co/integrations

AWS Health Integration #8907

Open SubhrataK opened 5 months ago

SubhrataK commented 5 months ago

AWS Health assists in effectively managing ongoing events. It offers continuous insight into the performance of your resources and the availability of your AWS services and accounts. By leveraging AWS Health events, users obtain valuable insights into how service and resource modifications may impact their applications hosted on AWS.

The AWS Health integration with Elastic will retrieve the following information:

- DescribeEvents operation - Summary information about events that are related to an AWS account. The events can be related to AWS operational issues, scheduled changes to AWS infrastructure, or security and billing notifications.
- DescribeEventDetails operation - Detailed information about one or more events, such as the AWS service, Region, Availability Zone, event start and end times, and a text description.
- DescribeAffectedEntities operation - Information about entities that are affected by one or more events. The results can be filtered by additional criteria, such as status, that might be assigned to AWS resources.

High Level Design

Criteria

Metric Fetch Mechanism

github.com/aws/aws-sdk-go-v2/service/health will be used to fetch the details of AWS Health

API Details

image

PR Link : https://github.com/elastic/beats/pull/38370

### Evaluation & Prototyping
- [ ] https://github.com/elastic/integrations/issues/9352
- [x] Determining the method of fetching DescribeEvents, DescribeEventDetails, DescribeAffectedEntities metrics and data
- [ ] https://github.com/elastic/beats/issues/38292
### Test Scripts Development (Metricbeat Module)
- [x] Integration Test Files & Scripts
- [x] Unit Test Scripts
### Metricbeat module development
- [x] Development of metricbeat module
- [x] Review & Release of metricbeat module
### AWS Health Integration package Development
- [ ] https://github.com/elastic/integrations/issues/10111
- [ ] Create documentation for the integration package
- [ ] Dashboard review with the Product team
- [ ] AWS Package Cataloguing
- [ ] https://github.com/elastic/obs-infraobs-team/issues/1382
### Release Process
- [ ] Integration testing
- [ ] Documentation review with Documentation team
- [ ] Review & Release of integration package
agithomas commented 3 months ago

Debug Logs Format :

{"log.level":"debug","@timestamp":"2024-03-19T08:05:06.712Z","log.logger":"aws.awshealth","log.origin":{"function":"github.com/elastic/beats/v7/x-pack/metricbeat/module/aws/awshealth.(*MetricSet).getEventsSummary","file.name":"awshealth/awshealth.go","file.line":196},"message":"[AWS Health] [DescribeEventDetails] Event ARN : arn:aws:health:us-east-1::event/RDS/AWS_RDS_PLANNED_LIFECYCLE_EVENT/AWS_RDS_PLANNED_LIFECYCLE_EVENT_XXXXXXXXXXXXXX, Affected Entities (Pending) : 2, Affected Entities (Resolved): 0, Affected Entities (Others) : 0","service.name":"metricbeat","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-03-19T08:05:06.730Z","log.logger":"aws.awshealth","log.origin":{"function":"github.com/elastic/beats/v7/x-pack/metricbeat/module/aws/awshealth.(*MetricSet).getEventsSummary","file.name":"awshealth/awshealth.go","file.line":196},"message":"[AWS Health] [DescribeEventDetails] Event ARN : arn:aws:health:eu-west-3::event/RDS/AWS_RDS_PLANNED_LIFECYCLE_EVENT/AWS_RDS_PLANNED_LIFECYCLE_EVENT_YYYYYYYYYYYYY, Affected Entities (Pending) : 2, Affected Entities (Resolved): 0, Affected Entities (Others) : 0","service.name":"metricbeat","ecs.version":"1.6.0"}
agithomas commented 3 months ago

During testing, it was noticed that even when an event has multiple affected entities, not all entity ARNs have an associated status.

In such cases, there won't be any summary information displayed and there won't be any associated status information displayed in the detailed view.

image

So, the count of ARNs must not be interpreted as the sum of aws.awshealth.affected_entities_others, aws.awshealth.affected_entities_pending, and aws.awshealth.affected_entities_resolved for a specific event ARN.
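
The mismatch can be illustrated with a minimal sketch. The `entity` and `counts` types and the `tally` function are hypothetical names made up for this example, not the module's code. The sketch assumes that an entity without a status is not counted in any of the three buckets, and that any non-empty status other than PENDING or RESOLVED is counted under "others".

```go
package main

import "fmt"

// entity is a hypothetical stand-in for one affected entity returned by
// DescribeAffectedEntities; Status is empty when AWS reports no status.
type entity struct {
	ARN    string
	Status string
}

// counts holds the three counters named in this thread.
type counts struct {
	Pending, Resolved, Others int
}

// tally counts entities per status bucket. Entities without a status are
// not counted in any bucket (an assumption made for this illustration).
func tally(entities []entity) counts {
	var c counts
	for _, e := range entities {
		switch e.Status {
		case "":
			// No status reported: contributes to no counter.
		case "PENDING":
			c.Pending++
		case "RESOLVED":
			c.Resolved++
		default:
			c.Others++
		}
	}
	return c
}

func main() {
	entities := []entity{
		{ARN: "arn:example:1", Status: "PENDING"},
		{ARN: "arn:example:2", Status: "PENDING"},
		{ARN: "arn:example:3"}, // no status reported
	}
	c := tally(entities)
	// The number of ARNs (3) differs from the sum of the counters (2).
	fmt.Printf("ARNs: %d, pending+resolved+others: %d\n",
		len(entities), c.Pending+c.Resolved+c.Others)
}
```

Under these assumptions, three ARNs yield a counter sum of two, which is why the two numbers should be read independently.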

agithomas commented 3 months ago

When a status is available for a resource ARN, AWS provides a view such as the one below, in contrast to the view mentioned here

image
agithomas commented 3 months ago

Not all events have an associated end time. In such cases, the end time is stored as "end_time": "0001-01-01T00:00:00.000Z".
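
Consumers of this field may want to treat that value as "no end time" and not as a real date. A minimal sketch in Go, assuming the value is read back as an RFC 3339 string; `hasEndTime` is a hypothetical helper name, not part of the module:

```go
package main

import (
	"fmt"
	"time"
)

// hasEndTime reports whether a stored end_time string carries a real
// timestamp. It returns false for the zero value described above, and an
// error if the string is not valid RFC 3339.
func hasEndTime(raw string) (bool, error) {
	t, err := time.Parse(time.RFC3339, raw)
	if err != nil {
		return false, err
	}
	// IsZero is true for January 1, year 1, 00:00:00 UTC.
	return !t.IsZero(), nil
}

func main() {
	for _, raw := range []string{
		"0001-01-01T00:00:00.000Z", // placeholder for "no end time"
		"2024-03-19T08:05:06.712Z", // a real timestamp
	} {
		ok, err := hasEndTime(raw)
		fmt.Println(raw, ok, err)
	}
}
```

The placeholder is Go's zero `time.Time`, so `IsZero` is enough to detect it without comparing strings.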

agithomas commented 3 months ago

Dashboard layout

image
agithomas commented 3 months ago

I tried to export a dashboard with Metricbeat on the 8.9 stack, using the command ./metricbeat export dashboard. The JSON file (attached) was created under the /_meta/kibana/8/dashboard path. However, when I run mage check, I get the error

Error: there are format errors in dashboards .

I also see the message below:

Cannot modify all index pattern references in dashboard - module/aws/awshealth/_meta/kibana/8/dashboard/494194b0-e9d3-11ee-9f73-dfef113e2924.json Please edit the dashboard override function named ReplaceIndexInDashboardObject in libbeat.

This error will be discussed with the ecosystem team to find out whether there is a mistake in the command or whether this is a known issue.

agithomas commented 3 months ago

Requested a team review of the Metricbeat PR. Once it is merged, development of the AWS Health integration package will resume.

agithomas commented 2 months ago

Based on the feedback, the following changes are being attempted:

  1. Make use of Paginators whenever possible
  2. Make use of aws pointer conversion APIs
  3. Avoid the usage of channels and pass EventARN as a batch. The exact batch size supported is to be determined.
  4. event.ID requirement
  5. Field name changes, especially related to count values
  6. Minor code refactoring
agithomas commented 2 months ago

Based on the feedback, the following changes have been attempted:

  1. Make use of Paginators whenever possible: Completed.
  2. Make use of aws pointer conversion APIs: Completed.
  3. Avoid the usage of channels and pass EventARN as a batch. The exact batch size supported is to be determined: Completed. With a batch size above 10, the following error is returned:
operation error Health: DescribeEventDetails, https response error StatusCode: 400, RequestID: 293e4861-1bdb-4b1b-9a41-2f81ab826f23, api error ValidationException: 1 validation error detected:
.......
at 'eventArns' failed to satisfy constraint: Member must have length less than or equal to 10
  4. event.ID requirement: Addressed.
  5. Field name changes, especially related to count values: Addressed the comment. No corrective steps made.
  6. Minor code refactoring: Done.
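
Given the limit of 10 event ARNs per DescribeEventDetails request reported in the error above, the batching can be sketched roughly as follows. This is an illustration only, not the merged module code: `chunk` and `maxEventARNs` are hypothetical names, the ARNs are made up, and the actual SDK call is left out so the sketch stays self-contained.

```go
package main

import "fmt"

// maxEventARNs is the per-request limit taken from the ValidationException
// quoted in this comment.
const maxEventARNs = 10

// chunk splits arns into consecutive batches of at most size elements.
// It returns nil when size is not positive.
func chunk(arns []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for start := 0; start < len(arns); start += size {
		end := start + size
		if end > len(arns) {
			end = len(arns)
		}
		batches = append(batches, arns[start:end])
	}
	return batches
}

func main() {
	// Made-up ARNs, used only to exercise the batching.
	arns := make([]string, 23)
	for i := range arns {
		arns[i] = fmt.Sprintf("arn:example:event-%d", i)
	}
	for i, b := range chunk(arns, maxEventARNs) {
		// Each batch would become the event ARN list of one
		// DescribeEventDetails request.
		fmt.Printf("batch %d: %d ARNs\n", i, len(b))
	}
}
```

With 23 ARNs this produces batches of 10, 10, and 3, so no single request exceeds the limit.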
agithomas commented 2 months ago

Metricbeat changes are merged into the main branch.

agithomas commented 2 months ago

Setting the project status to "Waiting". The remaining work will resume once the AWS Health Metricbeat module is available as part of elastic-agent.

agithomas commented 3 weeks ago

Resumed Integration Development.

System Test Result

image
agithomas commented 3 weeks ago
image