[Meta][AWS] Create AWS Bedrock Metrics Dataset

agithomas commented 5 days ago

Goal

Create an AWS Bedrock dataset to capture metrics.

Existing System Behaviour

AWS Bedrock logs are already part of the AWS Bedrock integration package. Pull Request

Requirement

[ ] Create a new dataset to fetch and store CloudWatch metrics
[ ] Create an AWS Bedrock overview dashboard to display key metrics and logs data

Reference:

Metrics Reference

agithomas commented 5 days ago

@tommyers-elastic / @lalit-satapathy / @ishleenk17

Kindly refer to the comment on the field naming convention already followed for the AWS Bedrock invocation dataset.

Qn: Should the same naming convention be followed for the metrics dataset as well?

Semantic conventions for GenAI-related metrics: Reference

If yes, below is the list of field names I propose to use for storing metric values. Kindly review.

| AWS Field Name | Field Description | Proposed Field Name |
|---|---|---|
| Invocations | Number of requests to the Converse, ConverseStream, InvokeModel, and InvokeModelWithResponseStream API operations. | `gen_ai.server.request.count`. Similar to `gen_ai.server.request.*` |
| InvocationLatency | Latency of the invocations. | `gen_ai.performance.invocation_latency` (as part of the invocation dataset, `gen_ai.performance.*` metrics are added) |
| InvocationClientErrors | Number of invocations that result in client-side errors. | `gen_ai.client.invocation.error_count`. Name Reference Example |
| InvocationServerErrors | Number of invocations that result in AWS server-side errors. | `gen_ai.server.invocation.error_count`. Name Reference Example |
| InvocationThrottles | Number of invocations that the system throttled. | `gen_ai.performance.invocation.throttle_count` |
| InputTokenCount | Number of tokens of text input. | `gen_ai.input.text_token_count`. The `aws_bedrock.invocation.input.input_token_count` field, added as part of Elastic's invocation dataset for AWS Bedrock, seems to represent the same information. |
| LegacyModelInvocations | Number of invocations using legacy models. | `gen_ai.performance.legacymodel_invocation_count`. The other metrics added as part of the cloudwatch_logs invocation dataset include `gen_ai.performance.request_size`, `gen_ai.performance.request.size`, `gen_ai.performance.response_time`. |
| OutputTokenCount | Number of tokens of text output. | `gen_ai.output.text_token_count`. Similar to InputTokenCount, there is an existing field, `aws_bedrock.invocation.output.output_token_count`, added as part of the invocation dataset, that captures the token count. |
| OutputImageCount | Number of output images. | `gen_ai.output.image_count` |
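
For easier review, the proposal in the table above can also be written as a lookup table. This is only a sketch: the field names are the ones proposed in this comment and are still under review, and `PROPOSED_FIELDS` and `rename_metrics` are hypothetical names used for illustration, not part of the integration.

```python
# Proposed mapping from the table above (under review, not final).
PROPOSED_FIELDS = {
    "Invocations": "gen_ai.server.request.count",
    "InvocationLatency": "gen_ai.performance.invocation_latency",
    "InvocationClientErrors": "gen_ai.client.invocation.error_count",
    "InvocationServerErrors": "gen_ai.server.invocation.error_count",
    "InvocationThrottles": "gen_ai.performance.invocation.throttle_count",
    "InputTokenCount": "gen_ai.input.text_token_count",
    "LegacyModelInvocations": "gen_ai.performance.legacymodel_invocation_count",
    "OutputTokenCount": "gen_ai.output.text_token_count",
    "OutputImageCount": "gen_ai.output.image_count",
}


def rename_metrics(raw):
    """Rename CloudWatch metric keys to the proposed field names.

    Keys that are not in the proposal are passed through unchanged.
    """
    return {PROPOSED_FIELDS.get(name, name): value for name, value in raw.items()}
```

For example, `rename_metrics({"Invocations": 3})` yields `{"gen_ai.server.request.count": 3}`.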

ishleenk17 commented 5 days ago

@agithomas Do we already have equivalent fields in OTel for the ones we are proposing? Or are we saying we will follow a similar naming convention, on the assumption that those fields will be added to the OTel collector in the future? As a general rule of thumb, always prefer semconv fields wherever possible.

lalit-satapathy commented 5 days ago

Qn: Should the same naming convention be followed for the metrics dataset as well?

Yes, we can use the gen_ai.* field if it is available, otherwise use aws_bedrock.*

Adding @muthu-mps for any feedback.

agithomas commented 5 days ago

Yes, we can use the gen_ai.* field if it is available, otherwise use aws_bedrock.*

Thanks @lalit-satapathy for your input. What confused me was the gen_ai.performance.* metrics, whose definitions are not currently available in semconv. After going through some discussions, I think I now have a better understanding of these metrics.

As you mentioned, I will go with gen_ai.* fields if they are available in semconv. If not, the aws_bedrock.* prefix will be used.
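
The rule above can be sketched as a small helper. This is a minimal illustration under assumptions: `resolve_field_name` is a hypothetical name, the set of semconv-defined names is supplied by the caller, and reusing the same suffix under `aws_bedrock.` is my assumption, since the exact layout of the fallback fields has not been decided in this thread.

```python
def resolve_field_name(proposed, semconv_names):
    """Keep a gen_ai.* name only if semconv defines it; otherwise use aws_bedrock.*.

    `semconv_names` is the set of field names known to be defined in semconv.
    Reusing the suffix under the aws_bedrock. prefix is an assumption.
    """
    if proposed in semconv_names:
        return proposed
    prefix = "gen_ai."
    suffix = proposed[len(prefix):] if proposed.startswith(prefix) else proposed
    return "aws_bedrock." + suffix
```

With an empty set of known semconv names, `resolve_field_name("gen_ai.output.image_count", set())` returns `"aws_bedrock.output.image_count"`.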

muthu-mps commented 4 days ago

@agithomas - You can take a look into this issue for more information.

agithomas commented 3 days ago

After verifying the aggregation functions and statistics used in the CloudWatch OOTB dashboard for AWS Bedrock, the following fields and their respective aggregations will be used:

Sum Aggregation

Invocations
InvocationClientErrors
InvocationServerErrors
LegacyModelInvocations
InputTokenCount
OutputTokenCount
OutputImageCount
InvocationThrottles

Avg Aggregation

InvocationLatency
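
As a rough illustration of the aggregation choices above, the sketch below builds request parameters in the shape accepted by the boto3 CloudWatch client's `get_metric_statistics` call. It does not reflect how the integration itself collects metrics. The `AWS/Bedrock` namespace comes from the AWS documentation; the helper names, the 300-second period, and the single `ModelId` dimension are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Aggregations listed in this comment.
SUM_METRICS = {
    "Invocations",
    "InvocationClientErrors",
    "InvocationServerErrors",
    "LegacyModelInvocations",
    "InputTokenCount",
    "OutputTokenCount",
    "OutputImageCount",
    "InvocationThrottles",
}
AVG_METRICS = {"InvocationLatency"}


def statistic_for(metric_name):
    """Return the CloudWatch statistic name for a Bedrock metric."""
    if metric_name in SUM_METRICS:
        return "Sum"
    if metric_name in AVG_METRICS:
        return "Average"
    raise KeyError(f"no aggregation defined for {metric_name}")


def build_request(metric_name, model_id, period_seconds=300, now=None):
    """Build keyword arguments for one metric over the most recent period."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Bedrock",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "ModelId", "Value": model_id}],
        "StartTime": end - timedelta(seconds=period_seconds),
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": [statistic_for(metric_name)],
    }
```

The resulting dictionary could be passed as `client.get_metric_statistics(**build_request("Invocations", model_id))`, where `model_id` is whatever model identifier is being monitored.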

agithomas commented 2 days ago

Regarding the dimensions, below is an extract from the AWS Bedrock documentation:

ModelId – all metrics

ModelId + ImageSize + BucketedStepSize – OutputImageCount

The dimensions actually generated appear to differ from what is documented. I could not generate BucketedStepSize even when I tried a diffusion model such as Stability.ai, evaluating both image generation and image variation.
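
A comparison like the one described above could be scripted roughly as follows. This is a sketch: the documented dimension sets are taken from the extract above, while the helper names and the observed dimensions in the example are hypothetical.

```python
def documented_dimensions(metric_name):
    """Return the dimension names documented for a Bedrock metric."""
    names = {"ModelId"}  # documented for all metrics
    if metric_name == "OutputImageCount":
        names |= {"ImageSize", "BucketedStepSize"}
    return names


def missing_dimensions(metric_name, observed):
    """Return documented dimensions that do not appear in the observed ones."""
    return documented_dimensions(metric_name) - set(observed)
```

For a hypothetical observation of `["ModelId", "ImageSize"]` on OutputImageCount, `missing_dimensions` returns `{"BucketedStepSize"}`.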

AWS Cloudwatch dashboard view selection

Attached image for reference
