elastic / integrations

Elastic Integrations
https://www.elastic.co/integrations

[Meta][AWS] Create AWS Bedrock Metrics Dataset #10323

Open agithomas opened 5 days ago

agithomas commented 5 days ago

Goal

Create AWS Bedrock dataset to capture metrics.

Existing System Behaviour

AWS Bedrock logs are already part of the AWS Bedrock integration package. Pull Request

Requirement

Reference:

agithomas commented 5 days ago

@tommyers-elastic / @lalit-satapathy / @ishleenk17

Kindly refer to the comment on the field naming convention already followed for the AWS Bedrock invocation dataset.

Qn: Should the same naming convention be followed for the metrics dataset as well?

Semantic conventions for GenAI-related metrics: Reference

If yes, below is the list of field names I propose to use for storing the metric values. Kindly review.

| AWS field name | Description | Proposed field name | Notes |
| --- | --- | --- | --- |
| Invocations | Number of requests to the Converse, ConverseStream, InvokeModel, and InvokeModelWithResponseStream API operations. | gen_ai.server.request.count | Similar to gen_ai.server.request.* |
| InvocationLatency | Latency of the invocations. | gen_ai.performance.invocation_latency | gen_ai.performance.* metrics were added as part of the invocation dataset. |
| InvocationClientErrors | Number of invocations that result in client-side errors. | gen_ai.client.invocation.error_count | Name Reference Example |
| InvocationServerErrors | Number of invocations that result in AWS server-side errors. | gen_ai.server.invocation.error_count | Name Reference Example |
| InvocationThrottles | Number of invocations that the system throttled. | gen_ai.performance.invocation.throttle_count | |
| InputTokenCount | Number of tokens of text input. | gen_ai.input.text_token_count | The aws_bedrock.invocation.input.input_token_count field, added as part of Elastic's invocation dataset for AWS Bedrock, seems to represent the same information. |
| LegacyModelInvocations | Number of invocations using Legacy models. | gen_ai.performance.legacymodel_invocation_count | The other metrics added as part of cloudwatch_log's invocation dataset include gen_ai.performance.request_size, gen_ai.performance.request.size, gen_ai.performance.response_time. |
| OutputTokenCount | Number of tokens of text output. | gen_ai.output.text_token_count | Similar to InputTokenCount, there is a field aws_bedrock.invocation.output.output_token_count, added as part of the invocation dataset, that captures the token count. |
| OutputImageCount | Number of output images. | gen_ai.output.image_count | |

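
The proposal above can be read as a plain lookup from CloudWatch metric names to field names. Below is a minimal Python sketch of that reading, assuming the proposed names are adopted exactly as listed (they were still under review at this point in the thread). `PROPOSED_FIELDS` and `rename_metrics` are hypothetical names used for illustration only and are not part of the integration.

```python
# Proposed mapping from AWS Bedrock CloudWatch metric names to field names,
# copied from the proposal above. These are proposals, not a final schema.
PROPOSED_FIELDS = {
    "Invocations": "gen_ai.server.request.count",
    "InvocationLatency": "gen_ai.performance.invocation_latency",
    "InvocationClientErrors": "gen_ai.client.invocation.error_count",
    "InvocationServerErrors": "gen_ai.server.invocation.error_count",
    "InvocationThrottles": "gen_ai.performance.invocation.throttle_count",
    "InputTokenCount": "gen_ai.input.text_token_count",
    "LegacyModelInvocations": "gen_ai.performance.legacymodel_invocation_count",
    "OutputTokenCount": "gen_ai.output.text_token_count",
    "OutputImageCount": "gen_ai.output.image_count",
}


def rename_metrics(cloudwatch_values):
    """Rename CloudWatch metric keys to the proposed field names.

    Hypothetical helper: metrics without a proposed name keep their key.
    """
    return {
        PROPOSED_FIELDS.get(name, name): value
        for name, value in cloudwatch_values.items()
    }
```

For example, `rename_metrics({"Invocations": 3})` returns `{"gen_ai.server.request.count": 3}`.
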
ishleenk17 commented 5 days ago

@agithomas Do we already have equivalent fields in OTEL for the ones we are proposing? Or are we saying that we will follow similar naming conventions, on the assumption that those fields will be added to the OTEL collector in the future? As a general rule of thumb, always prefer semconv fields wherever possible.

lalit-satapathy commented 5 days ago

Qn: Should the same naming convention be followed for the metrics dataset as well?

Yes, we can use the gen_ai.* field if it is available, otherwise use aws_bedrock.*

Adding @muthu-mps for any feedback.

agithomas commented 5 days ago

Yes, We can use the gen_ai field if it is available, otherwise use aws_bedrock.

Thanks @lalit-satapathy for your input. What confused me was the gen_ai.performance.* metrics, whose definitions are not currently available in semconv. After going through some discussions, I think I now have a better understanding of these metrics.

As you mentioned, I will go with gen_ai.* fields if they are available in semconv. If not, the aws_bedrock.* prefix will be used.
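
The rule agreed on here (use gen_ai.* when semconv defines the field, otherwise fall back to aws_bedrock.*) can be sketched in a few lines of Python. This is only an illustration: `choose_field_name` is a hypothetical helper, and `EXAMPLE_SEMCONV` is a made-up stand-in for the set of names that semconv defines, not an actual list of semconv fields.

```python
def choose_field_name(suffix, semconv_fields):
    """Return gen_ai.<suffix> if semconv defines it, else aws_bedrock.<suffix>."""
    candidate = f"gen_ai.{suffix}"
    if candidate in semconv_fields:
        return candidate
    return f"aws_bedrock.{suffix}"


# Stand-in for the names defined by the semantic conventions.
# Illustrative only: the entry below is invented for this example.
EXAMPLE_SEMCONV = {"gen_ai.example.defined_metric"}

print(choose_field_name("example.defined_metric", EXAMPLE_SEMCONV))
print(choose_field_name("performance.invocation_latency", EXAMPLE_SEMCONV))
```

With the stand-in set, the first call keeps the gen_ai prefix and the second falls back to aws_bedrock.performance.invocation_latency, because that name is not in the set.
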

muthu-mps commented 4 days ago

@agithomas - You can take a look into this issue for more information.

agithomas commented 3 days ago

After verifying the aggregation functions and statistic values used in the CloudWatch OOTB dashboard for AWS Bedrock (which is built on CloudWatch metrics), the following are the fields and their respective aggregations that will be used:

Sum Aggregation

Avg Aggregation

agithomas commented 2 days ago

Regarding the dimensions, below is an extract from the AWS Bedrock documentation:

ModelId – all metrics

ModelId + ImageSize + BucketedStepSize – OutputImageCount

The dimensions that are actually generated appear to differ somewhat from what is documented. I could not generate BucketedStepSize even when I tried a diffusion model such as Stability.ai, evaluating both image generation and image variation.
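
One way to check for this kind of mismatch is to compare the documented dimension set for a metric against the dimensions present on an observed datapoint. The Python sketch below encodes only the two dimension sets quoted above. The helper names and the sample datapoint (model ID and image size values) are hypothetical and are not taken from real CloudWatch output.

```python
# Dimension sets as documented by AWS Bedrock (quoted in the extract above).
DEFAULT_DIMENSIONS = ("ModelId",)
DOCUMENTED_DIMENSIONS = {
    "OutputImageCount": ("ModelId", "ImageSize", "BucketedStepSize"),
}


def documented_dimensions(metric_name):
    """Return the documented dimension names for a metric."""
    return DOCUMENTED_DIMENSIONS.get(metric_name, DEFAULT_DIMENSIONS)


def missing_dimensions(metric_name, observed):
    """List documented dimensions that are absent from an observed datapoint."""
    return [d for d in documented_dimensions(metric_name) if d not in observed]


# Hypothetical observed datapoint; the values are placeholders.
observed = {"ModelId": "example-model", "ImageSize": "example-size"}
print(missing_dimensions("OutputImageCount", observed))
```

For the placeholder datapoint, the call reports `['BucketedStepSize']` as missing.
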

AWS Cloudwatch dashboard view selection

image

Attached image for reference

image