Add support for emitting metrics to the target Amazon Managed Service for Prometheus workspace
Beta
Issue #, if available:
Description of changes:
Add support for emitting metrics to the target Amazon Managed Service for Prometheus (AMP) workspace
The test supports emitting metrics across accounts and across regions
If the AMP URL is not set, the test does not emit metrics
Emit the NCCL test average bus bandwidth metric
Add metadata labels to the metric
Add/update the README
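For reviewers, the gating and labeling behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `metric` type, the `shouldEmit` and `formatMetric` helpers, the metric name `nccl_avg_bus_bandwidth`, and the label keys are all hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// metric is a hypothetical in-memory representation of one sample.
type metric struct {
	Name   string
	Labels map[string]string
	Value  float64
}

// shouldEmit reports whether metrics should be sent at all.
// An empty (or whitespace-only) AMP URL disables emission.
func shouldEmit(ampMetricURL string) bool {
	return strings.TrimSpace(ampMetricURL) != ""
}

// formatMetric renders the sample in a Prometheus-like text form,
// with label keys sorted so the output is deterministic.
func formatMetric(m metric) string {
	keys := make([]string, 0, len(m.Labels))
	for k := range m.Labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, m.Labels[k]))
	}
	return fmt.Sprintf("%s{%s} %g", m.Name, strings.Join(pairs, ","), m.Value)
}

func main() {
	// Hypothetical metric name and metadata labels.
	m := metric{
		Name:   "nccl_avg_bus_bandwidth",
		Labels: map[string]string{"feature": "multi-node", "efa_enabled": "true"},
		Value:  3.68456,
	}

	ampMetricURL := "" // mirrors an unset --ampMetricUrl flag
	if !shouldEmit(ampMetricURL) {
		fmt.Println("AMP URL not set; skipping metric emission")
	}

	// Prints: nccl_avg_bus_bandwidth{efa_enabled="true",feature="multi-node"} 3.68456
	fmt.Println(formatMetric(m))
}
```

The empty-URL check keeps the test usable in environments without an AMP workspace, which matches the opt-in behavior described in this PR.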
Test
go test -timeout 60m -v . -args -nvidiaTestImage public.ecr.aws/o5d5x8n6/weicongw:nvidia --efaEnabled=true --feature=multi-node --ampMetricUrl=https://aps-workspaces.us-west-2.amazonaws.com/workspaces/ws-9f8fe538-f707-46e7-863c-26bfb192dc52/api/v1/remote_write --ampMetricRoleArn=arn:aws:iam::665181186642:role/amp
...
[1,0]<stdout>:# Out of bounds values : 0 OK
[1,0]<stdout>:# Avg bus bandwidth : 3.68456
[1,0]<stdout>:#
[1,0]<stdout>:
mpi_test.go:145: Emitting nccl test metrics to AMP
Query the metric from AMP
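To verify the metric by hand, the Prometheus-compatible query endpoint of the same workspace can be used. The sketch below only derives the query URL from a remote-write URL; `queryURL` is a hypothetical helper, the workspace ID `ws-EXAMPLE` and the metric name are placeholders, and a real request to AMP would also need to be SigV4-signed, which is not shown here.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// queryURL converts an AMP remote-write URL into an instant-query URL
// for the same workspace and attaches the PromQL expression.
func queryURL(remoteWriteURL, promQL string) (string, error) {
	u, err := url.Parse(remoteWriteURL)
	if err != nil {
		return "", err
	}
	const suffix = "/api/v1/remote_write"
	if !strings.HasSuffix(u.Path, suffix) {
		return "", fmt.Errorf("not a remote_write URL: %s", remoteWriteURL)
	}
	u.Path = strings.TrimSuffix(u.Path, "remote_write") + "query"

	q := url.Values{}
	q.Set("query", promQL)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// Placeholder workspace ID and hypothetical metric name.
	remoteWrite := "https://aps-workspaces.us-west-2.amazonaws.com/workspaces/ws-EXAMPLE/api/v1/remote_write"
	u, err := queryURL(remoteWrite, "nccl_avg_bus_bandwidth")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u)
}
```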
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.