aws-observability / cdk-aws-observability-accelerator

CDK AWS Observability Accelerator
https://aws-observability.github.io/cdk-aws-observability-accelerator/
MIT No Attribution

Neuron observability #121

Closed freschri closed 6 months ago

freschri commented 11 months ago

Single New EKS Cluster Open Source Observability Accelerator for Neuron-based clusters. Depends on https://github.com/aws-observability/aws-observability-accelerator/pull/32.

@elamaran11 please check whether we want to keep or remove cdk.json from .gitignore. We need to be able to add entries that are validated at compilation; otherwise the GitHub compilation fails on push.

elamaran11 commented 9 months ago

@freschri Any updates to this PR?

freschri commented 9 months ago

> @freschri Any updates to this PR?

I addressed all points apart from the addons. One of them depends on another PR in Blueprints (Neuron Device Plugin Addon). My suggestion, made on another channel, was to merge this PR as is if there are no other concerns, and to raise two issues to port the addons to Blueprints. That can be done as an exercise by someone with limited experience of Blueprints/Patterns/Observability. Please let me know.

elamaran11 commented 9 months ago

> > @freschri Any updates to this PR?
>
> I addressed all points apart from the addons. One of them depends on another PR in Blueprints (Neuron Device Plugin Addon). My suggestion, made on another channel, was to merge this PR as is if there are no other concerns, and to raise two issues to port the addons to Blueprints. That can be done as an exercise by someone with limited experience of Blueprints/Patterns/Observability. Please let me know.

That's a good opportunity to mentor, if you have someone in your region. If I have someone, I will let you know. But this is not a burning one, so please take it slow.

ariveroi commented 6 months ago

New PR against the inf1 branch with the new quickstart version: https://github.com/aws-observability/cdk-aws-observability-accelerator/pull/153

If we merge PR #153 into the inf1 branch, we will be able to merge this PR.

elamaran11 commented 6 months ago

/do-e2e-test single-new-eks-opensource-observability deploy

elamaran11 commented 6 months ago

/do-e2e-test single-new-eks-opensource-observability deploy

elamaran11 commented 6 months ago

/do-e2e-test single-new-eks-inferentia-opensource-observability deploy

elamaran11 commented 6 months ago

/do-e2e-test single-new-eks-opensource-observability deploy

freschri commented 6 months ago

@elamaran11 I updated the Blueprints version by merging main into this branch. Please trigger e2e, thanks.

elamaran11 commented 6 months ago

/do-e2e-test single-new-eks-inferentia-opensource-observability deploy

freschri commented 6 months ago

@elamaran11 e2e passes. Shall we merge?

elamaran11 commented 6 months ago

> @elamaran11 e2e passes. Shall we merge?

Nope, let's wait. I might want to crash e2e and do a few other checks.

elamaran11 commented 6 months ago

/do-e2e-test single-new-eks-inferentia-opensource-observability destroy

elamaran11 commented 6 months ago

The E2E failure is related to a subnet ENI issue, which is transient. It worked on retry.

elamaran11 commented 6 months ago

/do-e2e-test single-new-eks-opensource-observability deploy

elamaran11 commented 6 months ago

Running E2E on OSS Pattern.

elamaran11 commented 6 months ago

/do-e2e-test single-new-eks-opensource-observability destroy

elamaran11 commented 6 months ago

/do-e2e-test single-new-eks-inferentia-opensource-observability deploy

elamaran11 commented 6 months ago

/do-e2e-test single-new-eks-inferentia-opensource-observability destroy