aws / aws-k8s-tester

AWS Kubernetes tester, kubetest2 deployer implementation
Apache License 2.0
163 stars 82 forks source link

Add debug logging for e2e-nvidia setup #477

Closed cartermckinnon closed 1 month ago

cartermckinnon commented 1 month ago

Description of changes:

Timeouts during Setup can be caused by a failure in one of:

  1. mpi-operator
  2. nvidia-device-plugin
  3. efa-device-plugin

This adds some logging to narrow things down.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.