awslabs / amazon-eks-ami

Packer configuration for building a custom EKS AMI
https://awslabs.github.io/amazon-eks-ami/
MIT No Attribution

Add nvidia-bug-report to eks-logs-collector #1864

Closed suket22 closed 3 months ago

suket22 commented 3 months ago

Issue #, if available: N/A

Description of changes: This PR adds the execution of nvidia-bug-report.sh to the eks-logs-collector. This executable ships with the Nvidia drivers and is useful for debugging. The script is also mentioned in https://docs.nvidia.com/deploy/gpu-debug-guidelines/index.html

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Testing Done

I tested this script on a g4dn instance which has an Nvidia GPU, and verified that the log.gz file created by nvidia-bug-report.sh is included in the log collector archive.

Trying to Collect CPU Throttled Process Information...
Trying to Collect IO Throttled Process Information...
Trying to Collect Nvidia Bug report...
Trying to archive gathered information...

    Done... your bundled logs are located in /var/log/eks_i-...tar.gz
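The inclusion check described above can be reproduced by listing the archive contents. This is a minimal sketch, not part of the PR: the helper name `archive_has_nvidia_report` is hypothetical, and the file name pattern inside the archive is an assumption based on the `log.gz` file mentioned above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds if the given .tar.gz archive lists a file
# that looks like an nvidia-bug-report output (file name is an assumption).
archive_has_nvidia_report() {
  local archive="$1"
  # List the archive members without extracting and look for the report.
  tar -tzf "${archive}" | grep 'nvidia-bug-report.*\.gz$' >/dev/null
}
```

Called with the path of the bundle printed by the collector, the helper returns 0 when the report is present and non-zero otherwise.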

I also ran the script on a t3.large, which has no GPU, to make sure it doesn't break:

Trying to Collect CPU Throttled Process Information...
Trying to Collect IO Throttled Process Information...
Trying to Collect Nvidia Bug report... No Nvidia drivers found, nothing to do.

Trying to archive gathered information...

    Done... your bundled logs are located in /var/log/eks_i-....tar.gz
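The two runs above suggest a guard-then-collect pattern, which could be sketched roughly as follows. This is not the PR's actual code: the function name, the `NVIDIA_BUG_REPORT_CMD` override, and the output directory argument are hypothetical, and the sketch assumes that `nvidia-bug-report.sh` accepts `--output-file` and compresses its report by default.

```shell
#!/usr/bin/env bash

# Hypothetical sketch of the collection step; names are illustrative only.
collect_nvidia_bug_report() {
  local out_dir="$1"
  # Assumption: the command name is overridable so the guard can be tested.
  local cmd="${NVIDIA_BUG_REPORT_CMD:-nvidia-bug-report.sh}"

  printf 'Trying to Collect Nvidia Bug report... '

  # The script ships with the Nvidia drivers, so on instances without
  # them it is absent and there is nothing to collect.
  if ! command -v "${cmd}" >/dev/null 2>&1; then
    echo "No Nvidia drivers found, nothing to do."
    return 0
  fi

  mkdir -p "${out_dir}"
  # Assumption: --output-file sets the report path and the tool writes a
  # gzip-compressed file next to it (the log.gz mentioned in the PR).
  if ! "${cmd}" --output-file "${out_dir}/nvidia-bug-report.log" >/dev/null 2>&1; then
    echo "nvidia-bug-report failed, continuing."
    return 0
  fi
  echo
}
```

Returning 0 in every branch keeps a missing or failing report from aborting the rest of the log collection, which matches the t3.large run where archiving still completes.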

See this guide for recommended testing for PRs. Some tests may not apply. Completing tests and providing additional validation steps are not required, but it is recommended and may reduce review time and time to merge.