willprice opened this issue 3 years ago
For my first attempt at building a GPU image, I added the following to compute_image_extra.sh. The image built, but nvidia-smi is complaining about the drivers.
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
sudo dnf clean all
sudo dnf -y module install nvidia-driver:latest-dkms
sudo dnf -y install cuda
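When nvidia-smi complains about the driver, one common cause is that the nvidia kernel module was never built or is not loaded. The following is a minimal diagnostic sketch to run on the built node. The function names nvidia_module_loaded and diagnose_nvidia are hypothetical, and the sketch assumes the standard /proc/modules format, where the first column is the module name.

```shell
#!/usr/bin/env bash

# Hypothetical helper: nvidia_module_loaded [MODULES_FILE]
# Succeeds if a module named exactly "nvidia" appears in the modules list
# (default /proc/modules; first column is the module name).
nvidia_module_loaded() {
  awk '$1 == "nvidia" { found = 1 } END { exit !found }' "${1:-/proc/modules}"
}

# Hypothetical helper: print a few checks that narrow down why nvidia-smi
# cannot talk to the driver.
diagnose_nvidia() {
  if nvidia_module_loaded; then
    echo "nvidia kernel module is loaded"
  else
    # Running "sudo modprobe nvidia" by hand typically reports why,
    # e.g. no module was built for the running kernel.
    echo "nvidia kernel module is NOT loaded"
  fi
  # dkms status lists which modules were built for which kernels.
  if command -v dkms >/dev/null 2>&1; then
    dkms status
  fi
  echo "running kernel: $(uname -r)"
}
```

If the module is missing and dkms status shows no nvidia entry for the running kernel, the DKMS build most likely failed during the image build.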
As noted by @colinsauze, it is also necessary to increase the size of the image. This can be done by adding the following
launch_block_device_mappings {
  device_name = "/dev/sda1"
  volume_size = 40
}
to the end of the source "amazon-ebs" "aws" section in /etc/citc/packer/all.pkr.hcl (volume_size is in GiB).
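Instead of editing all.pkr.hcl by hand, the change can be scripted. The sketch below uses awk to insert the same block just before the closing brace of the source "amazon-ebs" "aws" section. add_root_volume is a hypothetical helper. It assumes the braces in the file are balanced, that none appear inside string values, and that the source header and its opening brace are on the same line.

```shell
#!/usr/bin/env bash

# Hypothetical helper: add_root_volume FILE [SIZE_GB]
# Inserts a launch_block_device_mappings block at the end of the
# source "amazon-ebs" "aws" block in FILE (e.g. /etc/citc/packer/all.pkr.hcl).
add_root_volume() {
  local file=$1 size=${2:-40} tmp
  # Skip if a mapping is already present anywhere in the file.
  if grep -q 'launch_block_device_mappings' "$file"; then
    echo "launch_block_device_mappings already in $file; skipping" >&2
    return 0
  fi
  tmp=$(mktemp) || return 1
  awk -v size="$size" '
    # Start tracking when the target source block opens.
    /^[[:space:]]*source[[:space:]]+"amazon-ebs"[[:space:]]+"aws"[[:space:]]*[{]/ {
      in_block = 1; depth = 0
    }
    {
      if (in_block) {
        # gsub returns the number of matches; replacing a brace with
        # itself leaves the line unchanged.
        depth += gsub(/[{]/, "{") - gsub(/[}]/, "}")
        if (depth == 0) {
          # This line closes the source block: emit the mapping before it.
          print "  launch_block_device_mappings {"
          print "    device_name = \"/dev/sda1\""
          print "    volume_size = " size
          print "  }"
          in_block = 0
        }
      }
      print
    }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  # Writing back with cat keeps the original file's owner and permissions.
  local status=$?
  rm -f "$tmp"
  return "$status"
}
```

Usage, as root since the file lives under /etc: add_root_volume /etc/citc/packer/all.pkr.hcl 40. The grep guard makes a second run a no-op.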
It is also necessary to install kernel-devel before installing the NVIDIA drivers so that the DKMS module can be built; without it, the driver build fails.
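Putting the steps together, the GPU part of compute_image_extra.sh might look like the sketch below: the same command sequence posted at the top of this thread, with kernel-devel installed before the DKMS driver. The run wrapper and the DRY_RUN switch are hypothetical additions (not part of CitC) that let the sequence be printed and reviewed without touching the system.

```shell
#!/usr/bin/env bash
# Sketch of the GPU part of compute_image_extra.sh. Call
# install_nvidia_stack at the end of the script; set DRY_RUN=1 to only
# print the commands.

# run: execute a command, or print it instead when DRY_RUN=1
# (DRY_RUN is a hypothetical switch).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

install_nvidia_stack() {
  run sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo || return 1
  run sudo dnf clean all || return 1
  # Kernel headers for building the DKMS module; without them the driver
  # build fails. Assumption: if the image boots an older kernel than the
  # newest one in the repos, kernel-devel-$(uname -r) may be needed instead.
  run sudo dnf -y install kernel-devel || return 1
  run sudo dnf -y module install nvidia-driver:latest-dkms || return 1
  run sudo dnf -y install cuda
}
```

Each step returns early on failure, so a failed kernel-devel install stops the build before the driver module is attempted.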
Docs are being updated at https://github.com/willprice/docs/blob/aws-nvidia-instructions/source/running.rst#aws-gpu-nodes
Once https://github.com/clusterinthecloud/docs/pull/17 is merged, this issue can be closed.
Currently, launching GPU instances on AWS does not provision the VMs with the drivers needed to use the GPUs. It would be good to have documentation for people who want to use CitC this way. I plan to work on this today and will hopefully submit some PRs with instructions.