Closed SurajAralihalli closed 1 year ago
@jayadeep-jayaraman I haven't been able to launch a cluster with A100 on Dataproc (due to limited/no availability) to test the MIG functionality of this script. Is there a CI/CD job that tests this?
cc: @viadea
Better to use L4 instances. A100 is very hard to get at the moment.
L4 GPUs don't support MIG (https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#supported-gpus).
If I remember correctly, either @nvliyuan or @viadea mentioned that MIG is not a common feature, and the Spark RAPIDS documentation also says that MIG is not recommended. Can we therefore remove this feature?
@jayadeep-jayaraman Let's keep this feature for now.
/gcbrun
The tests have passed; merging this change.
This PR updates the MIG script to use the latest driver installation method and also addresses the following issues:
Supported Linux Distros:
Default Driver Version Update:
Improved CUDA Driver Installation:
Systemd Service for Kernel Headers on Debian and Ubuntu:
Signed-off-by: Suraj Aralihalli <suraj.ara16@gmail.com>