Closed p-1603 closed 3 years ago
I can reproduce this issue. It is caused by conflicts between changes that I made and changes that Oracle made on their side. I think I know how to resolve it, and once that is done, it will be simpler to use.
CitC now builds just a single image type for all Oracle nodes. If you want to use the GPUs on a GPU node, then you'll need to build that into the image. There's information on this at https://cluster-in-the-cloud.readthedocs.io/en/latest/running.html#gpu-nodes
I have been trying to create GPU nodes for use on an Oracle CITC cluster. Following the documentation, I have been able to create node images. The packer configuration file seems to be all.variables.pkr rather than the config.json stated in the docs, and I have also had to reformat the oracle-gpu variables as follows:
However, I have also noticed that /etc/citc/shapes.yaml already contains VM GPU shapes. I used one of these to create several nodes, but have found that they either do not accept jobs or get stuck at the 'configuring' stage. I was able to submit jobs successfully to one of these nodes after holding and releasing jobs with Slurm control, but have not since been able to replicate this with any others. I would appreciate any help in working out how to create working GPU nodes consistently.
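For anyone trying to narrow down the same symptoms, here is a minimal sketch of how the stuck nodes could be listed from the management node. It is not part of CitC. It assumes the standard Slurm sinfo command is on the PATH, and the helper names (parse_sinfo, not_ready, query_sinfo) and the node names in the usage example are hypothetical. It simply flags every node whose reported state is not plain idle, mixed or allocated, so nodes carrying a state suffix (for example a powering-up or powered-down marker) show up for a closer look.

```python
import subprocess


def parse_sinfo(text):
    """Parse lines of the form 'NODE STATE' into a dict of node -> state."""
    states = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            node, state = parts
            # With -N a node appears once per partition; keep one entry.
            states[node] = state
    return states


def not_ready(states):
    """Return the nodes whose state is not plain idle, mixed or allocated."""
    ready = {"idle", "mixed", "allocated"}
    return {node: state for node, state in states.items() if state not in ready}


def query_sinfo():
    """Ask Slurm for one 'node state' pair per line (assumes sinfo is installed)."""
    result = subprocess.run(
        ["sinfo", "-N", "-h", "-o", "%N %T"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_sinfo(result.stdout)


# Usage with made-up output, so it runs without a cluster:
sample = "gpu-node-1 idle~\ngpu-node-2 allocated#\ncpu-node-1 idle\n"
flagged = not_ready(parse_sinfo(sample))
for node, state in sorted(flagged.items()):
    print(node, state)
```

On a real cluster, calling not_ready(query_sinfo()) gives the same kind of list, and running scontrol show node with a flagged node's name shows the full record Slurm holds for it, which may include a reason for the state.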