To create a Kubernetes pod with Intel GPU acceleration enabled, we must:
Have the Intel GPU operator deployed in the Kubernetes cluster.
Have the Kubernetes resources section request the gpu.intel.com/i915 resource. Example here.
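The two requirements above can be sketched as a minimal pod manifest, built here as a Python dict. The pod name, image choice, and limit of one device are illustrative, not the exact DSS manifest; the key part is the gpu.intel.com/i915 entry under resources.

```python
# Minimal sketch of a notebook pod requesting an Intel GPU.
# Only the gpu.intel.com/i915 resource entry is essential; the
# rest of the manifest is illustrative.
pod_spec = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "intel-notebook"},
    "spec": {
        "containers": [
            {
                "name": "notebook",
                "image": "intel/intel-extension-for-pytorch:2.1.20-xpu-idp-jupyter",
                "resources": {
                    # Requests one Intel GPU device, exposed by the
                    # Intel GPU plugin as an extended resource.
                    "limits": {"gpu.intel.com/i915": 1},
                },
            }
        ]
    },
}

print(pod_spec["spec"]["containers"][0]["resources"])
```

Without the operator/plugin deployed, no node advertises gpu.intel.com/i915 and a pod requesting it stays unschedulable, which is why both requirements must hold together.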
After discussing with the team, we decided to drop the --gpu intel argument from the dss create command. If Intel acceleration is enabled (by the user manually deploying the Intel GPU operator), all notebooks will have the Intel resources section filled in automatically, meaning that notebooks running a suitable image can use Intel hardware. This is not a problem for images without Intel libraries, as they will not use the resource anyway.
Because of this, dss create should check for the presence of the Intel GPU plugin. If the plugin is present, it will automatically populate the resources section.
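One way to implement that check is to look for gpu.intel.com/i915 in the allocatable resources that each node reports through the Kubernetes API (e.g. kubectl get node -o json). A minimal sketch, with hypothetical function names and an illustrative limit of one device:

```python
# Sketch: dss create could detect the Intel GPU plugin by checking
# whether any node advertises gpu.intel.com/i915 among its allocatable
# resources, and only then populate the notebook's resources section.
# Function names and the limit of 1 are illustrative, not the final design.
INTEL_RESOURCE = "gpu.intel.com/i915"

def intel_plugin_enabled(node_allocatables) -> bool:
    """True if any node's allocatable resources include the Intel GPU."""
    return any(INTEL_RESOURCE in alloc for alloc in node_allocatables)

def notebook_resources(node_allocatables) -> dict:
    """resources section for a notebook container; empty when no Intel GPU."""
    if intel_plugin_enabled(node_allocatables):
        return {"limits": {INTEL_RESOURCE: 1}}
    return {}

# With an Intel GPU node present, the resources section is populated:
print(notebook_resources([{"cpu": "8", "gpu.intel.com/i915": "1"}]))
# Without one, it stays empty and the notebook schedules as usual:
print(notebook_resources([{"cpu": "8"}]))
```

Keeping the decision a pure function of the node data also makes the behaviour easy to cover with unit tests, without needing a cluster.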
Because we are using the intel/intel-extension-for-tensorflow:2.15.0-xpu-idp-jupyter and intel/intel-extension-for-pytorch:2.1.20-xpu-idp-jupyter images for Intel ML notebooks, we also need to adjust the command and args sections (check the example). We can add these settings globally to all DSS notebook deployments, since non-Intel images set these in their Dockerfiles anyway (this still needs to be tested).
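Setting command and args globally could look like the sketch below. The jupyter invocation shown is hypothetical; the exact entrypoint must be taken from the example referenced above, and the claim that overriding it is harmless for non-Intel images still needs testing.

```python
# Sketch: every DSS notebook container gets an explicit command/args,
# regardless of image. The jupyter invocation below is hypothetical
# and must be replaced with the entrypoint from the referenced example.
def notebook_container(image: str) -> dict:
    return {
        "name": "notebook",
        "image": image,
        # Applied globally: non-Intel images set an equivalent
        # entrypoint in their Dockerfiles, so overriding it here
        # should be a no-op for them (still to be verified).
        "command": ["/bin/bash", "-c"],
        "args": ["jupyter lab --ip=0.0.0.0 --no-browser"],
    }

container = notebook_container(
    "intel/intel-extension-for-pytorch:2.1.20-xpu-idp-jupyter"
)
print(container["command"], container["args"])
```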
We also need to add the intel/intel-extension-for-tensorflow:2.15.0-xpu-idp-jupyter and intel/intel-extension-for-pytorch:2.1.20-xpu-idp-jupyter images as recommendations in dss create --help.
You can find more information about the Intel plugin here in this spec.
What needs to get done
Add the command and args sections to every DSS notebook deployment to support Intel notebooks without any changes to dss create.
Add the resources section for Intel devices if the Intel plugin is enabled.
Align the design decisions of this change with the spec for Intel DSS
Align the design decisions of this change with discussion with the UX team
When is the task considered done
dss create can schedule Intel GPU images as notebooks, namely intel/intel-extension-for-tensorflow:2.15.0-xpu-idp-jupyter and intel/intel-extension-for-pytorch:2.1.20-xpu-idp-jupyter.
The aforementioned images are part of the dss create --help list of recommended notebooks.
Proper unit tests are written.
Note on integration tests: at the time of writing, the Intel team should be responsible for handling the integration tests, as we can't access machines with Intel hardware in CI. Please refer to the outcome of this task for next steps.
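Since CI cannot exercise Intel hardware, the unit tests should cover the conditional resources logic against fake node data. A sketch (the notebook_resources helper is hypothetical and inlined here to keep the example self-contained; in dss it would live in the module under test):

```python
# Sketch of unit tests for the plugin-detection behaviour, runnable
# without a cluster. notebook_resources is a hypothetical helper,
# repeated inline so the example is self-contained.
INTEL_RESOURCE = "gpu.intel.com/i915"

def notebook_resources(node_allocatables) -> dict:
    if any(INTEL_RESOURCE in alloc for alloc in node_allocatables):
        return {"limits": {INTEL_RESOURCE: 1}}
    return {}

def test_resources_added_when_plugin_present():
    nodes = [{"cpu": "8"}, {"cpu": "8", INTEL_RESOURCE: "1"}]
    assert notebook_resources(nodes) == {"limits": {INTEL_RESOURCE: 1}}

def test_resources_omitted_without_plugin():
    assert notebook_resources([{"cpu": "8"}]) == {}

test_resources_added_when_plugin_present()
test_resources_omitted_without_plugin()
```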