canonical / data-science-stack

Stack with machine learning tools needed for local development.
Apache License 2.0

Update `dss create` command to support Intel GPU notebooks. #147

Closed: misohu closed this issue 3 months ago

misohu commented 4 months ago

Why it needs to get done

To create a Kubernetes pod with Intel GPU acceleration enabled, we must:

After discussing with the team, we decided to drop the `--gpu intel` argument from the `dss create` command. If Intel acceleration is enabled (by the user manually deploying the Intel GPU operator), every notebook will have its Intel resources section filled in automatically, so notebooks built from a suitable image can use the Intel hardware. This is not a problem for images without Intel libraries, as they will not use the resource anyway.

Because of this, `dss create` should check for the presence of the Intel GPU plugin. If the plugin is present, `dss create` will automatically populate the resources section.
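A minimal sketch of the detection step, assuming the Intel GPU device plugin advertises the extended resource `gpu.intel.com/i915` on each node's allocatable resources, and assuming one GPU per notebook. The helpers `intel_gpu_plugin_present` and `build_resources` are hypothetical names, not the actual DSS implementation. They take plain dicts shaped like a node's `status.allocatable` mapping (for example, what the Kubernetes Python client returns for each node from `CoreV1Api().list_node()`), so the logic can be tested without a cluster.

```python
from typing import Dict, List

# Assumption: extended resource name advertised by the Intel GPU device plugin.
INTEL_GPU_RESOURCE = "gpu.intel.com/i915"


def intel_gpu_plugin_present(node_allocatables: List[Dict[str, str]]) -> bool:
    """Return True if any node advertises a non-zero Intel GPU quantity.

    Each element mirrors one node's status.allocatable mapping
    (resource name -> quantity string).
    """
    for allocatable in node_allocatables:
        # Extended resources are whole-number quantities such as "1".
        if int(allocatable.get(INTEL_GPU_RESOURCE, "0")) > 0:
            return True
    return False


def build_resources(node_allocatables: List[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Build the notebook container's resources section (hypothetical helper).

    Requests one Intel GPU when the plugin is detected; otherwise leaves
    the section empty so non-Intel notebooks are unaffected.
    """
    if intel_gpu_plugin_present(node_allocatables):
        return {"limits": {INTEL_GPU_RESOURCE: "1"}}
    return {}


if __name__ == "__main__":
    nodes = [{"cpu": "8", "memory": "32Gi", INTEL_GPU_RESOURCE: "1"}]
    print(build_resources(nodes))
```

Keeping the check separate from the Kubernetes API call makes it easy to unit-test here, even though real Intel hardware is not available in CI.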

Because we use the `intel/intel-extension-for-tensorflow:2.15.0-xpu-idp-jupyter` and `intel/intel-extension-for-pytorch:2.1.20-xpu-idp-jupyter` images for Intel ML notebooks, we also need to adjust the command and args sections (check the example). We can add these settings globally to all DSS notebook deployments, since the non-Intel images set them in their Dockerfiles anyway (I still need to test this).

We also need to add the `intel/intel-extension-for-tensorflow:2.15.0-xpu-idp-jupyter` and `intel/intel-extension-for-pytorch:2.1.20-xpu-idp-jupyter` images as recommendations to `dss create --help`.
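A sketch of how the recommended images could be surfaced in the help output. It uses the standard library `argparse` purely for illustration and is an assumption: the real DSS CLI may use a different framework, and `build_create_parser`, `RECOMMENDED_INTEL_IMAGES`, and the argument set shown are hypothetical. `RawDescriptionHelpFormatter` keeps the epilog's line breaks so each image name prints on its own line.

```python
import argparse

# Intel images named in this issue, listed as help-text recommendations.
RECOMMENDED_INTEL_IMAGES = [
    "intel/intel-extension-for-tensorflow:2.15.0-xpu-idp-jupyter",
    "intel/intel-extension-for-pytorch:2.1.20-xpu-idp-jupyter",
]


def build_create_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring `dss create`, with recommendations in the epilog."""
    epilog = "Recommended images for Intel GPU notebooks:\n" + "\n".join(
        f"  - {image}" for image in RECOMMENDED_INTEL_IMAGES
    )
    parser = argparse.ArgumentParser(
        prog="dss create",
        description="Create a notebook.",
        epilog=epilog,
        # Preserve the epilog's newlines instead of re-wrapping it.
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    parser.add_argument("name", help="Name of the notebook.")
    parser.add_argument("--image", required=True, help="Container image for the notebook.")
    return parser


if __name__ == "__main__":
    build_create_parser().print_help()
```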

You can find more information about the Intel plugin in this spec.

What needs to get done

When is the task considered done

Note on integration tests: at the time of writing, the Intel team should be responsible for handling the integration tests, as we can't access machines with Intel hardware in CI. Please refer to the outcome of this task for next steps.

syncronize-issues-to-jira[bot] commented 4 months ago

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-6036.

This message was autogenerated