Closed misohu closed 2 months ago
Thank you for reporting us your feedback!
The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-6037.
This message was autogenerated
The setup of the Intel GPU plugin follows this documentation from the intel-device-plugins-for-kubernetes repo. With the snap package of kubectl, the commands run successfully. In our case, we want to replace the kubectl commands with microk8s.kubectl. However, there is a known issue with the kustomize subcommand of microk8s.kubectl. We get the following error message:
sudo microk8s.kubectl kustomize https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/nfd?ref=${VERSION} > node_feature_discovery.yaml
error: failed to run '/snap/microk8s/7039/usr/bin/git fetch --depth=1 https://github.com/intel/intel-device-plugins-for-kubernetes v0.30.0': fatal: couldn't find remote ref v0.30.0
: exit status 128
As this comment suggests, the issue seems to be the version of git that microk8s uses (2.25.1), since the kustomize subcommand internally calls git fetch.
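To check which git the snap is actually using, the bundled binary can be queried directly and compared with the host's. This is a minimal sketch: the helper name report_git is hypothetical, and the path assumes the snap's active revision is reachable through the /snap/microk8s/current symlink rather than a numbered revision such as 7039.

```shell
# Print the version reported by a given git binary, or note that it is missing.
# report_git is a hypothetical helper name used only for this sketch.
report_git() {
  # $1: path to a git binary
  if [ -x "$1" ]; then
    printf '%s: %s\n' "$1" "$("$1" --version)"
  else
    printf '%s: not found\n' "$1"
  fi
}

# git bundled with the MicroK8s snap.
# Assumption: /snap/microk8s/current points at the active snap revision.
report_git /snap/microk8s/current/usr/bin/git

# git installed on the host, if any.
host_git="$(command -v git || true)"
report_git "${host_git:-git}"
```

If the two versions differ, that would be consistent with the explanation in the linked comment, although it does not prove it.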
One workaround I tried was to clone the intel-device-plugins-for-kubernetes repo, check out the correct tag (v0.30.0 in our case), and then run microk8s.kubectl kustomize on the local copy of the repo. However, I received a similar error:
VERSION=v0.30.0
git clone https://github.com/intel/intel-device-plugins-for-kubernetes.git --branch ${VERSION} --single-branch
sudo microk8s.kubectl kustomize intel-device-plugins-for-kubernetes/deployments/nfd > node_feature_discovery.yaml
error: accumulating resources: accumulation err='accumulating resources from 'base': '/home/ubuntu/intel-device-plugins-for-kubernetes/deployments/nfd/base' must resolve to a file': recursed accumulation of path '/home/ubuntu/intel-device-plugins-for-kubernetes/deployments/nfd/base': accumulating resources: accumulation err='accumulating resources from 'https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default?ref=v0.15.4': URL is a git repository': failed to run '/snap/microk8s/7039/usr/bin/git fetch --depth=1 https://github.com/kubernetes-sigs/node-feature-discovery v0.15.4': fatal: couldn't find remote ref v0.15.4
: exit status 128
This is because the base directory has this line, which also specifies a remote URL, so git fetch is called once again and the command fails in the same way.
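To see every place where a local checkout still points at a remote URL, the kustomization files can be searched before running kustomize. The following is a rough sketch: the function name list_remote_resources is hypothetical, and it assumes the files are named kustomization.yaml and that the repo was cloned into the current directory as in the commands above.

```shell
# List lines in kustomization.yaml files that reference an http(s) URL.
# list_remote_resources is a hypothetical helper name used only for this sketch.
list_remote_resources() {
  # $1: directory to search recursively
  grep -rnE --include='kustomization.yaml' 'https?://' "$1"
}

# Assumption: the repo was cloned into the current directory.
list_remote_resources intel-device-plugins-for-kubernetes/deployments/nfd \
  || echo "no remote references found (or directory missing)"
```

Each reported line is a resource that kustomize would fetch with git, so each one is a potential trigger for the same failure.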
I propose the following 3 solutions:
Solution A: Install the kubectl snap package, and run all commands with kubectl, in the same way as the upstream documentation indicates.
Solution B: Bind-mount the host's git to the snap package. Users would have to run the following command (note that the 4-digit number is not the same for all users):
sudo mount --bind -o nodev,ro /usr/bin /snap/microk8s/5250/usr/bin
Solution C: Clone the intel-device-plugins-for-kubernetes repo, then clone the node-feature-discovery repo from kubernetes-sigs, and modify the kustomization.yaml file in intel-device-plugins-for-kubernetes/deployments/nfd/base/ to use the locally cloned directory. Note that we also have to clone the correct branches.
After discussion with the team, we are proceeding with Solution A: the doc will include the installation of the kubectl snap, and all commands will use kubectl instead of microk8s.kubectl.
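For reference, the third option (cloning both repos and pointing the base at the local clone) could look roughly like the sketch below. It was not the approach chosen, and it is untested against the real repos. The function names localize_nfd and clone_and_render, the variable names, and the relative path are assumptions. The tags and the remote URL are taken from the error messages above. The node-feature-discovery overlay may itself reference further remote resources, which this sketch does not handle.

```shell
VERSION=v0.30.0       # intel-device-plugins-for-kubernetes tag used in this issue
NFD_VERSION=v0.15.4   # node-feature-discovery tag named in the error message

# Path from deployments/nfd/base/ to a sibling clone of node-feature-discovery.
# Assumption: both repos are cloned into the same parent directory.
NFD_LOCAL=../../../../node-feature-discovery/deployment/overlays/default

# Replace the remote NFD reference in a kustomization file with a local path.
# $1: kustomization.yaml to edit, $2: local overlay path relative to that file
# Note: uses GNU sed's in-place flag (-i), as found on Ubuntu.
localize_nfd() {
  sed -i -E "s|https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default\?ref=[^[:space:]\"']+|$2|" "$1"
}

# Clone both repos at the expected tags, patch the base, and render the manifests.
clone_and_render() {
  git clone --branch "$VERSION" --single-branch \
    https://github.com/intel/intel-device-plugins-for-kubernetes.git
  git clone --branch "$NFD_VERSION" --single-branch \
    https://github.com/kubernetes-sigs/node-feature-discovery.git
  localize_nfd intel-device-plugins-for-kubernetes/deployments/nfd/base/kustomization.yaml "$NFD_LOCAL"
  sudo microk8s.kubectl kustomize intel-device-plugins-for-kubernetes/deployments/nfd > node_feature_discovery.yaml
}

# Run manually when network access is available:
# clone_and_render
```

The amount of patching needed here, and the need to keep both tags in sync by hand, illustrates why simply installing the kubectl snap is the less fragile choice.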
Why it needs to get done
The Intel GPU operator is a prerequisite for running Intel workloads on DSS. In this spec we need to describe how the end user should install the operator before using DSS. DSS does not install it on the user's cluster.
We can use the setup described in this spec. The procedure is to deploy the device plugin manifests, which we now keep in the dss repo here. There is a microk8s problem when deploying manifests from the URL. Once it is fixed, we can deploy directly from upstream.
What needs to get done
When is the task considered done