[AKS] kube-scheduler static POD not running for Aliyun GPU Scheduler Extender

Scenario

I am trying to use the Aliyun scheduler extender to share a single NVIDIA T4 GPU among multiple pods. I have a managed AKS cluster with a default node pool of standard VMs (Standard_D2_v3), and I added a User node pool of Standard_NC4as_T4_v3 instances, all running Ubuntu 22.04.4. I enabled the default NVIDIA driver installation, and the driver and nvidia-smi are running:

I am following the instructions given here for the Aliyun installation. I have already enabled SSH access to the GPU nodes and placed the scheduler-policy-config.yaml file in /etc/kubernetes and the kube-scheduler.yaml file in /etc/kubernetes/manifests.
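As a quick sanity check of the file placement described above, a minimal Python sketch can report whether each file exists and is non-empty. The helper name `check_files` is hypothetical, and the script assumes Python 3 is available on the GPU node; it only inspects the filesystem and does not validate the YAML content.

```python
from pathlib import Path


def check_files(paths):
    """Return a dict mapping each path to 'missing', 'empty', or 'ok'."""
    report = {}
    for raw in paths:
        p = Path(raw)
        if not p.is_file():
            report[str(raw)] = "missing"
        elif p.stat().st_size == 0:
            # A zero-byte file is treated as a separate failure mode.
            report[str(raw)] = "empty"
        else:
            report[str(raw)] = "ok"
    return report


if __name__ == "__main__":
    # Paths taken from the description above; run on the GPU node over SSH.
    results = check_files([
        "/etc/kubernetes/scheduler-policy-config.yaml",
        "/etc/kubernetes/manifests/kube-scheduler.yaml",
    ])
    for path, status in results.items():
        print(f"{status:8} {path}")
```

Any other file the manifest depends on can be appended to the list in the same way.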

My cluster runs K8S 1.28:

Problem

When I put the kube-scheduler.yaml file into the /etc/kubernetes/manifests folder, the pod does not run. It stays in CrashLoopBackOff status and its logs show authentication failures:

I tried setting the KUBERNETES_MASTER environment variable to the cluster's DNS name, including the port, but with no luck. I can see that those variables are injected when the pod runs.
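To see which API server endpoint the injected variables actually point at, a small diagnostic sketch like the following could be run inside the container. The function name `api_server_url` and the precedence order (KUBERNETES_MASTER first, then the standard KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT variables) are assumptions made for illustration; this does not claim to mirror how kube-scheduler itself resolves the endpoint.

```python
import os


def api_server_url(env=None):
    """Derive an API server URL from environment variables, or None."""
    env = os.environ if env is None else env

    # Assumption: an explicit KUBERNETES_MASTER value takes precedence.
    master = env.get("KUBERNETES_MASTER")
    if master:
        return master if "://" in master else f"https://{master}"

    # Fall back to the service variables Kubernetes injects into pods.
    host = env.get("KUBERNETES_SERVICE_HOST")
    port = env.get("KUBERNETES_SERVICE_PORT", "443")
    if host:
        if ":" in host:
            # IPv6 literals need brackets inside a URL.
            host = f"[{host}]"
        return f"https://{host}:{port}"
    return None


if __name__ == "__main__":
    print(api_server_url())
```

Comparing the printed URL with the `server` field of a working kubeconfig for the cluster would show whether the pod is even targeting the right endpoint before looking at credentials.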

I noticed that the /etc/kubernetes/scheduler.conf file, which is used to run the command in this file, is empty. To get a valid scheduler configuration file I tried generating certificates, using the token of a ServiceAccount, and using the kubelet's kubeconfig, but none of these worked.
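For the ServiceAccount token attempt, the general shape of a token-based kubeconfig can be sketched as below. This is only an illustration of the standard kubeconfig layout: the function name `build_kubeconfig`, the entry name `gpushare-scheduler`, and all values in the usage section are placeholders, not taken from the cluster. It also assumes the ServiceAccount has been granted the RBAC permissions a scheduler needs, and it is untested whether a managed AKS control plane accepts a second scheduler authenticated this way.

```python
import base64
import json


def build_kubeconfig(server, ca_pem, token, name="gpushare-scheduler"):
    """Build a kubeconfig dict that authenticates with a bearer token.

    server: API server URL, ca_pem: cluster CA certificate as bytes,
    token: ServiceAccount token string, name: hypothetical entry name.
    """
    ca_b64 = base64.b64encode(ca_pem).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [{
            "name": name,
            "cluster": {
                "server": server,
                "certificate-authority-data": ca_b64,
            },
        }],
        "users": [{"name": name, "user": {"token": token}}],
        "contexts": [{
            "name": name,
            "context": {"cluster": name, "user": name},
        }],
        "current-context": name,
    }


if __name__ == "__main__":
    # Placeholder inputs; substitute the real server URL, CA, and token.
    cfg = build_kubeconfig(
        "https://example.invalid:443",
        b"PLACEHOLDER CA CERTIFICATE",
        "PLACEHOLDER_TOKEN",
    )
    # JSON is a subset of YAML, so this output should load as a kubeconfig.
    print(json.dumps(cfg, indent=2))
```

Writing the printed output to a file and running a read-only request against it is a cheap way to check the credentials before wiring the file into the static pod manifest.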

Has anyone managed to successfully install the Aliyun scheduler extender on a managed AKS cluster with User node pools?

Thanks in advance!

AliyunContainerService / gpushare-scheduler-extender, issue #224