Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

[Feature] Support Confidential Computing on AKS with DC-series VMs #1608

Closed: annerajb closed this issue 3 years ago

annerajb commented 4 years ago

What happened: I cannot create node pools or an AKS cluster with Confidential Compute / SGX VMs.

What you expected to happen: A node pool composed of Confidential Compute VMs, up to my subscription quota.

How to reproduce it (as minimally and precisely as possible):

az aks create -g $RESOURCE_GROUP -n $CLUSTERNAME \
  --tags $TAGS_VALUES \
  --nodepool-tags $TAGS_VALUES \
  --ssh-key-value ~/.ssh/id_rsa.pub \
  --node-resource-group $NODE_RESOURCE_GROUP \
  --kubernetes-version 1.16.7 \
  `#VirtualMachineScaleSets is needed` \
  --vm-set-type VirtualMachineScaleSets

Then add a node pool:

az aks nodepool add \
    --resource-group $RESOURCE_GROUP \
    --cluster-name $CLUSTERNAME \
    --tags $TAGS_VALUES \
    --name sgxpool \
    --node-count 1 \
    --node-vm-size Standard_DC4s \
    --kubernetes-version 1.16.7
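The repro above assumes RESOURCE_GROUP, CLUSTERNAME, TAGS_VALUES and NODE_RESOURCE_GROUP are already exported. A minimal bash sketch that fails fast when one of them is empty, so a typo does not turn into a confusing az error. The require_vars helper is hypothetical and is not part of the az CLI:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the az CLI): verify that every named
# environment variable is set and non-empty before running az commands.
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable called $name.
    if [ -z "${!name:-}" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: stop before `az aks create` / `az aks nodepool add` if anything is unset.
# require_vars RESOURCE_GROUP CLUSTERNAME TAGS_VALUES NODE_RESOURCE_GROUP || exit 1
```

It reports every missing variable in one pass rather than stopping at the first, which saves a round trip when several are unset.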

The node pool creation fails on the CLI with a super obscure error. The resource group deployment shows that the extension deployment/install on the VMSS instance failed; more specifically, the SGX extension failed to compile with a kernel build error.

I tried both Ubuntu 16.04 and the new Ubuntu 18.04 preview, with both Kubernetes 1.16.7 and 1.17.

I could not find a way to create a node pool with the Ubuntu 18.04 preview, since the --aks-custom-headers flag is new and only appears to exist on az aks create.
Also, a confidential compute instance cannot be used for the main node pool (which makes sense).
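Since a DC-series size cannot back the main node pool, a cluster-creation script could guard against it up front. A rough sketch; it assumes every VM size name starting with Standard_DC is a DC-series (confidential computing) SKU, and both function names are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical guard. Assumption: every VM size whose name starts with
# "Standard_DC" is a DC-series (SGX / confidential computing) SKU.
is_dc_series() {
  case "$1" in
    Standard_DC*) return 0 ;;
    *) return 1 ;;
  esac
}

# Refuse a DC-series size for the main node pool; keep it for `az aks nodepool add`.
check_main_pool_size() {
  if is_dc_series "$1"; then
    echo "error: $1 is DC-series; use it for an added node pool, not the main one" >&2
    return 1
  fi
}
```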

Anything else we need to know?: Below is the error output from the SGX kernel driver installer that the VMSS extension runs. I found it via /var/log/azure/cluster-provisioning.log, which said to look at /var/lib/dkms/sgx/1.12/build/make.log.

azureuser@aks-sgxpool-18316057-vmss000000:~$ sudo /opt/azure/containers/oe/sgx_linux_x64_driver_1.12_c110012.bin
Unpacking Intel SGX Driver ... done.
Verifying the integrity of the install package ... done.
Installing Intel SGX Driver ...
/tmp/sgx-driver-RFD5eI ~
install -d /opt/intel/sgxdriver/package
install -d /opt/intel/sgxdriver/scripts
cp -r package/* /opt/intel/sgxdriver/package
install scripts/* /opt/intel/sgxdriver/scripts
~

Kernel preparation unnecessary for this kernel.  Skipping...

Building module:
cleaning build area....
'make' sign KDIR=/lib/modules/4.15.0-1077-azure/build....(bad exit status: 2)
ERROR (dkms apport): binary package for sgx: 1.12 not found
Error! Bad return status for module build on kernel: 4.15.0-1077-azure (x86_64)
Consult /var/lib/dkms/sgx/1.12/build/make.log for more information.
azureuser@aks-sgxpool-18316057-vmss000000:~$
azureuser@aks-sgxpool-18316057-vmss000000:~$ cat /var/lib/dkms/sgx/1.12/build/make.log
DKMS make.log for sgx-1.12 for kernel 4.15.0-1077-azure (x86_64)
Fri May 15 03:08:03 UTC 2020
make -C /lib/modules/4.15.0-1077-azure/build SUBDIRS=/var/lib/dkms/sgx/1.12/build CFLAGS_MODULE="-I/var/lib/dkms/sgx/1.12/build -I/var/lib/dkms/sgx/1.12/build/include" modules LE_ACTION=SIGN
make[1]: Entering directory '/usr/src/linux-headers-4.15.0-1077-azure'
  CC      /var/lib/dkms/sgx/1.12/build/le/main.o
In file included from ./arch/x86/include/asm/sgx.h:59:0,
                 from /var/lib/dkms/sgx/1.12/build/le/main.c:55:
./arch/x86/include/asm/sgx_arch.h:71:20: error: implicit declaration of function ‘BIT’ [-Werror=implicit-function-declaration]
  SGX_ATTR_INIT   = BIT(0),
                    ^
./arch/x86/include/asm/sgx_arch.h:71:2: error: enumerator value for ‘SGX_ATTR_INIT’ is not an integer constant
  SGX_ATTR_INIT   = BIT(0),
  ^
./arch/x86/include/asm/sgx_arch.h:72:2: error: enumerator value for ‘SGX_ATTR_DEBUG’ is not an integer constant
  SGX_ATTR_DEBUG   = BIT(1),
  ^
./arch/x86/include/asm/sgx_arch.h:73:2: error: enumerator value for ‘SGX_ATTR_MODE64BIT’ is not an integer constant
  SGX_ATTR_MODE64BIT  = BIT(2),
  ^
./arch/x86/include/asm/sgx_arch.h:74:2: error: enumerator value for ‘SGX_ATTR_PROVISIONKEY’ is not an integer constant
  SGX_ATTR_PROVISIONKEY = BIT(4),
  ^
./arch/x86/include/asm/sgx_arch.h:75:2: error: enumerator value for ‘SGX_ATTR_EINITTOKENKEY’ is not an integer constant
  SGX_ATTR_EINITTOKENKEY = BIT(5),
  ^
./arch/x86/include/asm/sgx_arch.h:77:1: error: enumerator value for ‘SGX_ATTR_KSS’ is not an integer constant
 };
 ^
In file included from /var/lib/dkms/sgx/1.12/build/le/main.c:61:0:
/var/lib/dkms/sgx/1.12/build/le/main.h:64:27: error: ‘struct sgx_launch_request’ declared inside parameter list [-Werror]
 void sgx_get_token(struct sgx_launch_request *req, void *entry);
                           ^
/var/lib/dkms/sgx/1.12/build/le/main.h:64:27: error: its scope is only this definition or declaration, which is probably not what you want [-Werror]
/var/lib/dkms/sgx/1.12/build/le/main.c:148:38: error: ‘struct sgx_le_output’ declared inside parameter list [-Werror]
 static int write_output(const struct sgx_le_output *output)
                                      ^
/var/lib/dkms/sgx/1.12/build/le/main.c: In function ‘write_output’:
/var/lib/dkms/sgx/1.12/build/le/main.c:154:25: error: dereferencing pointer to incomplete type ‘const struct sgx_le_output’
  for (i = 0; i < sizeof(*output); ) {
                         ^
/var/lib/dkms/sgx/1.12/build/le/main.c: In function ‘_start’:
/var/lib/dkms/sgx/1.12/build/le/main.c:167:28: error: storage size of ‘req’ isn’t known
  struct sgx_launch_request req;
                            ^
/var/lib/dkms/sgx/1.12/build/le/main.c:167:28: error: unused variable ‘req’ [-Werror=unused-variable]
cc1: all warnings being treated as errors
scripts/Makefile.build:330: recipe for target '/var/lib/dkms/sgx/1.12/build/le/main.o' failed
make[3]: *** [/var/lib/dkms/sgx/1.12/build/le/main.o] Error 1
/var/lib/dkms/sgx/1.12/build/Makefile:131: recipe for target '/var/lib/dkms/sgx/1.12/build/le/sgx_le_proxy' failed
make[2]: *** [/var/lib/dkms/sgx/1.12/build/le/sgx_le_proxy] Error 2
Makefile:1577: recipe for target '_module_/var/lib/dkms/sgx/1.12/build' failed
make[1]: *** [_module_/var/lib/dkms/sgx/1.12/build] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-4.15.0-1077-azure'
Makefile:96: recipe for target 'sign' failed
make: *** [sign] Error 2
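When make.log is this long, the first "error:" line is usually the one to read first, since later errors often cascade from it (here, the undeclared BIT macro is why each SGX_ATTR_* enumerator is rejected as "not an integer constant"). A small sketch for pulling it out; first_build_error is a made-up helper, and the default path is the one from this report:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first compiler error in a DKMS build log.
# The default path is the one from this report; pass another log as $1.
first_build_error() {
  local log="${1:-/var/lib/dkms/sgx/1.12/build/make.log}"
  # -m 1: stop at the first matching line; '--' ends grep's option parsing.
  grep -m 1 -- 'error:' "$log"
}

# Example:
# first_build_error
# first_build_error /var/lib/dkms/sgx/1.12/build/make.log
```

The pattern is lowercase "error:" with a colon, so make's own "Error 1" / "Error 2" summary lines are not matched.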

Environment:

seanmck commented 4 years ago

Confidential SKUs are not supported yet, but this is in our backlog.

annerajb commented 4 years ago

Can the cloud-init / cluster-provisioning extension at least be fixed so that it does not break or halt the node creation process? Right now, specifying any confidential compute SKU makes the node pool / VMSS fail with that error. (It took 3 days to figure out, and Azure support had no idea AKS did not support these SKUs yet.)
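One way a provisioning script could honor this request is to treat the SGX driver install as optional: log the failure and keep going. A hedged sketch only; run_optional is hypothetical and is not how the AKS provisioning scripts actually work:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a non-essential step and, if it fails, log a
# warning instead of aborting the rest of node provisioning.
run_optional() {
  local desc="$1"
  shift
  local rc=0
  "$@" || rc=$?   # '||' keeps a failure from tripping `set -e`
  if [ "$rc" -ne 0 ]; then
    echo "WARNING: optional step '$desc' failed (exit $rc); continuing" >&2
  fi
  return 0
}

# Example (installer path taken from the log in this issue):
# run_optional "SGX driver" /opt/azure/containers/oe/sgx_linux_x64_driver_1.12_c110012.bin
```

The trade-off is that the node would join the cluster without a working SGX driver, so the warning needs to end up somewhere visible, such as cluster-provisioning.log.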

(Replied by email to Sean McKenna's comment of Fri, May 15, 2020, 10:01 AM.)

annerajb commented 4 years ago

FYI: Changing

SGX_DRIVER_URL=https://download.01.org/intel-sgx/dcap-1.2/linux/dcap_installers/ubuntuServer16.04/sgx_linux_x64_driver_1.12_c110012.bin

to either

SGX_DRIVER_URL=https://download.01.org/intel-sgx/sgx-dcap/1.6/linux/distro/ubuntuServer16.04/sgx_linux_x64_driver_1.33.bin

or

SGX_DRIVER_URL=https://download.01.org/intel-sgx/sgx-dcap/1.3/linux/distro/ubuntuServer16.04/sgx_linux_x64_driver_1.13.bin

allows the driver to compile fine.
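Applying the same swap with sed could look like the sketch below. The file name provision.sh is a placeholder (this comment does not say which script defines SGX_DRIVER_URL), and the pattern assumes the assignment starts at the beginning of a line:

```shell
#!/usr/bin/env bash
# Sketch: rewrite the SGX_DRIVER_URL assignment in a provisioning script.
# Assumes the line begins with "SGX_DRIVER_URL=" (no indentation or "export").
NEW_SGX_DRIVER_URL='https://download.01.org/intel-sgx/sgx-dcap/1.6/linux/distro/ubuntuServer16.04/sgx_linux_x64_driver_1.33.bin'

set_sgx_driver_url() {
  local script="$1" url="$2"
  # '|' is the sed delimiter so the slashes in the URL need no escaping;
  # -i.bak edits in place and keeps a backup (works with GNU and BSD sed).
  sed -i.bak "s|^SGX_DRIVER_URL=.*|SGX_DRIVER_URL=${url}|" "$script"
}

# Example, with a placeholder file name:
# set_sgx_driver_url provision.sh "$NEW_SGX_DRIVER_URL"
```

A URL containing "&" or "|" would need escaping first, since both are special in this sed expression; the driver URLs above contain neither.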

annerajb commented 4 years ago

I made a pull request that (I think) fixes the script: https://github.com/Azure/AgentBaker/pull/55

agowdamsft commented 4 years ago

@annerajb we made an announcement at Microsoft Build today about ACC (Azure confidential computing) node pool support on AKS. If you are interested, you can join our preview by submitting this form: https://aka.ms/accakspreview

ghost commented 4 years ago

This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.

agowdamsft commented 3 years ago

@sakthi-vetrivel this can be closed, as confidential computing SKUs are now supported on AKS: https://docs.microsoft.com/en-us/azure/confidential-computing/confidential-nodes-aks-overview