Closed. ratnadeepb closed this issue 3 years ago.
Could you please provide logs with `kubectl logs static-web --previous`?
A wild guess from my side would be that your Pod configuration lacks the required MarbleRun information.
See the Deploy your service with Kubernetes guide we have, or the general Kubernetes integration one, on how to integrate your application into the Kubernetes cluster properly.
But yeah, to be sure, logs would be great :)
The output is blank:

```shell
$ kubectl logs static-web --previous
$
```
I had already run:

```shell
marblerun namespace add default
```
Regarding the manifest: based on this discussion, my understanding is that I need to define the `Makefile`, the `manifest.template`, and the `Dockerfile`. Once the image is built, I can deploy the container and extract the enclave details to write the `manifest.json`. I ran into the same issue trying to deploy my container, which is when I attempted to deploy the runtime container by itself.
I attempted the same with a named namespace with the same result.
Oh, so this is a follow-up from the old discussion.
The Edgeless RT deploy container is not suitable for running a Graphene application by default. The changes we made in the Redis sample's `Dockerfile` used the Edgeless RT deploy container as a base, but eventually, for the Kubernetes deployment, we use an image in which we build Graphene, build Redis, and then define `graphene-sgx` pointing to Redis, with our MarbleRun LibOS premain process as the entrypoint.
What I would recommend you to do, step-by-step, is to:

1. Get your application running with Graphene on a local machine, without any MarbleRun additions.
2. Wrap that setup in a `Dockerfile` and launch it via `docker run`. You can use plain Ubuntu as a base for your `Dockerfile` if you like; eventually it just needs the required SGX and Graphene components for building and running. You can use our graphene-redis `Dockerfile` as an orientation for this step (though we use the `edgelessrt-deploy` container as base, which already contains a bunch of the SGX dependencies).

These two steps have nothing to do with MarbleRun yet. They solely focus on Graphene and getting your application to run without any MarbleRun additions, on a local machine, without any cluster & Kubernetes so far. Basically, we are recreating what GSC does internally.
Once you have this running, you can already attempt to extract or retrieve the SGX signature values from your application within the container. If you completely automate the build process within the container (ideally with a signing key imported externally as a Docker secret), you might even get these at build time of your Docker container. If you want to retrieve them after you build the container, you can refer to @m1ghtym0's comments on copying the `.sig` file and using `graphene-sgx-get-token` to achieve this afterwards.
If these steps worked for you and you were able to retrieve the SGX signature values, then I would recommend you to continue adapting this application to run with MarbleRun & Kubernetes.
1. Adapt your `manifest.template` with the MarbleRun-specific changes (e.g. setting the premain as entrypoint). Alternatively, you can use the same MarbleRun CLI tool which you used for namespace adding to perform these changes (unless Graphene's recent changes have broken something...).
2. Write the MarbleRun `manifest.json` with the retrieved SGX signature values, and deploy it to the MarbleRun coordinator with the CLI tool.
3. Deploy your application to the cluster with the MarbleRun labels set (e.g. `marblerun/marbletype`).
Hopefully, this is somewhat correct; I just wrote it down from how I would do it. I would recommend you to go through this step-by-step, and if you get stuck somewhere, please tell us at which point exactly this fails (e.g. your application runs with Graphene, you build a Docker image which launches your Graphene application, but you cannot get it working as a Marble).
Otherwise, with a lack of logs and a lack of source code, it's a bit hard to give exact advice on what is going wrong. The more details you can provide us, the better we can try to help!
I am actually slightly confused now. I used the `Dockerfile` in the Redis example to build from. So would that not work?
If I use Ubuntu instead of `edgelessrt-deploy`, I'd have to build SGX capabilities inside the container. Do you have any pointers on how to do that? Should I just follow Intel's `linux-sgx` repo documentation for a Docker build?
Could you share with us the Dockerfile you are using to build your image? This would be very helpful in identifying the issue.
Regarding `edgelessrt-deploy`: this image is based on Ubuntu 18.04 with Edgeless RT and most of the Intel SGX libraries preinstalled. You can find the source code here: https://github.com/edgelesssys/edgelessrt/blob/master/Dockerfile
If you want to run a program using Graphene, you will need to install all of your program's and Graphene's dependencies.
Actually, I do not think it makes that much of a difference whether you use `edgelessrt-deploy` or just plain Ubuntu as the base. You might need to install a couple more libsgx packages, but apart from that you should be able to adapt most of the steps from lines 13-32 to set up Graphene.
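For the extra libsgx packages on a plain Ubuntu base, a sketch might look like the following (the package selection is an assumption based on Intel's sgx_repo for Ubuntu 18.04, not something specified in this thread):

```dockerfile
# Hypothetical fragment: add Intel's SGX apt repository and install the
# common userspace libraries on a plain Ubuntu 18.04 base.
FROM ubuntu:18.04
RUN apt-get update && apt-get install -y wget gnupg software-properties-common \
 && wget -qO- https://download.01.org/intel-sgx/sgx_repo/ubuntu/intel-sgx-deb.key | apt-key add - \
 && add-apt-repository 'deb [arch=amd64] https://download.01.org/intel-sgx/sgx_repo/ubuntu bionic main' \
 && apt-get update \
 && apt-get install -y libsgx-urts libsgx-enclave-common libsgx-quote-ex libsgx-dcap-ql
```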
You might actually also give GSC a shot, though none of us has really tested this in practice because it has lagged behind in development for most of the time... However, I do not see why it should not work.
I tried using `gsc` for the build. However, the signing fails, and I couldn't figure out why! I raised the issue with Graphene: https://github.com/oscarlab/graphene/issues/2636
In the meantime, I tried to build it from this Dockerfile:
```dockerfile
FROM alpine/git:latest AS pull
RUN git clone https://github.com/edgelesssys/marblerun.git /premain

FROM ghcr.io/edgelesssys/edgelessrt-dev AS build-premain
COPY --from=pull /premain /premain
WORKDIR /premain/build
RUN cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo ..
RUN make premain-libos

FROM ghcr.io/edgelesssys/edgelessrt-deploy:latest AS release
RUN apt-get update && apt-get install -y git meson build-essential autoconf gawk bison wget python3 libcurl4-openssl-dev \
    python3-protobuf libprotobuf-c-dev protobuf-c-compiler python3-pip software-properties-common python3-click python3-jinja2
RUN wget -qO- https://download.01.org/intel-sgx/sgx_repo/ubuntu/intel-sgx-deb.key | apt-key add -
RUN add-apt-repository 'deb [arch=amd64] https://download.01.org/intel-sgx/sgx_repo/ubuntu bionic main'
RUN apt-get install -y libsgx-quote-ex-dev libsgx-aesm-launch-plugin
RUN python3 -m pip install "toml>=0.10"
RUN python3 -m pip install --upgrade tensorflow
ENV TZ=America/New_York
RUN apt-get update && apt-get install -y \
    libsm6 \
    libxext6 \
    libxrender-dev
RUN pip3 install \
    keras==2.2.4 \
    pillow \
    matplotlib \
    pandas \
    xlrd \
    openpyxl \
    xlsxwriter \
    imageio
RUN git clone https://github.com/intel/SGXDataCenterAttestationPrimitives.git /SGXDriver
WORKDIR /SGXDriver
RUN git reset --hard a93785f7d66527aa3bd331ba77b7993f3f9c729b
RUN git clone https://github.com/oscarlab/graphene.git /graphene
WORKDIR /graphene
RUN git reset --hard b37ac75efec0c1183fd42340ce2d3e04dcfb3388
RUN make ISGX_DRIVER_PATH=/SGXDriver/driver/linux/ SGX=1
RUN meson build -Ddirect=disabled -Dsgx=enabled
RUN ninja -C build
RUN ninja -C build install
RUN mkdir -p /graphene/Examples/training
COPY dist_mnist.py /graphene/Examples/training
COPY Makefile /graphene/Examples/training
COPY dist_mnist.manifest.template /graphene/Examples/training
COPY --from=build-premain /premain/build/premain-libos /graphene/Examples/training
# RUN apt install libnss-mdns python3-numpy python3-scipy
WORKDIR /graphene/Examples/training
ENV BUILD_TLS yes
RUN --mount=type=secret,id=signingkey,dst=/graphene/Pal/src/host/Linux-SGX/signer/enclave-key.pem,required=true \
    make clean && make SGX=1 PYTHONVERSION=python3.6 PYTHONDISTHOME=/usr/local/lib/python3.6/dist-packages/
ENTRYPOINT ["graphene-sgx", "/graphene/Examples/training/dist_mnist.py"]
```
It is built with:

```shell
DOCKER_BUILDKIT=1 docker build -t dist_mnist_manual -f Dockerfile-new --secret id=signingkey,src=enclave-key.pem .
```
Finally, the deployment YAML:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test
  namespace: kubedep
  labels:
    app.kubernetes.io/name: test
    app.kubernetes.io/part-of: test
    app.kubernetes.io/version: v1
    marblerun/inject: enabled
    marblerun/marbletype: test
spec:
  containers:
    - name: web
      image: <acr>/dist_mnist_manual
      ports:
        - name: static-web
          containerPort: 80
          protocol: TCP
```
The pod fails to start:

```shell
~$ kubectl get pods -n kubedep
NAME   READY   STATUS             RESTARTS   AGE
test   0/1     CrashLoopBackOff   3          77s
~$ kubectl logs -n kubedep test
Invalid application path specified (/graphene/Examples/training/dist_mnist.py.manifest.sgx does not exist).
The path should point to application configuration files, so that they can be
found after appending corresponding extensions.
~$ kubectl logs -n kubedep test --previous
Invalid application path specified (/graphene/Examples/training/dist_mnist.py.manifest.sgx does not exist).
The path should point to application configuration files, so that they can be
found after appending corresponding extensions.
```
Well, if you call `graphene-sgx /graphene/Examples/training/dist_mnist.py` to launch your application, Graphene tries to find the manifest under the same name plus `.manifest.sgx`.
However, your `dist_mnist.manifest.template` misses the `.py` suffix in the middle. So maybe rename this one and make sure your `Makefile` does something like this:

```shell
graphene-sgx-sign --output dist_mnist.py.manifest.sgx --manifest dist_mnist.py.manifest.template --key /graphene/Pal/src/host/Linux-SGX/signer/enclave-key.pem
```
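As an aside, the lookup rule Graphene applies can be illustrated with plain shell (this just mimics the naming convention described above; it does not invoke Graphene):

```shell
# Graphene appends ".manifest.sgx" to the application path it is given
entrypoint="/graphene/Examples/training/dist_mnist.py"
expected_manifest="${entrypoint}.manifest.sgx"
echo "$expected_manifest"
# prints: /graphene/Examples/training/dist_mnist.py.manifest.sgx
```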
Additionally, I believe you cannot directly call a Python file to launch with `graphene-sgx`. I do not know what your manifest contains, or whether you already call the MarbleRun `premain-libos` or not. As for whether you can actually call the Python file directly as the (post-premain) entry point: I have not tested that one yet. I guess that depends on Graphene; however, if you do this directly without the MarbleRun premain, this is usually not supported by Graphene AFAIK.
Ok. Any suggestions on how I can rewrite the Dockerfile?
Just rename the output of your `graphene-sgx-sign` call in your `Makefile` to match what's shown as expected in the error. If you don't have one, you should probably add one.
Regarding the entry point / Python binary: not sure, try it out. Graphene has a Python example; probably best to start with that one.
Note that you likely also need to install the Python modules inside Graphene's environment (or pass them through, though I'm not sure that would be a good idea).
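A tiny sketch of the rename step (the source name `dist_mnist.manifest.sgx` is an assumption about what the current Makefile produces):

```shell
# In a scratch directory: rename the signed manifest so its name matches
# what Graphene derives from the entrypoint (dist_mnist.py + .manifest.sgx)
cd "$(mktemp -d)"
touch dist_mnist.manifest.sgx    # stand-in for the Makefile's current output (assumed name)
mv dist_mnist.manifest.sgx dist_mnist.py.manifest.sgx
ls dist_mnist.py.manifest.sgx
```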
From the manifest file:

```toml
# MARBLERUN: entrypoint must be premain-libos
libos.entrypoint = "premain-libos"
loader.argv0_override = "dist_mnist.py"
loader.insecure__use_host_env = 1
```
The content is not the issue (so far, at least); it's just that Graphene cannot find the signed manifest file.
You just need to rename the signed manifest file first, which happens in your `Makefile`, which I don't have, so I can't tell exactly what to change... But I told you above what you likely should change (or include in your `Makefile`, in case you don't have one).
That didn't work. As a further test, I tried running the Python interpreter by changing the last line of the Dockerfile to:

```dockerfile
ENTRYPOINT ["graphene-sgx", "python", "-c \"print('Hello World')\""]
```
The Makefile I am using is https://github.com/oscarlab/graphene/blob/master/Examples/python/Makefile and I am getting similar errors:

```shell
~$ kubectl logs -n kubedep test
Invalid application path specified (python.manifest.sgx does not exist).
The path should point to application configuration files, so that they can be
found after appending corresponding extensions.
```
Well... Graphene cannot find `python.manifest.sgx` in the current working directory of the Docker environment.
It's probably better to use an absolute path, or to specify `WORKDIR` before defining `ENTRYPOINT`.
If you do the latter, make sure `WORKDIR` actually contains `python.manifest.sgx`, which should be generated from `python.manifest.template` after calling `graphene-sgx-sign` on it (which the Makefile you linked actually does).
Honestly, these are pretty basic mistakes. Graphene just cannot find the manifest file derived from the name of your entry point.
Just to give you an idea on where Graphene searches for the manifest file:

```shell
$ mkdir emptydir && cd emptydir
$ graphene-sgx python
Invalid application path specified (python.manifest.sgx does not exist).
The path should point to application configuration files, so that they can be
found after appending corresponding extensions.
$ touch python.manifest.sgx
$ graphene-sgx python
error: Enclave size not a power of two (an SGX-imposed requirement)
error: Parsing manifest failed
error: load_enclave() failed with error -22
```
I really recommend you to go through this step-by-step on a local or virtual machine first, instead of throwing it into a Dockerfile. This helps you evaluate whether your application actually works with Graphene, how the folder layout needs to look, what to put into the manifest, what to use as `ENTRYPOINT` when eventually defining the Dockerfile, etc. Right now you are tweaking too many things at once, without even getting anything to launch. It might be painful to go this way, so please do it step-by-step as I listed above.
If you actually get something to launch with Graphene, whether it's failing or not, that would be a step forward in helping you understand what you are doing and getting your project running. So please, don't tweak too many things at once :)
Seems like the issue was with how I had written the Graphene manifest template and, overall, how I was running things. I corrected the manifest. But now I am building for Python 3 and defining the entrypoint as:

```dockerfile
WORKDIR /graphene/Examples/training
ENTRYPOINT ["graphene-sgx", "python", "dist_mnist.py"]
```
The infrastructure is a Kubernetes cluster service on SGX-enabled servers on Azure. Trying to run `sudo graphene-sgx python dist_mnist.py` throws:

```shell
$ sudo graphene-sgx python dist_mnist.py
error: ECREATE failed in enclave creation ioctl (errno = -22)
error: Creating enclave failed: -22
error: load_enclave() failed with error -22
```
Deploying the Docker image:

```shell
$ kubectl logs -n kubedep test
error: Cannot open device /dev/sgx/enclave. Please make sure the Intel SGX kernel module is loaded.
error: load_enclave() failed with error -2
```
```shell
$ ~/graphene/Pal/src/host/Linux-SGX/tools/is-sgx-available/is_sgx_available
SGX supported by CPU: true
SGX1 (ECREATE, EENTER, ...): true
SGX2 (EAUG, EACCEPT, EMODPR, ...): false
Flexible Launch Control (IA32_SGXPUBKEYHASH{0..3} MSRs): true
SGX extensions for virtualizers (EINCVIRTCHILD, EDECVIRTCHILD, ESETCONTEXT): false
Extensions for concurrent memory management (ETRACKC, ELDBC, ELDUC, ERDINFO): false
CET enclave attributes support (See Table 37-5 in the SDM): false
Key separation and sharing (KSS) support (CONFIGID, CONFIGSVN, ISVEXTPRODID, ISVFAMILYID report fields): false
Max enclave size (32-bit): 0x80000000
Max enclave size (64-bit): 0x1000000000
EPC size: 0x3800000
SGX driver loaded: true
AESMD installed: true
SGX PSW/libsgx installed: false
```
> The infrastructure is a Kubernetes cluster service on SGX-enabled servers on Azure. Trying to run `sudo graphene-sgx python dist_mnist.py` throws:
>
> ```shell
> $ sudo graphene-sgx python dist_mnist.py
> error: ECREATE failed in enclave creation ioctl (errno = -22)
> error: Creating enclave failed: -22
> error: load_enclave() failed with error -22
> ```
I assume this step is run in a Docker container? If so, was the image built on an SGX-capable machine? In my experience, this issue occurs when you try to build a Docker image with Graphene-SGX on a non-SGX-capable machine.
> Deploying the Docker image:
>
> ```shell
> $ kubectl logs -n kubedep test
> error: Cannot open device /dev/sgx/enclave. Please make sure the Intel SGX kernel module is loaded.
> error: load_enclave() failed with error -2
> ```
Does your cluster have an SGX device plugin installed? If it does, does your pod have the necessary resource request to make use of the plugin? E.g. if you are using the Intel SGX plugin, your pod will need something similar to:

```yaml
resources:
  limits:
    sgx.intel.com/enclave: 1
    sgx.intel.com/epc: 10Mi
    sgx.intel.com/provision: 1
```
> I assume this step is run in a Docker container? If so, was the image built on an SGX-capable machine? In my experience, this issue occurs when you try to build a Docker image with Graphene-SGX on a non-SGX-capable machine.
This step was on one of the SGX-enabled AKS nodes. Built on the same one too.
I am using the Azure ones instead of Intel: https://github.com/Azure/aks-engine/blob/master/docs/topics/sgx.md#deploying-the-sgx-device-plugin.
The kernel on the node is 5.4.
> I am using the Azure ones instead of Intel: https://github.com/Azure/aks-engine/blob/master/docs/topics/sgx.md#deploying-the-sgx-device-plugin.
In that case your pods should request EPC using Azure's plugin:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: <pod_name>
spec:
  containers:
    - name: <container_name>
      image: <your_image>
      resources:
        limits:
          kubernetes.azure.com/sgx_epc_mem_in_MiB: 10
```
As for Graphene not working, have you tried running your code outside of the Docker environment? I.e. have you installed Graphene on your machine directly and managed to get any of their examples, or your code, running? If that is not the case, something is wrong with your installation or setup, and I would suggest raising an issue over at Graphene directly, as they are much more experienced with the project and can probably provide much better help.
Good point, thanks for that. I should have thought of that myself. Anyhow, it seems the issue was with the manifest file. Running the Graphene examples pointed me in the right direction. Thanks for all the help @daniel-weisse and @Nirusu!
@ratnadeepb, can you share a working manifest for your case? I'm experiencing the same problems... Thanks!
hey @rguikers, my apologies for the late reply. I was doing this during an internship over the summer. I don't have access to that anymore. I am so sorry about that.
No problem, thanks..
Issue description
Deploying the Edgeless container runtime: I was trying to deploy my code in a container, and it was continuously restarting. So I attempted to deploy `ghcr.io/edgelesssys/edgelessrt-deploy:latest`. It exhibits the same behavior.
To reproduce
Steps to reproduce the behavior:
1. `kubectl apply -f pod-test.yaml`
2. `kubectl describe pods static-web`
Expected behavior
Environment:
Additional info / screenshots