edgelesssys / marblerun

MarbleRun is the control plane for confidential computing. Deploy, scale, and verify your confidential microservices on vanilla Kubernetes. 100% Go, 100% cloud native, 100% confidential.
https://marblerun.sh

samples/gramine-hello fails occasionally #343

Open JaewonHur opened 1 year ago

JaewonHur commented 1 year ago

Issue description

I'm trying to run samples/gramine-hello with EdgelessRT v0.3.6, Go 1.19.4, and MarbleRun 6e0a0. I followed the README correctly, but the test occasionally fails while running gRPC. Below is the stack trace from a failing run.

Environment

Additional info / screenshots

[PreMain] 2023/01/03 06:47:35 detected libOS: Gramine
[PreMain] 2023/01/03 06:47:35 starting PreMain
[PreMain] 2023/01/03 06:47:35 fetching env variables
[PreMain] 2023/01/03 06:47:35 loading TLS Credentials
[PreMain] 2023/01/03 06:47:35 loading UUID
[PreMain] 2023/01/03 06:47:35 found UUID: 79dbb97e-da24-4072-bbae-68b10d5baa57
[PreMain] 2023/01/03 06:47:35 generating CSR
[PreMain] 2023/01/03 06:47:35 generating quote
Detected deprecated syntax 'sgx.remote_attestation = true|false'; consider using 'sgx.remote_attestation = "none"|"epid"|"dcap"'.
[PreMain] 2023/01/03 06:47:35 activating marble of type hello
[PreMain] 2023/01/03 06:47:35 creating files from manifest
[PreMain] 2023/01/03 06:47:35 setting env vars from manifest
[PreMain] 2023/01/03 06:47:35 done with PreMain
[PreMain] 2023/01/03 06:47:41 detected libOS: Gramine
[PreMain] 2023/01/03 06:47:41 starting PreMain
[PreMain] 2023/01/03 06:47:41 fetching env variables
[PreMain] 2023/01/03 06:47:42 loading TLS Credentials
[PreMain] 2023/01/03 06:47:42 loading UUID
[PreMain] 2023/01/03 06:47:42 found UUID: 79dbb97e-da24-4072-bbae-68b10d5baa57
[PreMain] 2023/01/03 06:47:42 generating CSR
[PreMain] 2023/01/03 06:47:42 generating quote
Detected deprecated syntax 'sgx.remote_attestation = true|false'; consider using 'sgx.remote_attestation = "none"|"epid"|"dcap"'.
[PreMain] 2023/01/03 06:47:42 activating marble of type hello
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x67c63929]

goroutine 12 [running]:
google.golang.org/grpc/internal/transport.(*loopyWriter).processData(0x50275860)
    /root/go/pkg/mod/google.golang.org/grpc@v1.49.0/internal/transport/controlbuf.go:955 +0x509
google.golang.org/grpc/internal/transport.(*loopyWriter).run(0x50275860)
    /root/go/pkg/mod/google.golang.org/grpc@v1.49.0/internal/transport/controlbuf.go:556 +0x198
google.golang.org/grpc/internal/transport.newHTTP2Client.func3()
    /root/go/pkg/mod/google.golang.org/grpc@v1.49.0/internal/transport/http2_client.go:417 +0x65
created by google.golang.org/grpc/internal/transport.newHTTP2Client
    /root/go/pkg/mod/google.golang.org/grpc@v1.49.0/internal/transport/http2_client.go:415 +0x1eb1
m1ghtym0 commented 1 year ago

Hi @JaewonHur, thank you for reporting this.

What Gramine version is this running with, and did you use marblerun gramine-prepare? Can you attach the MarbleRun manifest and the Gramine manifest?

Thank you!

Nirusu commented 1 year ago

In addition, debug/trace logs from Gramine (loader.log_level = 'trace' in the Gramine manifest) could help us track this down in case we cannot reproduce the issue ourselves.

JaewonHur commented 1 year ago

I used Gramine v1.3.1 built from source, so I didn't run marblerun gramine-prepare. Is it required to run marblerun gramine-prepare?

For the MarbleRun and Gramine manifests, I used the ones in the samples/gramine-hello directory without any modification.

The Gramine log is attached below:

gramine-log.txt

m1ghtym0 commented 1 year ago

Thank you. If you use samples/gramine-hello, you don't need the gramine-prepare command; it only adjusts a Gramine manifest for a Marble automatically. Can you confirm that the problem does not occur with Gramine v1.0?

We'll investigate whether there is a problem with Gramine v1.3.

JaewonHur commented 1 year ago

With Gramine v1.0 and v1.2, the bug was not triggered in 30 trials. With Gramine v1.3.1, the bug is triggered within about 5 trials.

daniel-weisse commented 1 year ago

Hey there,

I am currently unable to replicate this on Gramine v1.3.1 (I ran ~100 tests without seeing any errors). One thing that looks a bit suspicious is that PreMain runs twice. This should not happen, but it might be unrelated to your issue.

Could you share some details about your configuration? Could this be a memory/resource issue on your end? Do similar issues also occur when running standalone Gramine applications? Also, your log seems to be missing the stdout messages. Could you try creating a new one, e.g., using the following command:

EDG_MARBLE_TYPE=hello make run &> gramine-log.txt
JaewonHur commented 1 year ago

Sorry for the confusion. I generated the gramine-log.txt above after modifying helloworld.manifest.template. I removed loader.argv0_override = "hello", which may have caused PreMain to be invoked twice (everything else was the same).

I tried to reproduce the bug several times, but it is so non-deterministic that I could not find any clue to the root cause. So far, the bug has not been triggered in 100 trials (with and without loader.argv0_override = "hello").

The stdout output is the same as in my first comment above. I'll report back if the bug reproduces again.

daniel-weisse commented 1 year ago

I took another look at PreMain running twice. This only seems to happen on Gramine v1.3 and appears to be caused by Gramine's removal of the loader.argv0_override option. The correct way is now to set argv0 as part of loader.argv, as specified in the Gramine docs.

Replacing loader.argv0_override = "hello" with loader.argv = ["hello"] fixes the issue of PreMain running twice.

The following Gramine manifest should fix any deprecation warnings you see when running with Gramine v1.3:

loader.entrypoint = "file:{{ gramine.libos }}"
loader.env.LD_LIBRARY_PATH = "/lib"

# entrypoint must be premain-libos
libos.entrypoint = "premain-libos"

# argv0 must be the path to the actual application
loader.argv = ["hello"]

# Forward EDG environment variables, used by MarbleRun
loader.env.EDG_MARBLE_TYPE = { passthrough = true }
loader.env.EDG_MARBLE_COORDINATOR_ADDR = { passthrough = true }
loader.env.EDG_MARBLE_UUID_FILE = { passthrough = true }
loader.env.EDG_MARBLE_DNS_NAMES = { passthrough = true }

# FS mount points
fs.mounts = [
    { path = "/lib", uri = "file:{{ gramine.runtimedir() }}" },
    { path = "/etc", uri = "file:/etc" },
]

# trusted files
sgx.trusted_files = [
    "file:{{ gramine.runtimedir() }}/libnss_dns.so.2",
    "file:{{ gramine.runtimedir() }}/libnss_files.so.2",
    "file:{{ gramine.runtimedir() }}/libresolv.so.2",
    "file:{{ gramine.runtimedir() }}/ld-linux-x86-64.so.2",
    "file:{{ gramine.runtimedir() }}/libc.so.6",
    "file:{{ gramine.runtimedir() }}/libpthread.so.0",
    "file:{{ gramine.libos }}",
    "file:premain-libos",
    "file:hello"
]

# allowed files
sgx.allowed_files = [
    "file:/etc/hosts",
    "file:/etc/host.conf",
    "file:/etc/gai.conf",
    "file:/etc/resolv.conf",
    "file:/etc/localtime",
    "file:/etc/nsswitch.conf",
    "file:uuid"
]

# enable DCAP
sgx.remote_attestation = "dcap"

# enclave must have enough memory and threads
sgx.enclave_size = "1024M"
sgx.thread_num = 16

# create a debug enclave by default
sgx.debug = true