edgelesssys / marblerun

MarbleRun is the control plane for confidential computing. Deploy, scale, and verify your confidential microservices on vanilla Kubernetes. 100% Go, 100% cloud native, 100% confidential.
https://marblerun.sh

occlum_hello example cannot be deployed. #310

Closed llnut closed 2 years ago

llnut commented 2 years ago

Issue description

The following error message is displayed when deploying the occlum_hello example:

cd occlum_instance; occlum run /bin/premain-libos
[PreMain] 2022/07/26 13:02:06 detected libOS: Occlum
[PreMain] 2022/07/26 13:02:06 starting PreMain
[PreMain] 2022/07/26 13:02:06 fetching env variables
[PreMain] 2022/07/26 13:02:06 loading TLS Credentials
[PreMain] 2022/07/26 13:02:06 loading UUID
[PreMain] 2022/07/26 13:02:06 found UUID: 89dde1b8-483a-4351-b752-b076b5188d77
[PreMain] 2022/07/26 13:02:06 generating CSR
[PreMain] 2022/07/26 13:02:06 generating quote
[PreMain] 2022/07/26 13:02:06 activating marble of type hello
panic: rpc error: code = Unimplemented desc = unexpected HTTP status code received from server: 404 (Not Found); transport: received unexpected content-type "text/plain; charset=utf-8"

goroutine 1 [running]:
main.prepareOcclum({0x7f264a210bf0?, 0x7f264a765c70?})
        github.com/edgelesssys/marblerun/cmd/premain-libos/main.go:113 +0x1e6
main.main()
        github.com/edgelesssys/marblerun/cmd/premain-libos/main.go:57 +0x9c
make: *** [Makefile:27: run] Error 2

To reproduce

Steps to reproduce the behavior:

  1. Build the coordinator and marble-injector Docker images from the latest source code.

    openssl genrsa -out private.pem -3 3072
    DOCKER_BUILDKIT=1 docker build --secret id=signingkey,src=private.pem --target release --tag ghcr.io/edgelesssys/coordinator -f dockerfiles/Dockerfile.coordinator .
    DOCKER_BUILDKIT=1 docker build --tag ghcr.io/edgelesssys/marble-injector -f dockerfiles/Dockerfile.marble-injector .
  2. Start coordinator and marble-injector

    dev@minikube:/marblerun-dev$ docker-compose up -d
    [+] Running 2/2
    ⠿ Container marblerun-injector-1     Started
    ⠿ Container marblerun-coordinator-1  Started
  3. In the occlum container, verify the quote and get the coordinator's root certificate.

    root@67cf52266207:~/sgx/occlum-hello# marblerun certificate root -o marblerun.crt coordinator:4433 --era-config era-config.json
    Root certificate written to marblerun.crt
  4. Set the manifest.

    root@67cf52266207:~/sgx/occlum-hello# marblerun manifest set manifest.json coordinator:4433 --era-config era-config.json
    Successfully verified Coordinator, now uploading manifest
    Manifest signature: b8808db262032ecfbb6f05e44a0c88389c8a41ad9ff34f1e57fc0388a101251d
    Manifest successfully set
  5. Start the occlum service

    root@67cf52266207:~/sgx/occlum-hello# make run
    cd occlum_instance; occlum run /bin/premain-libos
    [PreMain] 2022/07/26 13:10:32 detected libOS: Occlum
    [PreMain] 2022/07/26 13:10:32 starting PreMain
    [PreMain] 2022/07/26 13:10:32 fetching env variables
    [PreMain] 2022/07/26 13:10:32 loading TLS Credentials
    [PreMain] 2022/07/26 13:10:32 loading UUID
    [PreMain] 2022/07/26 13:10:32 found UUID: 89dde1b8-483a-4351-b752-b076b5188d77
    [PreMain] 2022/07/26 13:10:32 generating CSR
    [PreMain] 2022/07/26 13:10:32 generating quote
    [PreMain] 2022/07/26 13:10:32 activating marble of type hello
    panic: rpc error: code = Unimplemented desc = unexpected HTTP status code received from server: 404 (Not Found); transport: received unexpected content-type "text/plain; charset=utf-8"
    goroutine 1 [running]:
    main.prepareOcclum({0x7f754a210bf0?, 0x7f754a765c70?})
        github.com/edgelesssys/marblerun/cmd/premain-libos/main.go:113 +0x1e6
    main.main()
        github.com/edgelesssys/marblerun/cmd/premain-libos/main.go:57 +0x9c
    make: *** [Makefile:27: run] Error 2

I also tried it with Minikube, but the error was the same as above.

Expected behavior

Start occlum_hello successfully

Environment

Additional info / screenshots

marblerun-coordinator-1  | [erthost] loading enclave ...
marblerun-coordinator-1  | [erthost] entering enclave ...
marblerun-coordinator-1  | [meshentry] invoking premain
marblerun-coordinator-1  | [meshentry] invoking main
marblerun-coordinator-1  | 2022-07-26T13:08:32.386Z        INFO    coordinator/run.go:53   starting coordinator    {"version": "0.6.0", "commit": "a48368cdd50a7face66f05e55f3b6a7e46bcabe1"}
marblerun-coordinator-1  | 2022-07-26T13:08:32.386Z        INFO    coordinator/run.go:84   creating the Core object
marblerun-coordinator-1  | 2022-07-26T13:08:32.386Z        INFO    core/core.go:137        loading state
marblerun-coordinator-1  | 2022-07-26T13:08:32.386Z        INFO    core/core.go:175        No sealed state found. Proceeding with new state.
marblerun-coordinator-1  | 2022-07-26T13:08:32.394Z        INFO    core/core.go:331        generating quote
marblerun-coordinator-1  | 2022-07-26T13:08:32.650Z        INFO    coordinator/run.go:108  starting the client server
marblerun-coordinator-1  | 2022-07-26T13:08:32.650Z        INFO    coordinator/run.go:117  starting the marble server
marblerun-coordinator-1  | 2022-07-26T13:08:32.650Z        INFO    server/server.go:110    starting client https server    {"address": "coordinator:4433"}
marblerun-coordinator-1  | 2022-07-26T13:08:32.654Z        INFO    zap/grpclogger.go:92    [core][Server #1] Server created        {"system": "grpc", "grpc_log": true}
marblerun-coordinator-1  | 2022-07-26T13:08:32.654Z        INFO    zap/grpclogger.go:92    [core][Server #1 ListenSocket #2] ListenSocket created  {"system": "grpc", "grpc_log": true}
marblerun-coordinator-1  | 2022-07-26T13:08:32.654Z        INFO    coordinator/run.go:129  started gRPC server     {"grpcAddr": "172.18.0.5:2001"}
marblerun-coordinator-1  | 172.18.0.2 - - [26/Jul/2022:13:09:30 +0000] "GET /quote HTTP/1.1" 200 7897
marblerun-coordinator-1  | 172.18.0.2 - - [26/Jul/2022:13:09:30 +0000] "POST /manifest HTTP/1.1" 200 33
marblerun-coordinator-1  | 172.18.0.2 - - [26/Jul/2022:13:10:32 +0000] "POST /rpc.Marble/Activate HTTP/2.0" 404 19

version: "3.7"

services:
  coordinator:
    image: ghcr.io/edgelesssys/coordinator:latest
    ports:
      - 4433:4433
    devices:
      - /dev/sgx_enclave:/dev/sgx/enclave
      - /dev/sgx_provision:/dev/sgx/provision
    environment:
      - DCAP_LIBRARY=intel
      - OE_SIMULATION=0
      - EDG_COORDINATOR_DEV_MODE=1
      - EDG_COORDINATOR_MESH_ADDR=coordinator:2001
      - EDG_COORDINATOR_CLIENT_ADDR=coordinator:4433
      - EDG_COORDINATOR_DNS_NAMES=coordinator,127.0.0.1
      - EDG_COORDINATOR_SEAL_DIR=/root/sgx/marblerun_seal_dir
    volumes:
      - ./patch/sgx_default_qcnl.conf:/etc/sgx_default_qcnl.conf
      - ../../../sgx:/root/sgx
    networks:
      - network

  injector:
    image: ghcr.io/edgelesssys/marble-injector:latest
    devices:
      - /dev/sgx_enclave:/dev/sgx/enclave
      - /dev/sgx_provision:/dev/sgx/provision
    environment:
      - DCAP_LIBRARY=intel
      - OE_SIMULATION=0
      - EDG_COORDINATOR_DEV_MODE=1
      - EDG_COORDINATOR_MESH_ADDR=coordinator:2001
      - EDG_COORDINATOR_CLIENT_ADDR=coordinator:4433
      - EDG_COORDINATOR_DNS_NAMES=coordinator,127.0.0.1
      - EDG_COORDINATOR_SEAL_DIR=/root/sgx/marblerun_seal_dir
    volumes:
      - ./patch/sgx_default_qcnl.conf:/etc/sgx_default_qcnl.conf
      - ../../../sgx:/root/sgx
    networks:
      - network

  occlum:
    image: occlum/occlum:0.28.0-ubuntu20.04
    devices:
      - /dev/sgx_enclave:/dev/sgx/enclave
      - /dev/sgx_provision:/dev/sgx/provision
    volumes:
      - ./occlum/patch/sgx_default_qcnl.conf:/etc/sgx_default_qcnl.conf
      - ../../../sgx:/root/sgx
    ports:
      - "5000:5000"
    networks:
      - libos

networks:
  network:
    external: true
    name: dev_network

{
    "Packages": {
        "world": {
            "Debug": true,
            "UniqueID": "dcca1a93adb4f6d9133555ab69465c844199f4e3b6fdda447d08050cc41d2af0"
        }
    },
    "Marbles": {
        "hello": {
            "Package": "world",
            "Parameters": {
                "Env": {
                    "ROOT_CA": "{{ pem .MarbleRun.RootCA.Cert }}",
                    "MARBLE_CERT": "{{ pem .MarbleRun.MarbleCert.Cert }}",
                    "MARBLE_KEY": "{{ pem .MarbleRun.MarbleCert.Private }}"
                },
                "Argv": [
                    "/bin/rest_api.py"
                ]
            }
        }
    }
}

"env": {
        "default": [
            "PYTHONHOME=/opt/python-occlum",
            "OCCLUM=yes",
            "EDG_MARBLE_COORDINATOR_ADDR=coordinator:4433",
            "EDG_MARBLE_TYPE=hello",
            "EDG_MARBLE_UUID_FILE=uuid",
            "EDG_MARBLE_DNS_NAMES=coordinator"
        ],
        "untrusted": [
            "EDG_MARBLE_COORDINATOR_ADDR",
            "EDG_MARBLE_TYPE",
            "EDG_MARBLE_UUID_FILE",
            "EDG_MARBLE_DNS_NAMES"
        ]
    }

daniel-weisse commented 2 years ago

Hi @jcsora, thanks for the detailed error report.

Your application seems to be sending its activation request to the wrong port. Port 4433 is used for user interaction with the Coordinator, such as setting the manifest or retrieving attestation. Port 2001 is the one your application should be trying to reach.

Replacing 4433 with 2001 for EDG_MARBLE_COORDINATOR_ADDR in your occlum.json file should fix your issue:

{
  //...
  "env": {
    "default": [
      "PYTHONHOME=/opt/python-occlum",
      "OCCLUM=yes",
      "EDG_MARBLE_COORDINATOR_ADDR=coordinator:2001", // <-- Replaced port number
      "EDG_MARBLE_TYPE=hello",
      "EDG_MARBLE_UUID_FILE=uuid",
      "EDG_MARBLE_DNS_NAMES=coordinator"
    ],
    //...
  }
}
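
To catch this kind of mix-up before running the enclave, a small check along the following lines might help. This is only a sketch: the function names are hypothetical, the port numbers are the ones from the docker-compose file in this thread (4433 for the client API, 2001 for the mesh API) and may differ in other deployments, and it assumes the config has already been parsed into a dictionary with the same `env.default` layout shown above.

```python
# Ports taken from this thread's docker-compose setup; adjust for other deployments.
CLIENT_API_PORT = 4433  # manifest, quote, and certificate requests
MESH_API_PORT = 2001    # Marble activation requests (gRPC)


def coordinator_port(occlum_config: dict) -> int:
    """Return the port of EDG_MARBLE_COORDINATOR_ADDR from env.default."""
    for entry in occlum_config["env"]["default"]:
        name, _, value = entry.partition("=")
        if name == "EDG_MARBLE_COORDINATOR_ADDR":
            _host, sep, port = value.rpartition(":")
            if not sep:
                raise ValueError(f"no port in coordinator address: {value!r}")
            return int(port)
    raise KeyError("EDG_MARBLE_COORDINATOR_ADDR is not set in env.default")


def uses_mesh_port(occlum_config: dict) -> bool:
    """True if the Marble is configured to contact the mesh API port."""
    return coordinator_port(occlum_config) == MESH_API_PORT
```

With the original configuration from this issue, `uses_mesh_port` returns False because the address ends in 4433; after the change above it returns True. The dictionary can be obtained with `json.load` on the occlum.json file, provided the file contains no comments.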
llnut commented 2 years ago

Thank you very much for your answer, @daniel-weisse; the previous problem has been resolved. However, I ran into a new problem: when deploying with marblerun:v0.6.0 and any occlum version from v0.26.4 to v0.28.0, the coordinator logs the following error:

marblerun-coordinator-1  | ERROR: rc = 0xffffde80
marblerun-coordinator-1  |  (oe_result_t=OE_CRYPTO_ERROR) [openenclave-src/enclave/crypto/mbedtls/crl.c:oe_crl_read_der:65]
marblerun-coordinator-1  | ERROR: Failed to read CRL. OE_CRYPTO_ERROR (oe_result_t=OE_CRYPTO_ERROR) [openenclave-src/common/sgx/collateral.c:oe_validate_revocation_list:385]
marblerun-coordinator-1  | ERROR: :OE_INVALID_PARAMETER [openenclave-src/enclave/crypto/mbedtls/crl.c:oe_crl_free:140]
marblerun-coordinator-1  | ERROR: Failed to validate revocation info. OE_CRYPTO_ERROR (oe_result_t=OE_CRYPTO_ERROR) [openenclave-src/common/sgx/quote.c:oe_get_sgx_quote_validity:776]
marblerun-coordinator-1  | ERROR: Failed to validate quote. OE_CRYPTO_ERROR (oe_result_t=OE_CRYPTO_ERROR) [openenclave-src/common/sgx/quote.c:oe_verify_quote_with_sgx_endorsements:631]
marblerun-coordinator-1  | 2022-07-28T03:07:57.386Z        INFO    zap/options.go:212      finished unary call with code Unauthenticated   {"grpc.start_time": "2022-07-28T03:07:57Z", "system": "grpc", "span.kind": "server", "grpc.service": "rpc.Marble", "grpc.method": "Activate", "peer.address": "172.18.0.8:49334", "error": "rpc error: code = Unauthenticated desc = invalid quote: verifying quote failed: OE_CRYPTO_ERROR", "grpc.code": "Unauthenticated", "grpc.time_ms": 324.005}
marblerun-coordinator-1  | 2022-07-28T03:07:57.394Z        INFO    zap/grpclogger.go:92    [transport]transport: loopyWriter.run returning. connection error: desc = "transport is closing"        {"system": "grpc", "grpc_log": true}

When I deploy with marblerun:v0.5.1, the error message becomes the following:

panic: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: tls: first record does not look like a TLS handshake"

goroutine 1 [running]:
main.prepareOcclum(0x7f9e4a92ca98, 0x7f9e4ae4b438, 0x1, 0x7f9e64000180, 0x200000003, 0x7f9e64000180)
        /home/daniel/Edgeless/marblerun/cmd/premain-libos/main.go:113 +0x2da
main.main()
        /home/daniel/Edgeless/marblerun/cmd/premain-libos/main.go:57 +0x185
Makefile:27: recipe for target 'run' failed
make: *** [run] Error 2

When I deploy with marblerun:v0.5.1 and occlum:v0.24.1, the error message becomes the following:

cd occlum_instance; occlum run /bin/premain-libos
[PreMain] 2022/07/28 03:54:27 detected libOS: Occlum
[PreMain] 2022/07/28 03:54:27 starting PreMain
[PreMain] 2022/07/28 03:54:27 fetching env variables
[PreMain] 2022/07/28 03:54:27 loading TLS Credentials
[PreMain] 2022/07/28 03:54:27 loading UUID
[PreMain] 2022/07/28 03:54:27 UUID not found. Generating and storing a new UUID
[PreMain] 2022/07/28 03:54:27 generating CSR
[PreMain] 2022/07/28 03:54:27 generating quote
/opt/occlum/build/bin/occlum: line 315:   862 Segmentation fault      RUST_BACKTRACE=1 "$instance_dir/build/bin/occlum-run" "$@"
Makefile:27: recipe for target 'run' failed
make: *** [run] Error 139

daniel-weisse commented 2 years ago

This looks similar to another issue that was reported to us and is being investigated. As a workaround, can you try setting pccs_api_version to 3.1 in ./occlum/patch/sgx_default_qcnl.conf?
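
If the file needs to be patched by a script, something like the sketch below could work. It rests on an assumption: that sgx_default_qcnl.conf uses the JSON-style format, where the setting appears as a quoted "pccs_api_version" key with a quoted value. The helper name is hypothetical, and it edits the text directly instead of parsing it, so any comments in the file are left untouched.

```python
import re


def set_pccs_api_version(conf_text: str, version: str = "3.1") -> str:
    """Return conf_text with the value of "pccs_api_version" replaced.

    Assumes the JSON-style config format; raises ValueError if the key
    is not present so a silent no-op cannot go unnoticed.
    """
    pattern = re.compile(r'("pccs_api_version"\s*:\s*)"[^"]*"')
    new_text, count = pattern.subn(
        lambda m: m.group(1) + f'"{version}"', conf_text
    )
    if count == 0:
        raise ValueError('no "pccs_api_version" key found in config text')
    return new_text
```

The function works on the file contents as a string: read ./occlum/patch/sgx_default_qcnl.conf, pass the text through `set_pccs_api_version`, and write the result back before starting the containers again.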

llnut commented 2 years ago

Thank you for your patient answer, @daniel-weisse. I updated sgx_default_qcnl.conf, and the problem is now solved.