Cosmian / mse-cli

MicroService Encryption CLI

Error when trying to spawn docker image #66

Closed ADL-work closed 3 days ago

ADL-work commented 5 months ago

Hi,

I'm getting an error when trying to spawn the docker image from the SGX operator.

Error:

❌ Your application 'app_name' is not running. Run `mse home logs` for more details

Running mse home logs app_name shows this:

Reading args: --size 4096M --subject CN=cosmian.com,O=Cosmian Tech,C=FR,L=Paris,ST=Ile-de-France --san myapp.fr --id c904008a-d4cd-45e6-a908-d637fc725bbc --application app:app --expiration 1743605697
Untar the code...
app.py
Generating the enclave...
rm -f *.token *.sig *.manifest.sgx *.manifest
gramine-manifest \
    -Dlog_level=error \
    -Darch_libdir=/lib/x86_64-linux-gnu \
    -Denclave_size=4096M \
    -Dapp_dir=/opt/input/app \
    -Dhome_dir=home \
    -Dkey_dir=key \
    -Dcode_dir=code \
    -Dentrypoint=/usr/bin/python3.10 \
    python.manifest.template > python.manifest
gramine-sgx-sign \
    --key /root/.config/gramine/enclave-key.pem \
    --output python.manifest.sgx \
    --manifest python.manifest
Usage: gramine-sgx-sign [OPTIONS]
Try 'gramine-sgx-sign --help' for help.

Error: Invalid value for '--key' / '-k': File '/root/.config/gramine/enclave-key.pem' is a directory.
make: *** [Makefile:36: sgx_outputs] Error 2

On the SGX operator machine, I've installed the SGX driver and DCAP, enabled a localhost PCCS, and a test run with Local Attestation passed.

Maybe I am missing something in between?

Thanks for your help
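
The error message itself suggests that /root/.config/gramine/enclave-key.pem exists as a directory rather than a regular key file. A quick check (path taken from the error above):

$ ls -ld /root/.config/gramine/enclave-key.pem   # should be a regular file, not a directory
$ file /root/.config/gramine/enclave-key.pem     # reports 'directory' if the path is wrong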

grydz commented 5 months ago

Hi,

Have you generated the signer key enclave-key.pem?

If not, just run:

$ mkdir -p $HOME/.config/gramine
$ openssl genrsa -3 -out $HOME/.config/gramine/enclave-key.pem 3072

The default path is $HOME/.config/gramine/enclave-key.pem, but it can be changed with the --signer-key argument of mse home spawn if needed.
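
To double-check the generated key (this only assumes the default location used in the commands above):

$ ls -l $HOME/.config/gramine/enclave-key.pem                          # must be a regular file
$ openssl rsa -in $HOME/.config/gramine/enclave-key.pem -check -noout  # should print 'RSA key ok'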

ADL-work commented 5 months ago

Hi,

@grydz Thanks for your hint!

Apparently the docker app can now be loaded with Gramine, but we have a timeout issue.

Extracting the package at /home/azureuser/sgx_operator...
Loading the docker image...
Starting the docker...
Waiting for the configuration server to be ready...  
The application is now ready to receive the secrets!
Collecting the enclave and application evidences...
❌ timed out

Checking the mse logs, it just hangs after these lines:

    00000000fa491000-00000000fa4e1000 [REG:R-X] (code) measured
    00000000fa4e1000-00000000fa4eb000 [REG:RW-] (data) measured
    0000000000010000-00000000fa491000 [REG:RWX] (free)
Measurement:
    0e299a0dbf2c66cfd0ea5750e6c8d08f8e2bfa3543e44efe3d7e482666d43c7b
Gramine is starting. Parsing TOML manifest file, this may take some time...
[2024-04-08 15:53:44,573] [INFO] Generating self-signed certificate...
[2024-04-08 15:53:44,582] [INFO] Starting the configuration server...
172.17.0.1 - - [08/Apr/2024 15:53:46] "GET / HTTP/1.1" 200 -

Do you have any hint for this?

Thank you
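
While it hangs, the container can also be inspected directly (the container name below is a placeholder for whatever mse home spawn created):

$ docker ps                               # confirm the container is up and note its published port
$ docker logs -f <container_name_or_id>   # follow the container output live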

grydz commented 5 months ago

It looks like the configuration server times out because you did not provide the code secret key through the mse home run command.

Could you post the commands you've typed?
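
If the exact arguments are unclear, the CLI should list them itself (assuming the usual --help flag is available):

$ mse home run --help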

ADL-work commented 4 months ago

Hi @grydz, I tried again with the full stack trace enabled; here is what the logs show:

Extracting the package at /home/azureuser/sgx_operator...
Loading the docker image...
Starting the docker...
Waiting for the configuration server to be ready...  
The application is now ready to receive the secrets!
Collecting the enclave and application evidences...
Traceback (most recent call last):
  File "/home/azureuser/.pyenv/versions/3.8.18/lib/python3.8/site-packages/mse_cli/main.py", line 124, in main
    func(args)
  File "/home/azureuser/.pyenv/versions/3.8.18/lib/python3.8/site-packages/mse_cli/home/command/sgx_operator/spawn.py", line 191, in run
    collect_evidence_and_certificate(container, args.pccs, args.output)
  File "/home/azureuser/.pyenv/versions/3.8.18/lib/python3.8/site-packages/mse_cli/home/command/sgx_operator/evidence.py", line 83, in collect_evidence_and_certificate
    get_server_certificate((docker.host, docker.port)).encode("utf-8")
  File "/home/azureuser/.pyenv/versions/3.8.18/lib/python3.8/site-packages/intel_sgx_ra/ratls.py", line 80, in get_server_certificate
    with socket.create_connection((host, port), timeout=10) as sock:
  File "/home/azureuser/.pyenv/versions/3.8.18/lib/python3.8/socket.py", line 808, in create_connection
    raise err
  File "/home/azureuser/.pyenv/versions/3.8.18/lib/python3.8/socket.py", line 796, in create_connection
    sock.connect(sa)
socket.timeout: timed out
❌ timed out

The commands I typed:

grydz commented 4 months ago

Do you use Intel SGX on Microsoft Azure or on your own infrastructure?

If you're using Microsoft Azure, we will release a beta version of the CLI for testing because Microsoft has its own specific DCAP infrastructure.

If not, then it seems your Intel PCCS server is not responding. Check the configuration on the SGX server in /etc/sgx_default_qcnl.conf and be sure you can reach pccs_url.
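
For example, with the default local PCCS URL (adjust host and port to whatever pccs_url contains in your /etc/sgx_default_qcnl.conf):

$ curl -k -v https://localhost:8081/sgx/certification/v4/rootcacrl

Getting a hex-encoded CRL back means the PCCS answers on that URL.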

ADL-work commented 4 months ago

Yes, we are using an SGX VM from Azure. I followed the installation from here and replaced the default QCNL with the az-dcap-client package.

Here's what I've configured in /etc/sgx_default_qcnl.conf:

{
  // *** ATTENTION : This file is in JSON format so the keys are case sensitive. Don't change them.

  //PCCS server address
  "pccs_url": "https://localhost:8081/sgx/certification/v4/"

  // To accept insecure HTTPS certificate, set this option to false
  ,"use_secure_cert": false 

  // You can use the Intel PCS or another PCCS to get quote verification collateral.  Retrieval of PCK 
  // Certificates will always use the PCCS described in pccs_url.  When collateral_service is not defined, both 
  // PCK Certs and verification collateral will be retrieved using pccs_url  
  //,"collateral_service": "https://api.trustedservices.intel.com/sgx/certification/v4/"

  // If you use a PCCS service to get the quote verification collateral, you can specify which PCCS API version is to be used.
  // The legacy 3.0 API will return CRLs in HEX encoded DER format and the sgx_ql_qve_collateral_t.version will be set to 3.0, while
  // the new 3.1 API will return raw DER format and the sgx_ql_qve_collateral_t.version will be set to 3.1. The pccs_api_version 
  // setting is ignored if collateral_service is set to the Intel PCS. In this case, the pccs_api_version is forced to be 3.1 
  // internally.  Currently, only values of 3.0 and 3.1 are valid.  Note, if you set this to 3.1, the PCCS use to retrieve 
  // verification collateral must support the new 3.1 APIs.
  //,"pccs_api_version": "3.1"

  // Maximum retry times for QCNL. If RETRY is not defined or set to 0, no retry will be performed.
  // It will first wait one second and then for all forthcoming retries it will double the waiting time.
  // By using retry_delay you disable this exponential backoff algorithm
  ,"retry_times": 6

  // Sleep this amount of seconds before each retry when a transfer has failed with a transient error
  ,"retry_delay": 10

  // If local_pck_url is defined, the QCNL will try to retrieve PCK cert chain from local_pck_url first,
  // and failover to pccs_url as in legacy mode.
  //,"local_pck_url": "http://localhost:8081/sgx/certification/v4/"

  // If local_pck_url is not defined, set pck_cache_expire_hours to a none-zero value will enable local cache. 
  // The PCK certificates will be cached in memory and then to the disk drive. 
  // The local cache files will be sequentially searched in the following directories until located in one of them:
  // Linux : $AZDCAP_CACHE, $XDG_CACHE_HOME, $HOME, $TMPDIR, /tmp/
  // Windows : $AZDCAP_CACHE, $LOCALAPPDATA\..\..\LocalLow
  // Please be aware that the environment variable pertains to the account executing the process that loads QPL,
  // not the account used to log in. For instance, if QPL is loaded by QGS, then those environment variables relate to
  // the "qgsd" account, which is the account that runs the QGS daemon.
  // You can remove the local cache files either manually or by using the QPL API, sgx_qpl_clear_cache. If you opt to
  // delete them manually, navigate to the aforementioned caching directories, find the folder named .dcap-qcnl, and delete it.
  // Restart the service after all cache folders were deleted. The same method applies to "verify_collateral_cache_expire_hours"
  ,"pck_cache_expire_hours": 168

  // To set cache expire time for quote verification collateral in hours
  // See the above comment for pck_cache_expire_hours for more information on the local cache.
  ,"verify_collateral_cache_expire_hours": 168

  // When the "local_cache_only" parameter is set to true, the QPL/QCNL will exclusively use PCK certificates 
  // from local cache files and will not request any PCK certificates from service providers, whether local or remote. 
  // To ensure that the PCK cache is available for use, an administrator must pre-populate the cache folders with 
  // the appropriate cache files. To generate these cache files for specific platforms, the administrator can use 
  // the PCCS admin tool. Once the cache files are generated, the administrator must distribute them to each platform 
  // that requires provisioning.
  ,"local_cache_only": false

  // You can add custom request headers and parameters to the get certificate API.
  // But the default PCCS implementation just ignores them. 
  //,"custom_request_options" : {
  //  "get_cert" : {
  //    "headers": {
  //      "head1": "value1"
  //    },
  //    "params": {
  //      "param1": "value1",
  //      "param2": "value2"
  //    }
  //  }
  //}
}

The request curl -v -k -G https://127.0.0.1:8081/sgx/certification/v4/rootcacrl (it also works for v3) managed to receive some data like this: 308201223081c8020101300a06082a8648ce3d0403023068311a301806035504030c11496e74656c2053475820526f6f74204341311a3018060355040a0c11496e74656c20436f72706f726174696f6e3114301206035504070c0b53616e746120436c617261310b300906035504080c024341310b3009060355040613025553170d3...

The PCCS and AESMD services are running, and I opened the PCCS port (8081) for incoming requests (even though it's not needed for now, as all requests are local). Do you have any hint on how to verify that the PCCS is working? Thank you
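
On the service side, the units can be checked with (assuming the standard systemd unit names from the DCAP packages):

$ systemctl status pccs aesmd   # both should be active (running)
$ journalctl -u pccs -n 50      # recent PCCS log lines, useful to spot incoming requests or errors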

ADL-work commented 4 months ago

@grydz Could you give me some details about how the evidence is collected when running the mse home evidence command? I see it connects using intel_sgx_ra, but to which URL and which port? This may help me understand where the issue is and resolve it. Thank you
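
From the stack trace earlier, collect_evidence_and_certificate seems to simply open a TLS connection to (docker.host, docker.port) with a 10-second timeout, so the same check can be reproduced by hand (host and port below are placeholders for the values used at spawn time):

$ nc -vz localhost 443                                 # 443 is a placeholder port
$ openssl s_client -connect localhost:443 </dev/null   # fetch whatever certificate the server presents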

grydz commented 3 days ago

We have released a simple Ansible script for Azure: https://github.com/Cosmian/mse-home-on-azure.

I'm closing the issue but feel free to open a new one if needed.