fdo-rs / fdo-operator

An OpenShift operator for managing FIDO Device Onboard (FDO) servers
Apache License 2.0
4 stars 2 forks source link

Manufacturing-Server: Session Storage Error and Device Credential Missing #118

Closed morvencao closed 10 months ago

morvencao commented 11 months ago

I am deploying FDO services in an OpenShift cluster using the fdo-operator, and the fdo-servers are operational. However, upon booting my device (simulated as a VM using libvirt), I encounter an issue where no device credentials are found.

Upon inspecting the device system logs, I observed the following entries:

image

Additionally, the manufacturing-server logs indicate the occurrence of the following errors:

# oc logs -f manufacturing-server-6b59f66c65-kd5p8

 2023-12-01T07:17:30.803Z TRACE fdo_data_formats::publickey > Serializing certificate: [48, 130, 1, 86, 48, 129, 252, 160, 3, 2, 1, 2, 2, 8, 119, 19, 128, 182, 7, 89, 9, 17, 48, 10, 6, 8, 42, 134, 72, 206, 61, 4, 3, 2, 48, 49, 49, 13, 48, 11, 6, 3, 85, 4, 3, 12, 4, 68, 73, 85, 78, 49, 19, 48, 17, 6, 3, 85, 4, 10, 12, 10, 114, 101, 100, 104, 97, 116, 46, 99, 111, 109, 49, 11, 48, 9, 6, 3, 85, 4, 6, 19, 2, 85, 83, 48, 30, 23, 13, 50, 51, 49, 49, 50, 57, 48, 55, 49, 49, 49, 49, 90, 23, 13, 50, 52, 49, 49, 50, 56, 48, 55, 49, 49, 49, 49, 90, 48, 49, 49, 13, 48, 11, 6, 3, 85, 4, 3, 12, 4, 68, 73, 85, 78, 49, 19, 48, 17, 6, 3, 85, 4, 10, 12, 10, 114, 101, 100, 104, 97, 116, 46, 99, 111, 109, 49, 11, 48, 9, 6, 3, 85, 4, 6, 19, 2, 85, 83, 48, 89, 48, 19, 6, 7, 42, 134, 72, 206, 61, 2, 1, 6, 8, 42, 134, 72, 206, 61, 3, 1, 7, 3, 66, 0, 4, 145, 126, 23, 38, 174, 135, 28, 161, 11, 28, 98, 10, 206, 133, 106, 44, 167, 137, 29, 174, 166, 101, 237, 24, 141, 134, 194, 209, 227, 126, 132, 104, 132, 21, 86, 169, 179, 112, 102, 81, 24, 137, 102, 105, 45, 160, 182, 159, 178, 121, 150, 211, 147, 231, 38, 136, 98, 18, 128, 207, 138, 115, 101, 199, 48, 10, 6, 8, 42, 134, 72, 206, 61, 4, 3, 2, 3, 73, 0, 48, 70, 2, 33, 0, 151, 52, 188, 32, 28, 127, 204, 17, 8, 223, 230, 78, 24, 232, 90, 83, 12, 33, 65, 231, 97, 197, 229, 8, 84, 252, 51, 136, 245, 84, 219, 187, 2, 33, 0, 185, 79, 108, 85, 167, 237, 113, 9, 28, 191, 11, 58, 95, 111, 168, 189, 95, 94, 7, 30, 87, 130, 74, 60, 89, 254, 152, 14, 35, 134, 153, 219]
 2023-12-01T07:17:30.804Z INFO  fdo_manufacturing_server    > Listening on 0.0.0.0:8080
 2023-12-01T07:17:30.804Z INFO  fdo_manufacturing_server    > Scheduling maintenance every 60 seconds

 2023-12-01T07:40:18.892Z TRACE hyper::proto::h1::conn      > Conn::read_head
 2023-12-01T07:40:18.892Z TRACE hyper::proto::h1::conn      > flushed({role=server}): State { reading: Init, writing: Init, keep_alive: Busy }
 2023-12-01T07:40:18.892Z TRACE hyper::proto::h1::conn      > Conn::read_head
 2023-12-01T07:40:18.892Z TRACE hyper::proto::h1::io        > received 837 bytes
 2023-12-01T07:40:18.892Z TRACE tracing::span               > parse_headers;
 2023-12-01T07:40:18.892Z TRACE tracing::span::active       > -> parse_headers;
 2023-12-01T07:40:18.892Z TRACE hyper::proto::h1::role      > Request.parse bytes=837
 2023-12-01T07:40:18.892Z TRACE hyper::proto::h1::role      > Request.parse Complete(526)
 2023-12-01T07:40:18.892Z TRACE tracing::span::active       > <- parse_headers;
 2023-12-01T07:40:18.892Z TRACE tracing::span               > -- parse_headers;
 2023-12-01T07:40:18.892Z DEBUG hyper::proto::h1::io        > parsed 9 headers
 2023-12-01T07:40:18.892Z DEBUG hyper::proto::h1::conn      > incoming body is content-length (311 bytes)
 2023-12-01T07:40:18.892Z TRACE warp::filters::method       > method::POST?: POST
 2023-12-01T07:40:18.892Z TRACE hyper::proto::h1::decode    > decode; state=Length(311)
 2023-12-01T07:40:18.892Z DEBUG hyper::proto::h1::conn      > incoming body completed
 2023-12-01T07:40:18.892Z TRACE warp::filters::method       > method::GET?: POST
 2023-12-01T07:40:18.892Z TRACE warp::filters::method       > method::POST?: POST
 2023-12-01T07:40:18.892Z TRACE warp::filters::path         > "ping"?: "fdo"
 2023-12-01T07:40:18.892Z TRACE warp::filters::method       > method::POST?: POST
 2023-12-01T07:40:18.892Z TRACE warp::filters::path         > "fdo"?: "fdo"
 2023-12-01T07:40:18.892Z TRACE warp::filters::path         > "101"?: "101"
 2023-12-01T07:40:18.892Z TRACE warp::filters::path         > "msg"?: "msg"
 2023-12-01T07:40:18.892Z TRACE warp::filters::path         > "10"?: "210"
 2023-12-01T07:40:18.892Z TRACE warp::filters::method       > method::POST?: POST
 2023-12-01T07:40:18.892Z TRACE warp::filters::path         > "fdo"?: "fdo"
 2023-12-01T07:40:18.892Z TRACE warp::filters::path         > "101"?: "101"
 2023-12-01T07:40:18.892Z TRACE warp::filters::path         > "msg"?: "msg"
 2023-12-01T07:40:18.892Z TRACE warp::filters::path         > "12"?: "210"
 2023-12-01T07:40:18.892Z TRACE warp::filters::method       > method::POST?: POST
 2023-12-01T07:40:18.892Z TRACE warp::filters::path         > "fdo"?: "fdo"
 2023-12-01T07:40:18.892Z TRACE warp::filters::path         > "101"?: "101"
 2023-12-01T07:40:18.892Z TRACE warp::filters::path         > "msg"?: "msg"
 2023-12-01T07:40:18.892Z TRACE warp::filters::path         > "210"?: "210"
 2023-12-01T07:40:18.892Z TRACE warp::filters::header       > header2("content-length")
 2023-12-01T07:40:18.892Z TRACE warp::filters::header       > exact?("Content-Type", "application/cbor")
 2023-12-01T07:40:18.892Z TRACE warp::filters::header       > optional("Authorization")
 2023-12-01T07:40:18.893Z TRACE fdo_http_wrapper::server    > Raw request: "8450c1b059d287ea28f0c064583414d3f0586745434448333834039896001830188118f91871184d1866182c18f618b71891188518d4184118c3189a05182518bd18f518d5183218501838188f1857185218881857182e182f18721896186818e6186e0e18db1894181f185c1828185f1831182c18b11857189d0118b1001830185c183c1869184f18711518f60a0e18cb18971857185c0e18260a18c7186a188e0c18fb18fa18a0188218c018641850184218ec1872184818be183e110d183818bb18f6184a18c51898183f18790a18da185018f1181a00183018b418bf18fe187a18a418aa1880187c183a1856182b182c187118a5184418f018e501184618af183d187b187415187918d6187618a618ee183c18a218ae18601868189618aa18270618fa1836182b1864182118d318df18601838185e"
 2023-12-01T07:40:18.896Z TRACE fdo_data_formats::types     > Using fully interoperable KDF
 2023-12-01T07:40:18.896Z DEBUG fdo_manufacturing_server::handlers::diun > Got new keys, setting EncryptionKeys { cipher_suite: Some(A256Gcm), keys: Some([[ DERIVEDKEYS: REDACTED ]]) }
 2023-12-01T07:40:18.896Z TRACE fdo_store::directory                     > Attempting to store data to /etc/fdo/sessions/lBUD7tI0OLA9YEfOMFUNKHVmBM6BulpJNBkFo7fTsos= (temporary at /etc/fdo/sessions/.lBUD7tI0OLA9YEfOMFUNKHVmBM6BulpJNBkFo7fTsos=.tmp)
 2023-12-01T07:40:18.896Z TRACE warp::filters::method                    > method::POST?: POST
 2023-12-01T07:40:18.896Z TRACE warp::filters::path                      > "fdo"?: "fdo"
 2023-12-01T07:40:18.896Z TRACE warp::filters::path                      > "101"?: "101"
 2023-12-01T07:40:18.896Z TRACE warp::filters::path                      > "msg"?: "msg"
 2023-12-01T07:40:18.896Z TRACE warp::filters::path                      > "212"?: "210"
 2023-12-01T07:40:18.896Z TRACE warp::filters::method                    > method::POST?: POST
 2023-12-01T07:40:18.896Z TRACE warp::filters::path                      > "fdo"?: "fdo"
 2023-12-01T07:40:18.896Z TRACE warp::filters::path                      > "101"?: "101"
 2023-12-01T07:40:18.896Z TRACE warp::filters::path                      > "msg"?: "msg"
 2023-12-01T07:40:18.896Z TRACE warp::filters::path                      > "214"?: "210"
 2023-12-01T07:40:18.896Z WARN  fdo_http_wrapper::server                 > Error processing request: Rejection([Error(ErrorMessage { error_code: InternalServerError, previous_message_type: DIUNConnect, error_string: "Error storing session", error_timestamp: None, error_uuid: 7803449700289115045 }), MethodNotAllowed])
 2023-12-01T07:40:18.896Z INFO  manufacturing-server                     > 10.128.0.79:41972 "POST /fdo/101/msg/210 HTTP/1.1" 500 "-" "-" 3.861408ms
 2023-12-01T07:40:18.896Z TRACE tracing::span                            > encode_headers;
 2023-12-01T07:40:18.896Z TRACE tracing::span::active                    > -> encode_headers;
 2023-12-01T07:40:18.896Z TRACE hyper::proto::h1::role                   > Server::encode status=500, body=Some(Known(111)), req_method=Some(POST)
 2023-12-01T07:40:18.896Z TRACE tracing::span::active                    > <- encode_headers;
 2023-12-01T07:40:18.896Z TRACE tracing::span                            > -- encode_headers;
 2023-12-01T07:40:18.896Z TRACE hyper::proto::h1::io                     > buffer.queue self.len=115 buf.len=111
 2023-12-01T07:40:18.896Z DEBUG hyper::proto::h1::io                     > flushed 226 bytes
 2023-12-01T07:40:18.896Z TRACE hyper::proto::h1::conn                   > flushed({role=server}): State { reading: Init, writing: Init, keep_alive: Idle }
 2023-12-01T07:40:18.896Z TRACE hyper::proto::h1::conn                   > Conn::read_head
 2023-12-01T07:40:18.896Z TRACE hyper::proto::h1::conn                   > flushed({role=server}): State { reading: Init, writing: Init, keep_alive: Idle }
 2023-12-01T07:40:33.899Z TRACE hyper::proto::h1::conn                   > Conn::read_head
 2023-12-01T07:40:33.899Z TRACE hyper::proto::h1::io                     > received 0 bytes
 2023-12-01T07:40:33.899Z TRACE hyper::proto::h1::io                     > parse eof
 2023-12-01T07:40:33.899Z TRACE hyper::proto::h1::conn                   > State::close_read()
 2023-12-01T07:40:33.899Z DEBUG hyper::proto::h1::conn                   > read eof
 2023-12-01T07:40:33.899Z TRACE hyper::proto::h1::conn                   > State::close_write()
 2023-12-01T07:40:33.899Z TRACE hyper::proto::h1::conn                   > State::close_read()
 2023-12-01T07:40:33.899Z TRACE hyper::proto::h1::conn                   > State::close_write()
 2023-12-01T07:40:33.899Z TRACE hyper::proto::h1::conn                   > flushed({role=server}): State { reading: Closed, writing: Closed, keep_alive: Disabled }
 2023-12-01T07:40:33.899Z DEBUG hyper::proto::h1::conn                   > error shutting down IO: Transport endpoint is not connected (os error 107)
 2023-12-01T07:40:33.899Z DEBUG hyper::server::server::new_svc           > connection error: error shutting down connection: Transport endpoint is not connected (os error 107)

I need assistance in resolving this session storage error and ensuring the proper retrieval of device credentials during the device boot process.

env:

BTW, I tested both the default FDO images (quay.io/vemporop/fdo-xxx-server:1.0) and built the latest FDO server images (v0.4.12). Unfortunately, the same issue persists in both cases.

7flying commented 11 months ago

Hi @morvencao first of all, when the device boots no device credentials are usually expected to be found, that's why we need that first exchange between the manufacturing client and manufacturing server, so that the device can get its device credentials via Device Onboarding.

The session storage error is indeed strange, can you copy-paste the manufacturing server config file? Are there any selinux denials on the manufacturing server?

empovit commented 11 months ago

@morvencao which FDO client version are you using, if you can find out? Or which RHEL version? Could you please try with the default FDO server images?

@7flying this is exactly the problem I saw when trying to run clients up to RHEL 9.3 with 0.4.x servers, and that's why I though they were incompatible and/or operator would require changes to support 0.4.x.

7flying commented 11 months ago

@empovit right, then on the fdo-operator side the docs should clarify that there must be a pairing between server versions and client versions.

7flying commented 11 months ago

I'm transferring this issue to the FDO operators repo.

empovit commented 11 months ago

@7flying unfortunately this doesn't solve the problem because:

  1. Customers use the client version that comes with their RHEL for Edge image and don't have control over it. And we see that all current GA versions doesn't seem compatible with 0.4.x servers.
  2. FDO operator doesn't support 0.4.x as apparently the backward compatibility was broken at some point.
empovit commented 11 months ago

@morvencao the bottom line is that the FDO operator doesn't currently support 0.4.x FDO servers. You can either use the default images (they work, verified today), or wait until the FDO operator has been modified to solve this issue.

Alternatively, deploy FDO servers using a different method.

7flying commented 11 months ago

@7flying unfortunately this doesn't solve the problem because:

  1. Customers use the client version that comes with their RHEL for Edge image and don't have control over it. And we see that all current GA versions doesn't seem compatible with 0.4.x servers.
  2. FDO operator doesn't support 0.4.x as apparently the backward compatibility was broken at some point.

All current GA versions are not compatible with 0.4.x servers indeed, what we support now are X.Y.Z clients with X.Y.Z servers, and we are not backwards compatible either since the tech is still on preview.

What I suggest is to make sure that the fdo operator handling the servers matches the specific version that the clients have.

morvencao commented 11 months ago

@empovit @7flying Thank you for responding. Initially, I used the default image for FDO servers (quay.io/vemporop/fdo-xxx-server:1.0). However, I encountered the same issue. Upon further investigation, I discovered that the images were built a year ago, which seemed outdated. Consequently, I built the images (version 0.4.12) from the repository https://github.com/fdo-rs/fido-device-onboard-rs and updated the image in the FDO operator CRs. Unfortunately, the issue persists.

I need to update the issue description to clarify that this should not be a version-related problem, this will prevent others facing the same issue from attempting unnecessary repeated attempts.

The manufacturing server should have no SELinux denials. If any denials occur, we should be able to identify them through the pod state or logs.

And this is the configuration file for the manufacturing server:

# oc exec -it deploy/manufacturing-server -- cat /etc/fdo/manufacturing-server.conf.d/manufacturing-server.yml
session_store_driver:
  Directory:
    path: /etc/fdo/sessions/
ownership_voucher_store_driver:
  Directory:
    path: /etc/fdo/ownership_vouchers/
public_key_store_driver:
  Directory:
    path: /etc/fdo/keys/
bind: 0.0.0.0:8080
rendezvous_info:
- dns: rendezvous-server-fdo.apps.test.xxxx.com
  device_port: 80
  owner_port: 80
  protocol: http
protocols:
  plain_di: false
  diun:
    key_path: /etc/fdo/keys/diun_key.der
    cert_path: /etc/fdo/keys/diun_cert.pem
    key_type: SECP256R1
    mfg_string_type: SerialNumber
    allowed_key_storage_types:
    - FileSystem
    - Tpm
manufacturing:
  manufacturer_cert_path: /etc/fdo/keys/manufacturer_cert.pem
  manufacturer_private_key: /etc/fdo/keys/manufacturer_key.der
  owner_cert_path: /etc/fdo/keys/owner_cert.pem
  device_cert_ca_private_key: /etc/fdo/keys/device_ca_key.der
  device_cert_ca_chain: /etc/fdo/keys/device_ca_cert.pem
morvencao commented 11 months ago

FDO client version is 0.4.12 RHEL version is 9.2.

# /usr/libexec/fdo/fdo-client-linuxapp --version
fdo-client-linuxapp 0.4.12
# # cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="9.2 (Plow)"
...

And the fdo client service status:

# systemctl status fdo-client-linuxapp.service
○ fdo-client-linuxapp.service - FDO client
     Loaded: loaded (/usr/lib/systemd/system/fdo-client-linuxapp.service; enabled; preset: disabled)
     Active: inactive (dead) since Fri 2023-12-01 10:43:53 EST; 2min 34s ago
    Process: 1192 ExecStartPre=/usr/sbin/restorecon /boot/device-credentials (code=exited, status=255/EXCEPTION)
    Process: 1197 ExecStart=/usr/libexec/fdo/fdo-client-linuxapp (code=exited, status=0/SUCCESS)
    Process: 1203 ExecStartPost=/usr/bin/mv /boot/device-credentials /etc/device-credentials (code=exited, status=1/FAILURE)
   Main PID: 1197 (code=exited, status=0/SUCCESS)
        CPU: 9ms

Dec 01 10:43:53 microshift-node systemd[1]: Starting FDO client...
Dec 01 10:43:53 microshift-node restorecon[1192]: /usr/sbin/restorecon: lstat(/boot/device-credentials) failed: No such file or directory
Dec 01 10:43:53 microshift-node fdo-client-linuxapp[1197]:  2023-12-01T15:43:53.851Z INFO  fdo_client_linuxapp > No usable device credential located, skipping Device Onboarding
Dec 01 10:43:53 microshift-node mv[1203]: /usr/bin/mv: cannot stat '/boot/device-credentials': No such file or directory
Dec 01 10:43:53 microshift-node systemd[1]: fdo-client-linuxapp.service: Deactivated successfully.
Dec 01 10:43:53 microshift-node systemd[1]: Finished FDO client.
morvencao commented 11 months ago

@empovit @7flying I tested the default images (quay.io/vemporop/fdo-xxx-server:1.0), and I remember encountering the same issue. This might be attributed to a compatibility issue between the fdo-client version (0.4.12) and the default images.

Now, in the current scenario, we would expect the fdo-client version (0.4.12) to be compatible with the 0.4.12 fdo-server images. However, the issue persists. Could this be due to oversight on my part? Or is it possible that using the same fdo-operator to deploy new images with old configurations is causing the problem?

empovit commented 11 months ago

Thanks for the update @morvencao! This is odd because I've been running RHEL 9.2 and RHEL 9.3 FDO clients against the operator without issues (old server images).

Running 0.4.12 with the operator isn't as simple as replacing the image, as the operator code must be changed to produce a compatible configuration, and map the right volumes.

The only thing that comes to mind at the moment is a missing or read-only volume mount. E.g. making sure that the ownership voucher persistent volume is writable.

Regarding the client error, you cannot run the onboarding client (/usr/libexec/fdo/fdo-client-linuxapp) without first making sure that the initialization/manufacturing client (/usr/libexec/fdo/fdo-manufacturing-client) has been successful. It's the initialization client that produces device credentials (/boot/device-credentials) on the machine after talking to a manufacturing server.

The sequence is:

  1. Boot a machine into an FDO-enabled image with the FDO initialization client inside it pointing to a manufacturing server.
  2. Let the initialization complete - device credentials are written to the machine, an ownership voucher is produced on the manufacturing server, the OS is installed on the machine.
  3. Shut down the machine and move it to its destination.
  4. Boot the machine at the destination - the onboarding client will kick in and try to connect to the rendezvous server specified in the device credentials file.
  5. Copy the ownership voucher over to the onboarding server. The server will register itself with the rendezvous server and claim (and prove) ownership of the device by presenting the ownership voucher. There must be a match between the device credentials and the ownership voucher.
empovit commented 11 months ago

@morvencao BTW, in what namespace are you creating your server CRs? Please make sure it's not default.

morvencao commented 11 months ago

Thank you, @empovit , for providing a comprehensive explanation of the entire workflow.

BTW, in what namespace are you creating your server CRs? Please make sure it's not default.

Just to clarify, the namespace is fdo, not default.

morvencao commented 11 months ago

When reverting to the default images, the device credential still remains ungenerated.

$ oc get pod -o yaml | grep image:
    - image: quay.io/vemporop/fdo-manufacturing-server:1.0
      image: quay.io/vemporop/fdo-manufacturing-server:1.0
    - image: quay.io/vemporop/fdo-owner-onboarding-server:1.0
    - image: quay.io/vemporop/fdo-serviceinfo-api-server:1.0
      image: quay.io/vemporop/fdo-owner-onboarding-server:1.0
      image: quay.io/vemporop/fdo-serviceinfo-api-server:1.0
    - image: quay.io/vemporop/fdo-rendezvous-server:1.0
      image: quay.io/vemporop/fdo-rendezvous-server:1.0

These are the fdo-client logs from journalctl:

$ journalctl | grep "Starting FDO client" -A 20
Dec 01 21:10:50 microshift-node systemd[1]: Starting FDO client...
Dec 01 21:10:50 microshift-node systemd[1]: Starting greenboot Health Checks Runner...
Dec 01 21:10:50 microshift-node greenboot[1339]: Running Required Health Check Scripts...
Dec 01 21:10:50 microshift-node restorecon[1336]: /usr/sbin/restorecon: lstat(/boot/device-credentials) failed: No such file or directory
Dec 01 21:10:50 microshift-node fdo-client-linuxapp[1345]:  2023-12-02T02:10:50.391Z INFO  fdo_client_linuxapp > No usable device credential located, skipping Device Onboarding
Dec 01 21:10:50 microshift-node 00_required_scripts_start.sh[1348]: Running greenboot Required Health Check Scripts
Dec 01 21:10:50 microshift-node mv[1353]: /usr/bin/mv: cannot stat '/boot/device-credentials': No such file or directory
Dec 01 21:10:50 microshift-node systemd[1]: fdo-client-linuxapp.service: Deactivated successfully.
Dec 01 21:10:50 microshift-node systemd[1]: Finished FDO client.
Dec 01 21:10:50 microshift-node greenboot[1339]: Script '00_required_scripts_start.sh' SUCCESS
Dec 01 21:10:50 microshift-node 01_repository_dns_check.sh[1358]: /etc/ostree/remotes.d is empty, skipping check
Dec 01 21:10:50 microshift-node greenboot[1339]: Script '01_repository_dns_check.sh' SUCCESS
Dec 01 21:10:50 microshift-node 02_watchdog.sh[1374]: Device:        /dev/watchdog0
Dec 01 21:10:50 microshift-node 02_watchdog.sh[1374]: Identity:      iTCO_wdt [version 0]
Dec 01 21:10:50 microshift-node 02_watchdog.sh[1374]: Timeout:       30 seconds
Dec 01 21:10:50 microshift-node 02_watchdog.sh[1374]: Pre-timeout:    0 seconds
Dec 01 21:10:50 microshift-node 02_watchdog.sh[1374]: Timeleft:      29 seconds
Dec 01 21:10:50 microshift-node 02_watchdog.sh[1374]: FLAG           DESCRIPTION               STATUS BOOT-STATUS
Dec 01 21:10:50 microshift-node 02_watchdog.sh[1374]: KEEPALIVEPING  Keep alive ping reply          1           0
Dec 01 21:10:50 microshift-node 02_watchdog.sh[1374]: MAGICCLOSE     Supports magic close char      0           0
Dec 01 21:10:50 microshift-node 02_watchdog.sh[1374]: SETTIMEOUT     Set timeout (in seconds)       0           0

These are the logs for the manufacturing-server:

$ oc logs deploy/manufacturing-server
 2023-12-02T02:00:56.028Z TRACE fdo_data_formats::publickey > Serializing certificate: [48, 130, 1, 86, 48, 129, 252, 160, 3, 2, 1, 2, 2, 8, 119, 19, 128, 182, 7, 89, 9, 17, 48, 10, 6, 8, 42, 134, 72, 206, 61, 4, 3, 2, 48, 49, 49, 13, 48, 11, 6, 3, 85, 4, 3, 12, 4, 68, 73, 85, 78, 49, 19, 48, 17, 6, 3, 85, 4, 10, 12, 10, 114, 101, 100, 104, 97, 116, 46, 99, 111, 109, 49, 11, 48, 9, 6, 3, 85, 4, 6, 19, 2, 85, 83, 48, 30, 23, 13, 50, 51, 49, 49, 50, 57, 48, 55, 49, 49, 49, 49, 90, 23, 13, 50, 52, 49, 49, 50, 56, 48, 55, 49, 49, 49, 49, 90, 48, 49, 49, 13, 48, 11, 6, 3, 85, 4, 3, 12, 4, 68, 73, 85, 78, 49, 19, 48, 17, 6, 3, 85, 4, 10, 12, 10, 114, 101, 100, 104, 97, 116, 46, 99, 111, 109, 49, 11, 48, 9, 6, 3, 85, 4, 6, 19, 2, 85, 83, 48, 89, 48, 19, 6, 7, 42, 134, 72, 206, 61, 2, 1, 6, 8, 42, 134, 72, 206, 61, 3, 1, 7, 3, 66, 0, 4, 145, 126, 23, 38, 174, 135, 28, 161, 11, 28, 98, 10, 206, 133, 106, 44, 167, 137, 29, 174, 166, 101, 237, 24, 141, 134, 194, 209, 227, 126, 132, 104, 132, 21, 86, 169, 179, 112, 102, 81, 24, 137, 102, 105, 45, 160, 182, 159, 178, 121, 150, 211, 147, 231, 38, 136, 98, 18, 128, 207, 138, 115, 101, 199, 48, 10, 6, 8, 42, 134, 72, 206, 61, 4, 3, 2, 3, 73, 0, 48, 70, 2, 33, 0, 151, 52, 188, 32, 28, 127, 204, 17, 8, 223, 230, 78, 24, 232, 90, 83, 12, 33, 65, 231, 97, 197, 229, 8, 84, 252, 51, 136, 245, 84, 219, 187, 2, 33, 0, 185, 79, 108, 85, 167, 237, 113, 9, 28, 191, 11, 58, 95, 111, 168, 189, 95, 94, 7, 30, 87, 130, 74, 60, 89, 254, 152, 14, 35, 134, 153, 219]
 2023-12-02T02:00:56.028Z INFO  fdo_manufacturing_server    > Listening on 0.0.0.0:8080
 2023-12-02T02:00:56.028Z INFO  fdo_manufacturing_server    > Scheduling maintenance every 60 seconds
 2023-12-02T02:00:56.028Z TRACE mio::poll                   > registering event source with poller: token=Token(1), interests=READABLE | WRITABLE
 2023-12-02T02:06:58.942Z TRACE mio::poll                   > registering event source with poller: token=Token(2), interests=READABLE | WRITABLE
 2023-12-02T02:06:58.942Z TRACE hyper::proto::h1::conn      > Conn::read_head
 2023-12-02T02:06:58.942Z TRACE hyper::proto::h1::conn      > flushed({role=server}): State { reading: Init, writing: Init, keep_alive: Busy }
 2023-12-02T02:06:58.942Z TRACE hyper::proto::h1::conn      > Conn::read_head
 2023-12-02T02:06:58.942Z TRACE hyper::proto::h1::io        > received 836 bytes
 2023-12-02T02:06:58.942Z TRACE tracing::span               > parse_headers
 2023-12-02T02:06:58.942Z TRACE tracing::span::active       > -> parse_headers
 2023-12-02T02:06:58.942Z TRACE hyper::proto::h1::role      > Request.parse bytes=836
 2023-12-02T02:06:58.942Z TRACE hyper::proto::h1::role      > Request.parse Complete(526)
 2023-12-02T02:06:58.942Z TRACE tracing::span::active       > <- parse_headers
 2023-12-02T02:06:58.942Z TRACE tracing::span               > -- parse_headers
 2023-12-02T02:06:58.942Z DEBUG hyper::proto::h1::io        > parsed 9 headers
 2023-12-02T02:06:58.942Z DEBUG hyper::proto::h1::conn      > incoming body is content-length (310 bytes)
 2023-12-02T02:06:58.942Z TRACE warp::filters::method       > method::POST?: POST
 2023-12-02T02:06:58.942Z TRACE hyper::proto::h1::decode    > decode; state=Length(310)
 2023-12-02T02:06:58.942Z DEBUG hyper::proto::h1::conn      > incoming body completed
 2023-12-02T02:06:58.942Z TRACE warp::filters::method       > method::GET?: POST
 2023-12-02T02:06:58.942Z TRACE warp::filters::method       > method::POST?: POST
 2023-12-02T02:06:58.942Z TRACE warp::filters::path         > "ping"?: "fdo"
 2023-12-02T02:06:58.942Z TRACE warp::filters::method       > method::POST?: POST
 2023-12-02T02:06:58.942Z TRACE warp::filters::path         > "fdo"?: "fdo"
 2023-12-02T02:06:58.942Z TRACE warp::filters::path         > "101"?: "101"
 2023-12-02T02:06:58.942Z TRACE warp::filters::path         > "msg"?: "msg"
 2023-12-02T02:06:58.942Z TRACE warp::filters::path         > "10"?: "210"
 2023-12-02T02:06:58.942Z TRACE warp::filters::method       > method::POST?: POST
 2023-12-02T02:06:58.942Z TRACE warp::filters::path         > "fdo"?: "fdo"
 2023-12-02T02:06:58.942Z TRACE warp::filters::path         > "101"?: "101"
 2023-12-02T02:06:58.942Z TRACE warp::filters::path         > "msg"?: "msg"
 2023-12-02T02:06:58.942Z TRACE warp::filters::path         > "12"?: "210"
 2023-12-02T02:06:58.942Z TRACE warp::filters::method       > method::POST?: POST
 2023-12-02T02:06:58.942Z TRACE warp::filters::path         > "fdo"?: "fdo"
 2023-12-02T02:06:58.942Z TRACE warp::filters::path         > "101"?: "101"
 2023-12-02T02:06:58.942Z TRACE warp::filters::path         > "msg"?: "msg"
 2023-12-02T02:06:58.942Z TRACE warp::filters::path         > "210"?: "210"
 2023-12-02T02:06:58.942Z TRACE warp::filters::header       > header2("content-length")
 2023-12-02T02:06:58.942Z TRACE warp::filters::header       > exact?("Content-Type", "application/cbor")
 2023-12-02T02:06:58.942Z TRACE warp::filters::header       > optional("Authorization")
 2023-12-02T02:06:58.943Z TRACE fdo_http_wrapper::server    > Raw request: "8450ca6a8896835d5f3a9f51dbff12e6f6ba67454344483338340398960018300d0a18bf186a001868184f184f18e318e61899182105181f183b188e184118db120f18a118fd187f18f018551879183018e4186318e9187d185f18ce18f8189918b318a618e8181a18f618fc1829185b18231889185d18be130018301849182f13184f184518500a182f185114187c188e18d6189418411891185718f51858187e183918f618ea18c918ca1886186918c718a7187718c6185015182718b518b0181b182518d418551863189a18bf1849185d1834181c15001830182a0e188f185418661881189918ea18d118cf18da18b3183618f7188d0518c3181e18fc185718fa189a18c318f7182c189b10186318b218c1186e1819188a183b185f185c1875187818f101183e18c41846187e18b618ad184a1877"
 2023-12-02T02:06:58.946Z TRACE fdo_data_formats::types     > Using fully interoperable KDF
 2023-12-02T02:06:58.946Z DEBUG fdo_manufacturing_server::handlers::diun > Got new keys, setting EncryptionKeys { cipher_suite: Some(A256Gcm), keys: Some([[ DERIVEDKEYS: REDACTED ]]) }
 2023-12-02T02:06:58.946Z TRACE fdo_store::directory                     > Attempting to store data to /etc/fdo/sessions/osjnyAhbfkdPjQaZqAKHG5EWyqSXMS0p5thQu9j++0U= (temporary at /etc/fdo/sessions/.osjnyAhbfkdPjQaZqAKHG5EWyqSXMS0p5thQu9j++0U=.tmp)
 2023-12-02T02:06:58.946Z TRACE fdo_store::directory                     > Attempting to load data from /etc/fdo/sessions/osjnyAhbfkdPjQaZqAKHG5EWyqSXMS0p5thQu9j++0U=
 2023-12-02T02:06:58.946Z TRACE fdo_http_wrapper::server                 > Raw response: "d28457a2012619010050ca6a8896835d5f3a9f51dbff12e6f6baa1190101830a0259015e8159015a308201563081fca0030201020208771380b607590911300a06082a8648ce3d0403023031310d300b06035504030c044449554e31133011060355040a0c0a7265646861742e636f6d310b3009060355040613025553301e170d3233313132393037313131315a170d3234313132383037313131315a3031310d300b06035504030c044449554e31133011060355040a0c0a7265646861742e636f6d310b30090603550406130255533059301306072a8648ce3d020106082a8648ce3d03010703420004917e1726ae871ca10b1c620ace856a2ca7891daea665ed188d86c2d1e37e8468841556a9b3706651188966692da0b69fb27996d393e72688621280cf8a7365c7300a06082a8648ce3d04030203490030460221009734bc201c7fcc1108dfe64e18e85a530c2141e761c5e50854fc3388f554dbbb022100b94f6c55a7ed71091cbf0b3a5f6fa8bd5f5e071e57824a3c59fe980e238699db59011e819896001830183118ad18671876186e188818b1182a18e718ad07182118a5181f187e186418551841181b188718b218bf1899187818de1850183618ad185e183918e318ae185c021836185c181c18d6183c1833171820182d0d18ea18e01855187800183018f918291855185d18d91418ec187c189e187118f6185e1873189d182918cc16183b181d188818cc18b5187918261884080a18bd18b8187518361318a118e7189205189b18d218de18d31824187018e718f918f2182d1898187f00183010183218e318fe1841188b1318ce0a188f18f318b918de18de18d218e0185a18ba18ea1848181f189218a618271859186b188918c813185818e9185f183118f018c318c21845187b1832183a18ea18ff188f18f218e518d8187a188158407104b83005289992ed192cb5ab44c37c3f805f4759729f91fe8169d7af55a5c8dc82f893dfc8cc6249f19eb23dec97413401be5a48bb2689eb245fea8b08189f"
 2023-12-02T02:06:58.946Z INFO  manufacturing-server                     > 10.128.0.76:45630 "POST /fdo/101/msg/210 HTTP/1.1" 200 "-" "-" 3.869842ms
 2023-12-02T02:06:58.946Z TRACE tracing::span                            > encode_headers
 2023-12-02T02:06:58.946Z TRACE tracing::span::active                    > -> encode_headers
 2023-12-02T02:06:58.946Z TRACE hyper::proto::h1::role                   > Server::encode status=200, body=Some(Known(741)), req_method=Some(POST)
 2023-12-02T02:06:58.946Z TRACE tracing::span::active                    > <- encode_headers
 2023-12-02T02:06:58.946Z TRACE tracing::span                            > -- encode_headers
 2023-12-02T02:06:58.946Z TRACE hyper::proto::h1::io                     > buffer.queue self.len=208 buf.len=741
 2023-12-02T02:06:58.946Z DEBUG hyper::proto::h1::io                     > flushed 949 bytes
 2023-12-02T02:06:58.946Z TRACE hyper::proto::h1::conn                   > flushed({role=server}): State { reading: Init, writing: Init, keep_alive: Idle }
 2023-12-02T02:06:58.946Z TRACE hyper::proto::h1::conn                   > Conn::read_head
 2023-12-02T02:06:58.946Z TRACE hyper::proto::h1::conn                   > flushed({role=server}): State { reading: Init, writing: Init, keep_alive: Idle }
 2023-12-02T02:06:59.200Z TRACE hyper::proto::h1::conn                   > Conn::read_head
 2023-12-02T02:06:59.200Z TRACE hyper::proto::h1::io                     > received 638 bytes
 2023-12-02T02:06:59.200Z TRACE tracing::span                            > parse_headers
 2023-12-02T02:06:59.200Z TRACE tracing::span::active                    > -> parse_headers
 2023-12-02T02:06:59.200Z TRACE hyper::proto::h1::role                   > Request.parse bytes=638
 2023-12-02T02:06:59.200Z TRACE hyper::proto::h1::role                   > Request.parse Complete(636)
 2023-12-02T02:06:59.200Z TRACE tracing::span::active                    > <- parse_headers
 2023-12-02T02:06:59.200Z TRACE tracing::span                            > -- parse_headers
 2023-12-02T02:06:59.200Z DEBUG hyper::proto::h1::io                     > parsed 10 headers
 2023-12-02T02:06:59.200Z DEBUG hyper::proto::h1::conn                   > incoming body is content-length (2 bytes)
 2023-12-02T02:06:59.200Z TRACE warp::filters::method                    > method::POST?: POST
 2023-12-02T02:06:59.200Z TRACE hyper::proto::h1::decode                 > decode; state=Length(2)
 2023-12-02T02:06:59.200Z DEBUG hyper::proto::h1::conn                   > incoming body completed
 2023-12-02T02:06:59.200Z TRACE warp::filters::method                    > method::GET?: POST
 2023-12-02T02:06:59.200Z TRACE warp::filters::method                    > method::POST?: POST
 2023-12-02T02:06:59.200Z TRACE warp::filters::path                      > "ping"?: "fdo"
 2023-12-02T02:06:59.200Z TRACE warp::filters::method                    > method::POST?: POST
 2023-12-02T02:06:59.200Z TRACE warp::filters::path                      > "fdo"?: "fdo"
 2023-12-02T02:06:59.200Z TRACE warp::filters::path                      > "101"?: "101"
 2023-12-02T02:06:59.200Z TRACE warp::filters::path                      > "msg"?: "msg"
 2023-12-02T02:06:59.200Z TRACE warp::filters::path                      > "10"?: "212"
 2023-12-02T02:06:59.200Z TRACE warp::filters::method                    > method::POST?: POST
 2023-12-02T02:06:59.200Z TRACE warp::filters::path                      > "fdo"?: "fdo"
 2023-12-02T02:06:59.200Z TRACE warp::filters::path                      > "101"?: "101"
 2023-12-02T02:06:59.200Z TRACE warp::filters::path                      > "msg"?: "msg"
 2023-12-02T02:06:59.200Z TRACE warp::filters::path                      > "12"?: "212"
 2023-12-02T02:06:59.200Z TRACE warp::filters::method                    > method::POST?: POST
 2023-12-02T02:06:59.200Z TRACE warp::filters::path                      > "fdo"?: "fdo"
 2023-12-02T02:06:59.200Z TRACE warp::filters::path                      > "101"?: "101"
 2023-12-02T02:06:59.200Z TRACE warp::filters::path                      > "msg"?: "msg"
 2023-12-02T02:06:59.200Z TRACE warp::filters::path                      > "210"?: "212"
 2023-12-02T02:06:59.200Z TRACE warp::filters::method                    > method::POST?: POST
 2023-12-02T02:06:59.200Z TRACE warp::filters::path                      > "fdo"?: "fdo"
 2023-12-02T02:06:59.200Z TRACE warp::filters::path                      > "101"?: "101"
 2023-12-02T02:06:59.200Z TRACE warp::filters::path                      > "msg"?: "msg"
 2023-12-02T02:06:59.200Z TRACE warp::filters::path                      > "212"?: "212"
 2023-12-02T02:06:59.200Z TRACE warp::filters::header                    > header2("content-length")
 2023-12-02T02:06:59.200Z TRACE warp::filters::header                    > exact?("Content-Type", "application/cbor")
 2023-12-02T02:06:59.200Z TRACE warp::filters::header                    > optional("Authorization")
 2023-12-02T02:06:59.200Z TRACE fdo_store::directory                     > Attempting to load data from /etc/fdo/sessions/osjnyAhbfkdPjQaZqAKHG5EWyqSXMS0p5thQu9j++0U=
 2023-12-02T02:06:59.200Z TRACE fdo_http_wrapper::server                 > Raw request: "81f6"
 2023-12-02T02:06:59.200Z TRACE fdo_store::directory                     > Attempting to store data to /etc/fdo/sessions/osjnyAhbfkdPjQaZqAKHG5EWyqSXMS0p5thQu9j++0U= (temporary at /etc/fdo/sessions/.osjnyAhbfkdPjQaZqAKHG5EWyqSXMS0p5thQu9j++0U=.tmp)
 2023-12-02T02:06:59.200Z TRACE fdo_store::directory                     > Attempting to load data from /etc/fdo/sessions/osjnyAhbfkdPjQaZqAKHG5EWyqSXMS0p5thQu9j++0U=
 2023-12-02T02:06:59.200Z TRACE fdo_http_wrapper::server                 > Raw response: "d08343a10103a1054ce25f4cce0467fe8a8caea555559f26fd5abeedc9d19875aff68e38b2b4fe3e89aafb"
 2023-12-02T02:06:59.200Z INFO  manufacturing-server                     > 10.128.0.76:45630 "POST /fdo/101/msg/212 HTTP/1.1" 200 "-" "-" 376.605µs
 2023-12-02T02:06:59.200Z TRACE tracing::span                            > encode_headers
 2023-12-02T02:06:59.200Z TRACE tracing::span::active                    > -> encode_headers
 2023-12-02T02:06:59.200Z TRACE hyper::proto::h1::role                   > Server::encode status=200, body=Some(Known(43)), req_method=Some(POST)
 2023-12-02T02:06:59.200Z TRACE tracing::span::active                    > <- encode_headers
 2023-12-02T02:06:59.200Z TRACE tracing::span                            > -- encode_headers
 2023-12-02T02:06:59.200Z TRACE hyper::proto::h1::io                     > buffer.queue self.len=95 buf.len=43
 2023-12-02T02:06:59.200Z DEBUG hyper::proto::h1::io                     > flushed 138 bytes
 2023-12-02T02:06:59.200Z TRACE hyper::proto::h1::conn                   > flushed({role=server}): State { reading: Init, writing: Init, keep_alive: Idle }
 2023-12-02T02:06:59.200Z TRACE hyper::proto::h1::conn                   > Conn::read_head
 2023-12-02T02:06:59.200Z TRACE hyper::proto::h1::conn                   > flushed({role=server}): State { reading: Init, writing: Init, keep_alive: Idle }
 2023-12-02T02:07:13.950Z TRACE hyper::proto::h1::conn                   > Conn::read_head
 2023-12-02T02:07:13.950Z TRACE hyper::proto::h1::io                     > received 0 bytes
 2023-12-02T02:07:13.950Z TRACE hyper::proto::h1::io                     > parse eof
 2023-12-02T02:07:13.950Z TRACE hyper::proto::h1::conn                   > State::close_read()
 2023-12-02T02:07:13.950Z DEBUG hyper::proto::h1::conn                   > read eof
 2023-12-02T02:07:13.950Z TRACE hyper::proto::h1::conn                   > State::close_write()
 2023-12-02T02:07:13.950Z TRACE hyper::proto::h1::conn                   > State::close_read()
 2023-12-02T02:07:13.950Z TRACE hyper::proto::h1::conn                   > State::close_write()
 2023-12-02T02:07:13.950Z TRACE hyper::proto::h1::conn                   > flushed({role=server}): State { reading: Closed, writing: Closed, keep_alive: Disabled }
 2023-12-02T02:07:13.950Z DEBUG hyper::proto::h1::conn                   > error shutting down IO: Transport endpoint is not connected (os error 107)
 2023-12-02T02:07:13.950Z DEBUG hyper::server::conn::spawn_all           > connection error: error shutting down connection: Transport endpoint is not connected (os error 107)
 2023-12-02T02:07:13.950Z TRACE mio::poll                                > deregistering event source from poller
 2023-12-02T02:17:56.049Z TRACE fdo_store::directory                     > File at /etc/fdo/sessions/osjnyAhbfkdPjQaZqAKHG5EWyqSXMS0p5thQu9j++0U= has expired, attempting removal

What's weird is that the logs indicate the generation of a new session, and there is a response sent back to the fdo-client. However, the session is later removed due to expiration. I'm uncertain if this behavior is related to the following line:

 2023-12-02T02:07:13.950Z DEBUG hyper::proto::h1::conn                   > error shutting down IO: Transport endpoint is not connected (os error 107)
 2023-12-02T02:07:13.950Z DEBUG hyper::server::conn::spawn_all           > connection error: error shutting down connection: Transport endpoint is not connected (os error 107)

I need to clarify that I'm utilizing the libvirt platform to bootstrap a new VM from an HTTP Boot server, simulating a physical device. I observed that the VM shutoff after the OS installer completes its tasks. Before the shutoff, I notice the manufacturing client initiating, which should correspond to the request handling and session setup in manufacturing logs. However, immediately after that, the VM shuts down, therefore the manufacturing client begins the session without obtaining the device credential. There is disconnection between the manufacturing client and the manufacturing server because of the VM shutoff, could this be the root cause?

image

empovit commented 11 months ago

FYI I'm looking into an issue where a simplified RHEL for Edge installer image with RHEL 9.2 contacts a manufacturer server also installed on RHEL 9.2, but doesn't finish initialization without blocking the installation. The handshake doesn't produce either device credentials or an ownership voucher. This in turn leads to onboarding not being performed. The server isn't being run by the FDO operator - it's regular dnf install from the default repo.

morvencao commented 11 months ago

@empovit The issue you're investigating closely resembles one I've faced, execpt that the FDO servers in my env are deployed with fdo-operator. I'm utilizing a simplified RHEL for Edge installer (RHEL 9.2) image to initiate a VM. It appears that the FDO hasn't completed initialization, though the installation isn't blocked.

I'm actively investigating this issue as well and endeavoring to gather additional information to assist you further.

morvencao commented 11 months ago

Update:

I installed FDO servers version v0.4.12 using the fdo-aio RPM package from the default repository. The FDO client is also version 0.4.12. Subsequently, I discovered that device credentials can be generated and located within the device.

fdo client and device credential:

bash-5.1# /usr/libexec/fdo/fdo-client-linuxapp --version
fdo-client-linuxapp 0.4.12
bash-5.1# fdo-owner-tool dump-device-credential /boot/device-credentials
Active: true
Protocol Version: 101
Device Info:
Device GUID: xxx
...

fdo servers:

# /usr/libexec/fdo/fdo-manufacturing-server --version
fdo-manufacturing-server 0.4.12
# systemctl status fdo-aio
● fdo-aio.service - FDO service info API server
     Loaded: loaded (/usr/lib/systemd/system/fdo-aio.service; enabled; preset: disabled)
     Active: active (running) since Mon 2023-12-04 02:26:59 EST; 1h 54min ago
   Main PID: 121500 (fdo-admin-tool)
      Tasks: 565 (limit: 405667)
     Memory: 102.9M
        CPU: 1.171s
     CGroup: /system.slice/fdo-aio.service
             ├─121500 /usr/bin/fdo-admin-tool aio --directory /etc/fdo/aio --binary-path /usr/libexec/fdo
             ├─121613 /usr/libexec/fdo/fdo-manufacturing-server
             ├─121614 /usr/libexec/fdo/fdo-owner-onboarding-server
             ├─121615 /usr/libexec/fdo/fdo-rendezvous-server
             └─121616 /usr/libexec/fdo/fdo-serviceinfo-api-server

Dec 04 02:26:59 dell-per750-01.mcoasis.com systemd[1]: Started FDO service info API server.
Dec 04 02:26:59 dell-per750-01.mcoasis.com fdo-admin-tool[121500]:  INFO  fdo_admin_tool::aio::execute > Starting AIO
Dec 04 02:26:59 dell-per750-01.mcoasis.com fdo-admin-tool[121500]:  INFO  fdo_admin_tool::aio::execute > Waiting until services are ready
Dec 04 02:27:00 dell-per750-01.mcoasis.com fdo-admin-tool[121500]:  INFO  fdo_admin_tool::aio::execute > All services are ready
Dec 04 02:27:00 dell-per750-01.mcoasis.com fdo-admin-tool[121500]:  INFO  fdo_admin_tool::aio::execute > AIO running
# /usr/libexec/fdo/fdo-manufacturing-server --version
fdo-manufacturing-server 0.4.12

Does this imply that the fdo-client 0.4.12 is incompatible with fdo-servers (version 1.0) deployed by the fdo-operator? If so, How to update the fdo-servers using the fdo-operator, to enable the integration with the fdo-client 0.4.12?

7flying commented 11 months ago

@empovit another suggestion from my side to spot version incompatibilities more easily would be to match the name of the version of the fdo-operators to the version of the fdo servers, aka not 1.0, but 0.4.12, for instance. I don't know if the change would be possible at this point though.

empovit commented 10 months ago

An old version of the FDO manufacturing client had issues that would prevent device initialization from finishing successfully when using TPM. If you added coreos.inst.skip_reboot kernel argument to break into the dracut shell before reboot, you could see something similar to the following lines in the journal:

:/# journalctl | grep manufact
Dec 04 14:48:59 localhost kernel: Command line: BOOT_IMAGE=/images/pxeboot/vmlinuz rd.neednet=1 coreos.inst.crypt_root=1 coreos.inst.isoroot=RHEL-9-2-0-BaseOS-x86_64 coreos.inst.install_dev=/dev/vda coreos.inst.image_file=/run/media/iso/image.raw.xz coreos.inst.insecure fdo.manufacturing_server_url=http://10.16.211.77:8080 fdo.diun_pub_key_insecure=true quiet coreos.inst.skip_reboot
Dec 04 14:48:59 localhost kernel: Kernel command line: BOOT_IMAGE=/images/pxeboot/vmlinuz rd.neednet=1 coreos.inst.crypt_root=1 coreos.inst.isoroot=RHEL-9–2–0–BaseOS-x86_64 coreos.inst.install_dev=/dev/vda coreos.inst.image_file=/run/media/iso/image.raw.xz coreos.inst.insecure fdo.manufacturing_server_url=http://10.16.211.77:8080 fdo.diun_pub_key_insecure=true quiet coreos.inst.skip_reboot
Dec 04 14:48:59 localhost kernel: random: crng init done (trusting CPU's manufacturer)
Dec 04 14:48:59 localhost dracut-cmdline[371]: insecure fdo.manufacturing_server_url=http://10.16.211.77:8080 fdo.diun_pub_key_insecure=true quiet coreos.inst.skip_reboot
Dec 04 14:50:44 localhost.localdomain manufacturing-client-service[740]: 2023-12-04T14:50:44.0032 INFO fdo_manufacturing_client > No usable device credential located, performing Device Onboarding
Dec 04 14:50:44 localhost.localdomain manufacturing-client-service[740]: 2023-12-04T14:50:44.0062 INFO fdo_manufacturing_client > Performing DIUN
Dec 04 14:50:44 localhost.localdomain systemd[1]: manufacturing-client.service: Main process exited, code=killed, status=15/TERM
Dec 04 14:50:44 localhost.localdomain systemd[1]: manufacturing-client.service: Failed with result 'signal'.
Dec 04 14:50:44 localhost.localdomain systemd[1]: manufacturing-client.service: Triggering OnFailure= dependencies.

You likely hit that issue with your build of RHEL 9.2. It has been fixed in later builds, and shouldn't be seen in RHEL 9.3.

I've also merged #124 that clarifies the use of compatible server and client versions, and makes the operator use the images built by the FDO team by default, instead of custom-built images.

With this, I'm closing this issue.