morvencao closed this issue 10 months ago
Hi @morvencao first of all, when the device boots no device credentials are usually expected to be found, that's why we need that first exchange between the manufacturing client and manufacturing server, so that the device can get its device credentials via Device Onboarding.
The session storage error is indeed strange, can you copy-paste the manufacturing server config file? Are there any selinux denials on the manufacturing server?
@morvencao which FDO client version are you using, if you can find out? Or which RHEL version? Could you please try with the default FDO server images?
@7flying this is exactly the problem I saw when trying to run clients up to RHEL 9.3 against 0.4.x servers, which is why I thought they were incompatible and/or the operator would require changes to support 0.4.x.
@empovit right, then on the fdo-operator side the docs should clarify that there must be a pairing between server versions and client versions.
I'm transferring this issue to the FDO operators repo.
@morvencao the bottom line is that the FDO operator doesn't currently support 0.4.x FDO servers. You can either use the default images (they work, verified today), or wait until the FDO operator has been modified to solve this issue.
Alternatively, deploy FDO servers using a different method.
@7flying unfortunately this doesn't solve the problem because:
- Customers use the client version that comes with their RHEL for Edge image and don't have control over it. And we see that none of the current GA versions seem compatible with 0.4.x servers.
- The FDO operator doesn't support 0.4.x, as backward compatibility was apparently broken at some point.
Indeed, none of the current GA versions are compatible with 0.4.x servers. What we support now is X.Y.Z clients with X.Y.Z servers, and we are not backwards compatible either, since the technology is still in preview.
What I suggest is to make sure that the server version the fdo operator deploys matches the specific version that the clients have.
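To guard against this kind of mismatch before onboarding, one can compare the client and server version strings up front. A minimal sketch, assuming the `--version` output format shown later in this thread (`fdo-client-linuxapp 0.4.12`); the `versions_match` helper is mine, not part of FDO:

```shell
#!/bin/sh
# Sketch: verify that the FDO client and server report the same X.Y.Z version.
# The version strings could be obtained with, e.g.:
#   client_ver=$(/usr/libexec/fdo/fdo-client-linuxapp --version | awk '{print $2}')
#   server_ver=$(oc exec deploy/manufacturing-server -- \
#       fdo-manufacturing-server --version | awk '{print $2}')
versions_match() {
  # Exact match is required: FDO is not yet backward compatible,
  # so X.Y.Z clients need X.Y.Z servers.
  [ "$1" = "$2" ]
}

if versions_match "0.4.12" "0.4.12"; then
  echo "client/server versions match"
else
  echo "version mismatch: onboarding will likely fail" >&2
fi
```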
@empovit @7flying
Thank you for responding. Initially, I used the default images for the FDO servers (quay.io/vemporop/fdo-xxx-server:1.0). However, I encountered the same issue. Upon further investigation, I discovered that the images were built a year ago, which seemed outdated. Consequently, I built the images (version 0.4.12) from the repository https://github.com/fdo-rs/fido-device-onboard-rs and updated the image references in the FDO operator CRs. Unfortunately, the issue persists.
I'll update the issue description to clarify that this is not a version-related problem, so that others facing the same issue don't waste time on repeated attempts.
The manufacturing server should have no SELinux denials. If any denials occur, we should be able to identify them through the pod state or logs.
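For completeness, denials can also be checked directly. A sketch of a grep-based helper that counts AVC denials in whatever log text is piped into it (the function name is mine; on a host with auditd running, `ausearch -m AVC -ts recent` is the usual tool):

```shell
#!/bin/sh
# Sketch: count SELinux AVC denials in a log stream fed on stdin, e.g.:
#   journalctl -k | count_avc_denials
count_avc_denials() {
  # grep -c prints the number of matching lines; `|| true` suppresses
  # grep's non-zero exit status when there are no matches.
  grep -c 'avc: *denied' || true
}
```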
And this is the configuration file for the manufacturing server:
# oc exec -it deploy/manufacturing-server -- cat /etc/fdo/manufacturing-server.conf.d/manufacturing-server.yml
session_store_driver:
  Directory:
    path: /etc/fdo/sessions/
ownership_voucher_store_driver:
  Directory:
    path: /etc/fdo/ownership_vouchers/
public_key_store_driver:
  Directory:
    path: /etc/fdo/keys/
bind: 0.0.0.0:8080
rendezvous_info:
- dns: rendezvous-server-fdo.apps.test.xxxx.com
  device_port: 80
  owner_port: 80
  protocol: http
protocols:
  plain_di: false
  diun:
    key_path: /etc/fdo/keys/diun_key.der
    cert_path: /etc/fdo/keys/diun_cert.pem
    key_type: SECP256R1
    mfg_string_type: SerialNumber
    allowed_key_storage_types:
    - FileSystem
    - Tpm
manufacturing:
  manufacturer_cert_path: /etc/fdo/keys/manufacturer_cert.pem
  manufacturer_private_key: /etc/fdo/keys/manufacturer_key.der
  owner_cert_path: /etc/fdo/keys/owner_cert.pem
  device_cert_ca_private_key: /etc/fdo/keys/device_ca_key.der
  device_cert_ca_chain: /etc/fdo/keys/device_ca_cert.pem
FDO client version is 0.4.12
RHEL version is 9.2.
# /usr/libexec/fdo/fdo-client-linuxapp --version
fdo-client-linuxapp 0.4.12
# cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="9.2 (Plow)"
...
And the fdo client service status:
# systemctl status fdo-client-linuxapp.service
○ fdo-client-linuxapp.service - FDO client
Loaded: loaded (/usr/lib/systemd/system/fdo-client-linuxapp.service; enabled; preset: disabled)
Active: inactive (dead) since Fri 2023-12-01 10:43:53 EST; 2min 34s ago
Process: 1192 ExecStartPre=/usr/sbin/restorecon /boot/device-credentials (code=exited, status=255/EXCEPTION)
Process: 1197 ExecStart=/usr/libexec/fdo/fdo-client-linuxapp (code=exited, status=0/SUCCESS)
Process: 1203 ExecStartPost=/usr/bin/mv /boot/device-credentials /etc/device-credentials (code=exited, status=1/FAILURE)
Main PID: 1197 (code=exited, status=0/SUCCESS)
CPU: 9ms
Dec 01 10:43:53 microshift-node systemd[1]: Starting FDO client...
Dec 01 10:43:53 microshift-node restorecon[1192]: /usr/sbin/restorecon: lstat(/boot/device-credentials) failed: No such file or directory
Dec 01 10:43:53 microshift-node fdo-client-linuxapp[1197]: 2023-12-01T15:43:53.851Z INFO fdo_client_linuxapp > No usable device credential located, skipping Device Onboarding
Dec 01 10:43:53 microshift-node mv[1203]: /usr/bin/mv: cannot stat '/boot/device-credentials': No such file or directory
Dec 01 10:43:53 microshift-node systemd[1]: fdo-client-linuxapp.service: Deactivated successfully.
Dec 01 10:43:53 microshift-node systemd[1]: Finished FDO client.
@empovit @7flying
I tested the default images (quay.io/vemporop/fdo-xxx-server:1.0), and I remember encountering the same issue. This might be attributed to a compatibility issue between the fdo-client version (0.4.12) and the default images.
Now, in the current scenario, we would expect the fdo-client version (0.4.12) to be compatible with the 0.4.12 fdo-server images. However, the issue persists. Could this be due to oversight on my part? Or is it possible that using the same fdo-operator to deploy new images with old configurations is causing the problem?
Thanks for the update @morvencao! This is odd because I've been running RHEL 9.2 and RHEL 9.3 FDO clients against the operator without issues (old server images).
Running 0.4.12 with the operator isn't as simple as replacing the image, as the operator code must be changed to produce a compatible configuration, and map the right volumes.
The only thing that comes to mind at the moment is a missing or read-only volume mount. E.g. making sure that the ownership voucher persistent volume is writable.
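One way to rule out a read-only mount is to attempt a write from inside the pod. A sketch of such a probe (the helper is mine; the mount paths come from the config above, and wrapping the call with `oc exec` is shown in the comment):

```shell
#!/bin/sh
# Sketch: probe whether a store directory accepts writes by creating and
# removing a temporary file. In-cluster, the same probe could be run with:
#   oc exec deploy/manufacturing-server -- touch /etc/fdo/sessions/.rw-probe
check_writable() {
  dir="$1"
  probe="$dir/.rw-probe.$$"
  if touch "$probe" 2>/dev/null; then
    rm -f "$probe"
    echo "$dir: writable"
  else
    echo "$dir: NOT writable (read-only or missing mount?)" >&2
    return 1
  fi
}
```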
Regarding the client error, you cannot run the onboarding client (/usr/libexec/fdo/fdo-client-linuxapp) without first making sure that the initialization/manufacturing client (/usr/libexec/fdo/fdo-manufacturing-client) has been successful. It's the initialization client that produces the device credentials (/boot/device-credentials) on the machine after talking to a manufacturing server.
The sequence is:
1. During installation, the manufacturing client (fdo-manufacturing-client) contacts the manufacturing server and writes the device credentials to /boot/device-credentials.
2. On a later boot, the onboarding client (fdo-client-linuxapp) finds those credentials and performs Device Onboarding.
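In shell terms, the onboarding step effectively gates on the credential file existing. A sketch of that check (the helper name is mine; the paths are the ones appearing in the systemd unit shown earlier in this thread):

```shell
#!/bin/sh
# Sketch: decide whether Device Onboarding can run, mirroring the
# "No usable device credential located" check seen in the client logs.
# The credential is written to /boot/device-credentials by the
# manufacturing client and moved to /etc/device-credentials afterwards.
has_device_credential() {
  for f in "$@"; do
    if [ -f "$f" ]; then
      return 0
    fi
  done
  return 1
}

if has_device_credential /boot/device-credentials /etc/device-credentials; then
  echo "device credential found, performing Device Onboarding"
else
  echo "No usable device credential located, skipping Device Onboarding"
fi
```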
@morvencao BTW, in what namespace are you creating your server CRs? Please make sure it's not default.
Thank you, @empovit , for providing a comprehensive explanation of the entire workflow.
BTW, in what namespace are you creating your server CRs? Please make sure it's not default.
Just to clarify, the namespace is fdo, not default.
When reverting to the default images, the device credentials are still not generated.
$ oc get pod -o yaml | grep image:
- image: quay.io/vemporop/fdo-manufacturing-server:1.0
image: quay.io/vemporop/fdo-manufacturing-server:1.0
- image: quay.io/vemporop/fdo-owner-onboarding-server:1.0
- image: quay.io/vemporop/fdo-serviceinfo-api-server:1.0
image: quay.io/vemporop/fdo-owner-onboarding-server:1.0
image: quay.io/vemporop/fdo-serviceinfo-api-server:1.0
- image: quay.io/vemporop/fdo-rendezvous-server:1.0
image: quay.io/vemporop/fdo-rendezvous-server:1.0
These are the fdo-client logs from journalctl:
$ journalctl | grep "Starting FDO client" -A 20
Dec 01 21:10:50 microshift-node systemd[1]: Starting FDO client...
Dec 01 21:10:50 microshift-node systemd[1]: Starting greenboot Health Checks Runner...
Dec 01 21:10:50 microshift-node greenboot[1339]: Running Required Health Check Scripts...
Dec 01 21:10:50 microshift-node restorecon[1336]: /usr/sbin/restorecon: lstat(/boot/device-credentials) failed: No such file or directory
Dec 01 21:10:50 microshift-node fdo-client-linuxapp[1345]: 2023-12-02T02:10:50.391Z INFO fdo_client_linuxapp > No usable device credential located, skipping Device Onboarding
Dec 01 21:10:50 microshift-node 00_required_scripts_start.sh[1348]: Running greenboot Required Health Check Scripts
Dec 01 21:10:50 microshift-node mv[1353]: /usr/bin/mv: cannot stat '/boot/device-credentials': No such file or directory
Dec 01 21:10:50 microshift-node systemd[1]: fdo-client-linuxapp.service: Deactivated successfully.
Dec 01 21:10:50 microshift-node systemd[1]: Finished FDO client.
Dec 01 21:10:50 microshift-node greenboot[1339]: Script '00_required_scripts_start.sh' SUCCESS
Dec 01 21:10:50 microshift-node 01_repository_dns_check.sh[1358]: /etc/ostree/remotes.d is empty, skipping check
Dec 01 21:10:50 microshift-node greenboot[1339]: Script '01_repository_dns_check.sh' SUCCESS
Dec 01 21:10:50 microshift-node 02_watchdog.sh[1374]: Device: /dev/watchdog0
Dec 01 21:10:50 microshift-node 02_watchdog.sh[1374]: Identity: iTCO_wdt [version 0]
Dec 01 21:10:50 microshift-node 02_watchdog.sh[1374]: Timeout: 30 seconds
Dec 01 21:10:50 microshift-node 02_watchdog.sh[1374]: Pre-timeout: 0 seconds
Dec 01 21:10:50 microshift-node 02_watchdog.sh[1374]: Timeleft: 29 seconds
Dec 01 21:10:50 microshift-node 02_watchdog.sh[1374]: FLAG DESCRIPTION STATUS BOOT-STATUS
Dec 01 21:10:50 microshift-node 02_watchdog.sh[1374]: KEEPALIVEPING Keep alive ping reply 1 0
Dec 01 21:10:50 microshift-node 02_watchdog.sh[1374]: MAGICCLOSE Supports magic close char 0 0
Dec 01 21:10:50 microshift-node 02_watchdog.sh[1374]: SETTIMEOUT Set timeout (in seconds) 0 0
These are the logs for the manufacturing-server:
$ oc logs deploy/manufacturing-server
2023-12-02T02:00:56.028Z TRACE fdo_data_formats::publickey > Serializing certificate: [48, 130, 1, 86, 48, 129, 252, 160, 3, 2, 1, 2, 2, 8, 119, 19, 128, 182, 7, 89, 9, 17, 48, 10, 6, 8, 42, 134, 72, 206, 61, 4, 3, 2, 48, 49, 49, 13, 48, 11, 6, 3, 85, 4, 3, 12, 4, 68, 73, 85, 78, 49, 19, 48, 17, 6, 3, 85, 4, 10, 12, 10, 114, 101, 100, 104, 97, 116, 46, 99, 111, 109, 49, 11, 48, 9, 6, 3, 85, 4, 6, 19, 2, 85, 83, 48, 30, 23, 13, 50, 51, 49, 49, 50, 57, 48, 55, 49, 49, 49, 49, 90, 23, 13, 50, 52, 49, 49, 50, 56, 48, 55, 49, 49, 49, 49, 90, 48, 49, 49, 13, 48, 11, 6, 3, 85, 4, 3, 12, 4, 68, 73, 85, 78, 49, 19, 48, 17, 6, 3, 85, 4, 10, 12, 10, 114, 101, 100, 104, 97, 116, 46, 99, 111, 109, 49, 11, 48, 9, 6, 3, 85, 4, 6, 19, 2, 85, 83, 48, 89, 48, 19, 6, 7, 42, 134, 72, 206, 61, 2, 1, 6, 8, 42, 134, 72, 206, 61, 3, 1, 7, 3, 66, 0, 4, 145, 126, 23, 38, 174, 135, 28, 161, 11, 28, 98, 10, 206, 133, 106, 44, 167, 137, 29, 174, 166, 101, 237, 24, 141, 134, 194, 209, 227, 126, 132, 104, 132, 21, 86, 169, 179, 112, 102, 81, 24, 137, 102, 105, 45, 160, 182, 159, 178, 121, 150, 211, 147, 231, 38, 136, 98, 18, 128, 207, 138, 115, 101, 199, 48, 10, 6, 8, 42, 134, 72, 206, 61, 4, 3, 2, 3, 73, 0, 48, 70, 2, 33, 0, 151, 52, 188, 32, 28, 127, 204, 17, 8, 223, 230, 78, 24, 232, 90, 83, 12, 33, 65, 231, 97, 197, 229, 8, 84, 252, 51, 136, 245, 84, 219, 187, 2, 33, 0, 185, 79, 108, 85, 167, 237, 113, 9, 28, 191, 11, 58, 95, 111, 168, 189, 95, 94, 7, 30, 87, 130, 74, 60, 89, 254, 152, 14, 35, 134, 153, 219]
2023-12-02T02:00:56.028Z INFO fdo_manufacturing_server > Listening on 0.0.0.0:8080
2023-12-02T02:00:56.028Z INFO fdo_manufacturing_server > Scheduling maintenance every 60 seconds
2023-12-02T02:00:56.028Z TRACE mio::poll > registering event source with poller: token=Token(1), interests=READABLE | WRITABLE
2023-12-02T02:06:58.942Z TRACE mio::poll > registering event source with poller: token=Token(2), interests=READABLE | WRITABLE
2023-12-02T02:06:58.942Z TRACE hyper::proto::h1::conn > Conn::read_head
2023-12-02T02:06:58.942Z TRACE hyper::proto::h1::conn > flushed({role=server}): State { reading: Init, writing: Init, keep_alive: Busy }
2023-12-02T02:06:58.942Z TRACE hyper::proto::h1::conn > Conn::read_head
2023-12-02T02:06:58.942Z TRACE hyper::proto::h1::io > received 836 bytes
2023-12-02T02:06:58.942Z TRACE tracing::span > parse_headers
2023-12-02T02:06:58.942Z TRACE tracing::span::active > -> parse_headers
2023-12-02T02:06:58.942Z TRACE hyper::proto::h1::role > Request.parse bytes=836
2023-12-02T02:06:58.942Z TRACE hyper::proto::h1::role > Request.parse Complete(526)
2023-12-02T02:06:58.942Z TRACE tracing::span::active > <- parse_headers
2023-12-02T02:06:58.942Z TRACE tracing::span > -- parse_headers
2023-12-02T02:06:58.942Z DEBUG hyper::proto::h1::io > parsed 9 headers
2023-12-02T02:06:58.942Z DEBUG hyper::proto::h1::conn > incoming body is content-length (310 bytes)
2023-12-02T02:06:58.942Z TRACE warp::filters::method > method::POST?: POST
2023-12-02T02:06:58.942Z TRACE hyper::proto::h1::decode > decode; state=Length(310)
2023-12-02T02:06:58.942Z DEBUG hyper::proto::h1::conn > incoming body completed
2023-12-02T02:06:58.942Z TRACE warp::filters::method > method::GET?: POST
2023-12-02T02:06:58.942Z TRACE warp::filters::method > method::POST?: POST
2023-12-02T02:06:58.942Z TRACE warp::filters::path > "ping"?: "fdo"
2023-12-02T02:06:58.942Z TRACE warp::filters::method > method::POST?: POST
2023-12-02T02:06:58.942Z TRACE warp::filters::path > "fdo"?: "fdo"
2023-12-02T02:06:58.942Z TRACE warp::filters::path > "101"?: "101"
2023-12-02T02:06:58.942Z TRACE warp::filters::path > "msg"?: "msg"
2023-12-02T02:06:58.942Z TRACE warp::filters::path > "10"?: "210"
2023-12-02T02:06:58.942Z TRACE warp::filters::method > method::POST?: POST
2023-12-02T02:06:58.942Z TRACE warp::filters::path > "fdo"?: "fdo"
2023-12-02T02:06:58.942Z TRACE warp::filters::path > "101"?: "101"
2023-12-02T02:06:58.942Z TRACE warp::filters::path > "msg"?: "msg"
2023-12-02T02:06:58.942Z TRACE warp::filters::path > "12"?: "210"
2023-12-02T02:06:58.942Z TRACE warp::filters::method > method::POST?: POST
2023-12-02T02:06:58.942Z TRACE warp::filters::path > "fdo"?: "fdo"
2023-12-02T02:06:58.942Z TRACE warp::filters::path > "101"?: "101"
2023-12-02T02:06:58.942Z TRACE warp::filters::path > "msg"?: "msg"
2023-12-02T02:06:58.942Z TRACE warp::filters::path > "210"?: "210"
2023-12-02T02:06:58.942Z TRACE warp::filters::header > header2("content-length")
2023-12-02T02:06:58.942Z TRACE warp::filters::header > exact?("Content-Type", "application/cbor")
2023-12-02T02:06:58.942Z TRACE warp::filters::header > optional("Authorization")
2023-12-02T02:06:58.943Z TRACE fdo_http_wrapper::server > Raw request: "8450ca6a8896835d5f3a9f51dbff12e6f6ba67454344483338340398960018300d0a18bf186a001868184f184f18e318e61899182105181f183b188e184118db120f18a118fd187f18f018551879183018e4186318e9187d185f18ce18f8189918b318a618e8181a18f618fc1829185b18231889185d18be130018301849182f13184f184518500a182f185114187c188e18d6189418411891185718f51858187e183918f618ea18c918ca1886186918c718a7187718c6185015182718b518b0181b182518d418551863189a18bf1849185d1834181c15001830182a0e188f185418661881189918ea18d118cf18da18b3183618f7188d0518c3181e18fc185718fa189a18c318f7182c189b10186318b218c1186e1819188a183b185f185c1875187818f101183e18c41846187e18b618ad184a1877"
2023-12-02T02:06:58.946Z TRACE fdo_data_formats::types > Using fully interoperable KDF
2023-12-02T02:06:58.946Z DEBUG fdo_manufacturing_server::handlers::diun > Got new keys, setting EncryptionKeys { cipher_suite: Some(A256Gcm), keys: Some([[ DERIVEDKEYS: REDACTED ]]) }
2023-12-02T02:06:58.946Z TRACE fdo_store::directory > Attempting to store data to /etc/fdo/sessions/osjnyAhbfkdPjQaZqAKHG5EWyqSXMS0p5thQu9j++0U= (temporary at /etc/fdo/sessions/.osjnyAhbfkdPjQaZqAKHG5EWyqSXMS0p5thQu9j++0U=.tmp)
2023-12-02T02:06:58.946Z TRACE fdo_store::directory > Attempting to load data from /etc/fdo/sessions/osjnyAhbfkdPjQaZqAKHG5EWyqSXMS0p5thQu9j++0U=
2023-12-02T02:06:58.946Z TRACE fdo_http_wrapper::server > Raw response: "d28457a2012619010050ca6a8896835d5f3a9f51dbff12e6f6baa1190101830a0259015e8159015a308201563081fca0030201020208771380b607590911300a06082a8648ce3d0403023031310d300b06035504030c044449554e31133011060355040a0c0a7265646861742e636f6d310b3009060355040613025553301e170d3233313132393037313131315a170d3234313132383037313131315a3031310d300b06035504030c044449554e31133011060355040a0c0a7265646861742e636f6d310b30090603550406130255533059301306072a8648ce3d020106082a8648ce3d03010703420004917e1726ae871ca10b1c620ace856a2ca7891daea665ed188d86c2d1e37e8468841556a9b3706651188966692da0b69fb27996d393e72688621280cf8a7365c7300a06082a8648ce3d04030203490030460221009734bc201c7fcc1108dfe64e18e85a530c2141e761c5e50854fc3388f554dbbb022100b94f6c55a7ed71091cbf0b3a5f6fa8bd5f5e071e57824a3c59fe980e238699db59011e819896001830183118ad18671876186e188818b1182a18e718ad07182118a5181f187e186418551841181b188718b218bf1899187818de1850183618ad185e183918e318ae185c021836185c181c18d6183c1833171820182d0d18ea18e01855187800183018f918291855185d18d91418ec187c189e187118f6185e1873189d182918cc16183b181d188818cc18b5187918261884080a18bd18b8187518361318a118e7189205189b18d218de18d31824187018e718f918f2182d1898187f00183010183218e318fe1841188b1318ce0a188f18f318b918de18de18d218e0185a18ba18ea1848181f189218a618271859186b188918c813185818e9185f183118f018c318c21845187b1832183a18ea18ff188f18f218e518d8187a188158407104b83005289992ed192cb5ab44c37c3f805f4759729f91fe8169d7af55a5c8dc82f893dfc8cc6249f19eb23dec97413401be5a48bb2689eb245fea8b08189f"
2023-12-02T02:06:58.946Z INFO manufacturing-server > 10.128.0.76:45630 "POST /fdo/101/msg/210 HTTP/1.1" 200 "-" "-" 3.869842ms
2023-12-02T02:06:58.946Z TRACE tracing::span > encode_headers
2023-12-02T02:06:58.946Z TRACE tracing::span::active > -> encode_headers
2023-12-02T02:06:58.946Z TRACE hyper::proto::h1::role > Server::encode status=200, body=Some(Known(741)), req_method=Some(POST)
2023-12-02T02:06:58.946Z TRACE tracing::span::active > <- encode_headers
2023-12-02T02:06:58.946Z TRACE tracing::span > -- encode_headers
2023-12-02T02:06:58.946Z TRACE hyper::proto::h1::io > buffer.queue self.len=208 buf.len=741
2023-12-02T02:06:58.946Z DEBUG hyper::proto::h1::io > flushed 949 bytes
2023-12-02T02:06:58.946Z TRACE hyper::proto::h1::conn > flushed({role=server}): State { reading: Init, writing: Init, keep_alive: Idle }
2023-12-02T02:06:58.946Z TRACE hyper::proto::h1::conn > Conn::read_head
2023-12-02T02:06:58.946Z TRACE hyper::proto::h1::conn > flushed({role=server}): State { reading: Init, writing: Init, keep_alive: Idle }
2023-12-02T02:06:59.200Z TRACE hyper::proto::h1::conn > Conn::read_head
2023-12-02T02:06:59.200Z TRACE hyper::proto::h1::io > received 638 bytes
2023-12-02T02:06:59.200Z TRACE tracing::span > parse_headers
2023-12-02T02:06:59.200Z TRACE tracing::span::active > -> parse_headers
2023-12-02T02:06:59.200Z TRACE hyper::proto::h1::role > Request.parse bytes=638
2023-12-02T02:06:59.200Z TRACE hyper::proto::h1::role > Request.parse Complete(636)
2023-12-02T02:06:59.200Z TRACE tracing::span::active > <- parse_headers
2023-12-02T02:06:59.200Z TRACE tracing::span > -- parse_headers
2023-12-02T02:06:59.200Z DEBUG hyper::proto::h1::io > parsed 10 headers
2023-12-02T02:06:59.200Z DEBUG hyper::proto::h1::conn > incoming body is content-length (2 bytes)
2023-12-02T02:06:59.200Z TRACE warp::filters::method > method::POST?: POST
2023-12-02T02:06:59.200Z TRACE hyper::proto::h1::decode > decode; state=Length(2)
2023-12-02T02:06:59.200Z DEBUG hyper::proto::h1::conn > incoming body completed
2023-12-02T02:06:59.200Z TRACE warp::filters::method > method::GET?: POST
2023-12-02T02:06:59.200Z TRACE warp::filters::method > method::POST?: POST
2023-12-02T02:06:59.200Z TRACE warp::filters::path > "ping"?: "fdo"
2023-12-02T02:06:59.200Z TRACE warp::filters::method > method::POST?: POST
2023-12-02T02:06:59.200Z TRACE warp::filters::path > "fdo"?: "fdo"
2023-12-02T02:06:59.200Z TRACE warp::filters::path > "101"?: "101"
2023-12-02T02:06:59.200Z TRACE warp::filters::path > "msg"?: "msg"
2023-12-02T02:06:59.200Z TRACE warp::filters::path > "10"?: "212"
2023-12-02T02:06:59.200Z TRACE warp::filters::method > method::POST?: POST
2023-12-02T02:06:59.200Z TRACE warp::filters::path > "fdo"?: "fdo"
2023-12-02T02:06:59.200Z TRACE warp::filters::path > "101"?: "101"
2023-12-02T02:06:59.200Z TRACE warp::filters::path > "msg"?: "msg"
2023-12-02T02:06:59.200Z TRACE warp::filters::path > "12"?: "212"
2023-12-02T02:06:59.200Z TRACE warp::filters::method > method::POST?: POST
2023-12-02T02:06:59.200Z TRACE warp::filters::path > "fdo"?: "fdo"
2023-12-02T02:06:59.200Z TRACE warp::filters::path > "101"?: "101"
2023-12-02T02:06:59.200Z TRACE warp::filters::path > "msg"?: "msg"
2023-12-02T02:06:59.200Z TRACE warp::filters::path > "210"?: "212"
2023-12-02T02:06:59.200Z TRACE warp::filters::method > method::POST?: POST
2023-12-02T02:06:59.200Z TRACE warp::filters::path > "fdo"?: "fdo"
2023-12-02T02:06:59.200Z TRACE warp::filters::path > "101"?: "101"
2023-12-02T02:06:59.200Z TRACE warp::filters::path > "msg"?: "msg"
2023-12-02T02:06:59.200Z TRACE warp::filters::path > "212"?: "212"
2023-12-02T02:06:59.200Z TRACE warp::filters::header > header2("content-length")
2023-12-02T02:06:59.200Z TRACE warp::filters::header > exact?("Content-Type", "application/cbor")
2023-12-02T02:06:59.200Z TRACE warp::filters::header > optional("Authorization")
2023-12-02T02:06:59.200Z TRACE fdo_store::directory > Attempting to load data from /etc/fdo/sessions/osjnyAhbfkdPjQaZqAKHG5EWyqSXMS0p5thQu9j++0U=
2023-12-02T02:06:59.200Z TRACE fdo_http_wrapper::server > Raw request: "81f6"
2023-12-02T02:06:59.200Z TRACE fdo_store::directory > Attempting to store data to /etc/fdo/sessions/osjnyAhbfkdPjQaZqAKHG5EWyqSXMS0p5thQu9j++0U= (temporary at /etc/fdo/sessions/.osjnyAhbfkdPjQaZqAKHG5EWyqSXMS0p5thQu9j++0U=.tmp)
2023-12-02T02:06:59.200Z TRACE fdo_store::directory > Attempting to load data from /etc/fdo/sessions/osjnyAhbfkdPjQaZqAKHG5EWyqSXMS0p5thQu9j++0U=
2023-12-02T02:06:59.200Z TRACE fdo_http_wrapper::server > Raw response: "d08343a10103a1054ce25f4cce0467fe8a8caea555559f26fd5abeedc9d19875aff68e38b2b4fe3e89aafb"
2023-12-02T02:06:59.200Z INFO manufacturing-server > 10.128.0.76:45630 "POST /fdo/101/msg/212 HTTP/1.1" 200 "-" "-" 376.605µs
2023-12-02T02:06:59.200Z TRACE tracing::span > encode_headers
2023-12-02T02:06:59.200Z TRACE tracing::span::active > -> encode_headers
2023-12-02T02:06:59.200Z TRACE hyper::proto::h1::role > Server::encode status=200, body=Some(Known(43)), req_method=Some(POST)
2023-12-02T02:06:59.200Z TRACE tracing::span::active > <- encode_headers
2023-12-02T02:06:59.200Z TRACE tracing::span > -- encode_headers
2023-12-02T02:06:59.200Z TRACE hyper::proto::h1::io > buffer.queue self.len=95 buf.len=43
2023-12-02T02:06:59.200Z DEBUG hyper::proto::h1::io > flushed 138 bytes
2023-12-02T02:06:59.200Z TRACE hyper::proto::h1::conn > flushed({role=server}): State { reading: Init, writing: Init, keep_alive: Idle }
2023-12-02T02:06:59.200Z TRACE hyper::proto::h1::conn > Conn::read_head
2023-12-02T02:06:59.200Z TRACE hyper::proto::h1::conn > flushed({role=server}): State { reading: Init, writing: Init, keep_alive: Idle }
2023-12-02T02:07:13.950Z TRACE hyper::proto::h1::conn > Conn::read_head
2023-12-02T02:07:13.950Z TRACE hyper::proto::h1::io > received 0 bytes
2023-12-02T02:07:13.950Z TRACE hyper::proto::h1::io > parse eof
2023-12-02T02:07:13.950Z TRACE hyper::proto::h1::conn > State::close_read()
2023-12-02T02:07:13.950Z DEBUG hyper::proto::h1::conn > read eof
2023-12-02T02:07:13.950Z TRACE hyper::proto::h1::conn > State::close_write()
2023-12-02T02:07:13.950Z TRACE hyper::proto::h1::conn > State::close_read()
2023-12-02T02:07:13.950Z TRACE hyper::proto::h1::conn > State::close_write()
2023-12-02T02:07:13.950Z TRACE hyper::proto::h1::conn > flushed({role=server}): State { reading: Closed, writing: Closed, keep_alive: Disabled }
2023-12-02T02:07:13.950Z DEBUG hyper::proto::h1::conn > error shutting down IO: Transport endpoint is not connected (os error 107)
2023-12-02T02:07:13.950Z DEBUG hyper::server::conn::spawn_all > connection error: error shutting down connection: Transport endpoint is not connected (os error 107)
2023-12-02T02:07:13.950Z TRACE mio::poll > deregistering event source from poller
2023-12-02T02:17:56.049Z TRACE fdo_store::directory > File at /etc/fdo/sessions/osjnyAhbfkdPjQaZqAKHG5EWyqSXMS0p5thQu9j++0U= has expired, attempting removal
What's weird is that the logs indicate a new session being created, and a response is sent back to the fdo-client. However, the session is later removed due to expiration. I'm uncertain whether this behavior is related to the following lines:
2023-12-02T02:07:13.950Z DEBUG hyper::proto::h1::conn > error shutting down IO: Transport endpoint is not connected (os error 107)
2023-12-02T02:07:13.950Z DEBUG hyper::server::conn::spawn_all > connection error: error shutting down connection: Transport endpoint is not connected (os error 107)
I should clarify that I'm using the libvirt platform to bootstrap a new VM from an HTTP Boot server, simulating a physical device. I observed that the VM shuts off after the OS installer completes its tasks. Just before the shutoff, I can see the manufacturing client starting, which should correspond to the request handling and session setup in the manufacturing logs. However, immediately after that, the VM shuts down, so the manufacturing client starts the session without obtaining the device credentials. The connection between the manufacturing client and the manufacturing server is cut by the VM shutoff; could this be the root cause?
FYI I'm looking into an issue where a simplified RHEL for Edge installer image with RHEL 9.2 contacts a manufacturer server also installed on RHEL 9.2, but doesn't finish initialization without blocking the installation. The handshake doesn't produce either device credentials or an ownership voucher. This in turn leads to onboarding not being performed. The server isn't being run by the FDO operator - it's a regular dnf install from the default repo.
@empovit The issue you're investigating closely resembles the one I've faced, except that the FDO servers in my environment are deployed with the fdo-operator. I'm using a simplified RHEL for Edge installer (RHEL 9.2) image to boot a VM. It appears that FDO initialization hasn't completed, though the installation isn't blocked.
I'm actively investigating this issue as well and trying to gather additional information to assist you further.
I installed FDO servers version v0.4.12 using the fdo-aio RPM package from the default repository. The FDO client is also version 0.4.12. I then verified that the device credentials are generated and present on the device.
fdo client and device credential:
bash-5.1# /usr/libexec/fdo/fdo-client-linuxapp --version
fdo-client-linuxapp 0.4.12
bash-5.1# fdo-owner-tool dump-device-credential /boot/device-credentials
Active: true
Protocol Version: 101
Device Info:
Device GUID: xxx
...
fdo servers:
# /usr/libexec/fdo/fdo-manufacturing-server --version
fdo-manufacturing-server 0.4.12
# systemctl status fdo-aio
● fdo-aio.service - FDO service info API server
Loaded: loaded (/usr/lib/systemd/system/fdo-aio.service; enabled; preset: disabled)
Active: active (running) since Mon 2023-12-04 02:26:59 EST; 1h 54min ago
Main PID: 121500 (fdo-admin-tool)
Tasks: 565 (limit: 405667)
Memory: 102.9M
CPU: 1.171s
CGroup: /system.slice/fdo-aio.service
├─121500 /usr/bin/fdo-admin-tool aio --directory /etc/fdo/aio --binary-path /usr/libexec/fdo
├─121613 /usr/libexec/fdo/fdo-manufacturing-server
├─121614 /usr/libexec/fdo/fdo-owner-onboarding-server
├─121615 /usr/libexec/fdo/fdo-rendezvous-server
└─121616 /usr/libexec/fdo/fdo-serviceinfo-api-server
Dec 04 02:26:59 dell-per750-01.mcoasis.com systemd[1]: Started FDO service info API server.
Dec 04 02:26:59 dell-per750-01.mcoasis.com fdo-admin-tool[121500]: INFO fdo_admin_tool::aio::execute > Starting AIO
Dec 04 02:26:59 dell-per750-01.mcoasis.com fdo-admin-tool[121500]: INFO fdo_admin_tool::aio::execute > Waiting until services are ready
Dec 04 02:27:00 dell-per750-01.mcoasis.com fdo-admin-tool[121500]: INFO fdo_admin_tool::aio::execute > All services are ready
Dec 04 02:27:00 dell-per750-01.mcoasis.com fdo-admin-tool[121500]: INFO fdo_admin_tool::aio::execute > AIO running
Does this imply that fdo-client 0.4.12 is incompatible with the fdo-servers (version 1.0) deployed by the fdo-operator? If so, how can I update the fdo-servers using the fdo-operator to enable integration with fdo-client 0.4.12?
@empovit another suggestion from my side to spot version incompatibilities more easily would be to match the fdo-operator's version to the version of the FDO servers, i.e. not 1.0 but 0.4.12, for instance. I don't know if that change is possible at this point though.
An old version of the FDO manufacturing client had issues that would prevent device initialization from finishing successfully when using TPM. If you added the coreos.inst.skip_reboot kernel argument to break into the dracut shell before reboot, you could see something similar to the following lines in the journal:
:/# journalctl | grep manufact
Dec 04 14:48:59 localhost kernel: Command line: BOOT_IMAGE=/images/pxeboot/vmlinuz rd.neednet=1 coreos.inst.crypt_root=1 coreos.inst.isoroot=RHEL-9-2-0-BaseOS-x86_64 coreos.inst.install_dev=/dev/vda coreos.inst.image_file=/run/media/iso/image.raw.xz coreos.inst.insecure fdo.manufacturing_server_url=http://10.16.211.77:8080 fdo.diun_pub_key_insecure=true quiet coreos.inst.skip_reboot
Dec 04 14:48:59 localhost kernel: Kernel command line: BOOT_IMAGE=/images/pxeboot/vmlinuz rd.neednet=1 coreos.inst.crypt_root=1 coreos.inst.isoroot=RHEL-9-2-0-BaseOS-x86_64 coreos.inst.install_dev=/dev/vda coreos.inst.image_file=/run/media/iso/image.raw.xz coreos.inst.insecure fdo.manufacturing_server_url=http://10.16.211.77:8080 fdo.diun_pub_key_insecure=true quiet coreos.inst.skip_reboot
Dec 04 14:48:59 localhost kernel: random: crng init done (trusting CPU's manufacturer)
Dec 04 14:48:59 localhost dracut-cmdline[371]: insecure fdo.manufacturing_server_url=http://10.16.211.77:8080 fdo.diun_pub_key_insecure=true quiet coreos.inst.skip_reboot
Dec 04 14:50:44 localhost.localdomain manufacturing-client-service[740]: 2023-12-04T14:50:44.003Z INFO fdo_manufacturing_client > No usable device credential located, performing Device Onboarding
Dec 04 14:50:44 localhost.localdomain manufacturing-client-service[740]: 2023-12-04T14:50:44.006Z INFO fdo_manufacturing_client > Performing DIUN
Dec 04 14:50:44 localhost.localdomain systemd[1]: manufacturing-client.service: Main process exited, code=killed, status=15/TERM
Dec 04 14:50:44 localhost.localdomain systemd[1]: manufacturing-client.service: Failed with result 'signal'.
Dec 04 14:50:44 localhost.localdomain systemd[1]: manufacturing-client.service: Triggering OnFailure= dependencies.
You likely hit that issue with your build of RHEL 9.2. It has been fixed in later builds, and shouldn't be seen in RHEL 9.3.
I've also merged #124 that clarifies the use of compatible server and client versions, and makes the operator use the images built by the FDO team by default, instead of custom-built images.
With this, I'm closing this issue.
I am deploying FDO services in an OpenShift cluster using the fdo-operator, and the fdo-servers are operational. However, upon booting my device (simulated as a VM using libvirt), I encounter an issue where no device credentials are found.
Upon inspecting the device system logs, I observed the following entries:
Additionally, the manufacturing-server logs indicate the occurrence of the following errors:
I need assistance in resolving this session storage error and ensuring the proper retrieval of device credentials during the device boot process.
env:
BTW, I tested both the default FDO images (quay.io/vemporop/fdo-xxx-server:1.0) and the latest FDO server images built from source (v0.4.12). Unfortunately, the same issue persists in both cases.