coreos / rpm-ostree

⚛📦 Hybrid image/package system with atomic upgrades and package layering
https://coreos.github.io/rpm-ostree

rpm-ostree fails with mTLS authenticated remote since at least 2020.05 #2715

Closed w4tsn closed 3 years ago

w4tsn commented 3 years ago

So I'm pretty sure now that this is a bug in rpm-ostree. I'm using an mTLS remote which does not work in current releases of rpm-ostree. The error is error: While pulling ref: While fetching https://foo/repo/summary.sig: [58] Problem with the local SSL certificate.

Host system details

# rpm-ostree status
State: idle
Deployments:
* ostree://fedora-iot:fedora/stable/aarch64/iot
                   Version: 33.20210214.0 (2021-02-14T14:39:34Z)
                    Commit: 31cf1ce36d47b5c447368155f34f9e9b28cc453aeebb63d3b33ae58fef26529a
              GPGSignature: Valid signature by 963A2BEB02009608FE67EA4249FD77499570FF31

Expected vs actual behavior

With an mTLS remote configured:

# rpm-ostree rebase myremote:myref/stable/aarch64
error: While pulling ref: While fetching https://foo/repo/summary.sig: [58] Problem with the local SSL certificate

Expected:

# rpm-ostree rebase othermo:othermo/stable/aarch64
...
Success!

Steps to reproduce it

  1. You'll need an mTLS remote. I might set one up, but that'll take more time.
  2. Flash a fairly recent Fedora IoT 33 raw.xz image onto a system like a Raspberry Pi 3 or 4.
  3. Set up the remote with the config example provided below
  4. Try to rebase to a ref on that remote

Example mTLS based remote config:

[remote "mtls"]
url=https://mtls-ostree.example.com/repo
gpg-verify=true
gpgkeypath=/etc/pki/example
tls-client-cert-path=/root/.cert/client.crt
tls-client-key-path=/root/.cert/client.key

Possible pitfalls

The problem could still be TLS-specific. Maybe the CA cert is malformed in a way that ostree and curl tolerate but rpm-ostree does not (unlikely but possible). Maybe the client cert is malformed. Maybe a specific algorithm or TLS version causes issues. Those are also things to explore.

System info - non-working vs. working

Non-working:

# rpm-ostree --version
rpm-ostree:
 Version: '2021.3'
 Git: 43d9af3e6e26dd9cb93d0d67f8c53e418a342d35
 Features:
  - compose
  - rust
  - fedora-integration
# ostree --version
libostree:
 Version: '2020.8'
 Git: 7893b6cb02cecf24dabd0574093d2c4dbf9348c9
 Features:
  - libcurl
  - libsoup
  - gpgme
  - ex-fsverity
  - libarchive
  - selinux
  - openssl
  - libmount
  - systemd
  - release
  - p2p
# journalctl -u rpm-ostreed
Mar 31 11:40:01 localhost rpm-ostree[1756]: client(id:cli dbus:1.118 unit:session-1.scope uid:0) added; new total=1
Mar 31 11:40:01 localhost rpm-ostree[1756]: Locked sysroot
Mar 31 11:40:01 localhost rpm-ostree[1756]: Initiated txn Rebase for client(id:cli dbus:1.118 unit:session-1.scope uid:0): /org/>
Mar 31 11:40:01 localhost rpm-ostree[1756]: Process [pid: 1821 uid: 0 unit: session-1.scope] connected to transaction progress
Mar 31 11:40:01 localhost rpm-ostree[1756]: libostree HTTP error from remote foo for <https://ostree.foo/repo/summary>
Mar 31 11:40:01 localhost rpm-ostree[1756]: Txn Rebase on /org/projectatomic/rpmostree1/fedora_iot failed: While pulling foo>
Mar 31 11:40:01 localhost rpm-ostree[1756]: Unlocked sysroot
Mar 31 11:40:01 localhost rpm-ostree[1756]: Process [pid: 1821 uid: 0 unit: session-1.scope] disconnected from transaction progr>
Mar 31 11:40:01 localhost rpm-ostree[1756]: client(id:cli dbus:1.118 unit:session-1.scope uid:0) vanished; remaining=0
Mar 31 11:40:01 localhost rpm-ostree[1756]: In idle state; will auto-exit in 63 seconds
Mar 31 11:41:05 localhost rpm-ostree[1756]: In idle state; will auto-exit in 61 seconds

Working:

# rpm-ostree status (on a second, working machine)
State: idle
Deployments:
  ostree://fedora-iot:fedora/stable/aarch64/iot
                   Version: 32.20200603.0 (2020-06-03T10:45:43Z)
                    Commit: c02bd26925b4e849fd0e53f3645e97b5cb22f47d7614c5a047d6200c64b3421b
              GPGSignature: Valid signature by 7D22D5867F2A4236474BF7B850CB390B3C3359C4
# rpm-ostree --version
rpm-ostree:
 Version: '2020.5'
 Git: 4e84aa3e2663b76207f701ec0e6697edf09f1412
 Features:
  - compose
  - rust
# ostree --version
libostree:
 Version: '2020.6'
 Git: 097c6430b2c72b036224857e661557f3b6a15914
 Features:
  - libcurl
  - libsoup
  - gpgme
  - ex-fsverity
  - libarchive
  - selinux
  - openssl
  - libmount
  - systemd
  - release
  - p2p

Would you like to work on the issue?

I'm not confident enough to work on this just yet, if it is indeed a bug in rpm-ostree.

w4tsn commented 3 years ago

OK, after working with this a bit more I think the problem is actually caused by my image creation process. When preparing the OS image I forgot several SELinux labels on /sysroot and especially /sysroot/ostree, which means that all files and folders effectively had root_t as their type. After setting the labels as is done in the Fedora IoT image, I now get a straight "Error: Permission denied" and an AVC denial on a lock create op.

AVC avc:  denied  { create } for  pid=2568 comm="rpm-ostree" name="lock" scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:usr_t:s0 tclass=file permissive=0

That's as far as I got. I'm now trying to figure out where I forgot to set labels. Unfortunately setfiles reports that there are no default labels for /ostree, so I can't just run restorecon.

Is there any documentation on what the labels should look like?

Apart from that I suppose that this is solely my personal problem. The only thing I could take away from this for rpm-ostree might be that the error messages are very misleading and not expressive enough. But then again it's debatable whether a screwed-up system, like mine apparently is, should be considered in error handling and reporting anyway.

EDIT:

After setting SELinux to permissive mode the SSL certificate error returned, so I suspect I just screwed up the file contexts even more, and since I did not reboot in the meantime, processes may have started or run with the wrong process contexts. So I'm back at the problem that the mTLS remote does not work in rpm-ostree while it does in ostree.

EDIT 1:

I was able to verify that the mTLS remote / configuration works on an older system with rpm-ostree 2020.5. This system started as Fedora IoT and was rebased onto my custom OSTree. Next I'll flash the latest Fedora IoT and try to use the mTLS remote to rebase onto my custom OSTree. If this is indeed rpm-ostree related in any way it should fail.

EDIT 2:

So I have now set up a stock Fedora IoT 33 install from the raw image and added my mTLS remote to it. rpm-ostree rebase on that remote always returns an "error: Problem with the local SSL certificate", while this works at least with rpm-ostree 2020.05. Also, ostree itself has no problem interacting with that remote. I'm able to do a successful rpm-ostree rebase if I "manually" pull the refs with ostree pull first. I'm now pretty sure that this is a bug of some kind in rpm-ostree, introduced somewhere after 2020.05.

cgwalters commented 3 years ago

This part of the code is pure ostree; rpm-ostree basically defers to libostree for all HTTP requests there. It could be a regression there, but it's much more likely IMO to be libcurl related; or possibly openssl. I'd try using e.g. rpm-ostree usroverlay combined with directly rpm -ivh --force on different libcurl builds.

Have you looked at what version of TLS is being negotiated? If you can get a packet trace (eliding certificates) that might help.

Looks like we never added unit/CI tests for tls-client-* to libostree =/

cgwalters commented 3 years ago

Can you try e.g. attaching strace -f -o /tmp/strace-rpmostree.log -s 2048 -p $(systemctl show -p MainPID rpm-ostreed | cut -f 2 -d =) and looking at the end of /tmp/strace-rpmostree.log? (Don't paste the full thing here as it's likely to contain certificate material; but it'd be useful to know if we're e.g. getting EPERM from open() or something else)
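To look at the end of the strace log without scrolling past certificate material, one option is to print only the failed open()/openat() calls. The following is a minimal sketch, not something taken from this thread: the function name failed_opens is hypothetical, and the pattern assumes the default strace line format.

```shell
# Hypothetical helper: print only failed open()/openat() calls from an strace log.
# It matches lines such as:
#   openat(AT_FDCWD, "/some/path", O_RDONLY) = -1 ENOENT (No such file or directory)
# Lines written by `strace -f -o FILE` carry a leading PID, which the pattern allows.
failed_opens() {
    grep -E '(^|[^a-z_])open(at)?\(.*\) = -1 E[A-Z]+' "$1"
}

# Example call (left commented out; the log path is the one used above):
# failed_opens /tmp/strace-rpmostree.log | tail -n 20
```

The output contains file paths and errno values (EPERM, EACCES, ENOENT and so on) rather than buffer contents, but it is still worth skimming before pasting it anywhere public.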

w4tsn commented 3 years ago

It looks like the process has problems accessing the cert file:

openat(AT_FDCWD, "/root/.cert/gateway.crt", O_RDONLY) = -1 ENOENT (No such file or directory)

Directly after this, the above-mentioned error message is emitted.

Apparently the file is there and is correctly picked up by curl and ostree:

# ls -laZ /root/.cert
-rw-r-----. 1 root 1001 system_u:object_r:home_cert_t:s0  1334 Apr  1 16:24 gateway.crt
-rw-r--r--. 1 root root system_u:object_r:home_cert_t:s0  1058 Apr  1 15:12 gateway.csr
-rw-------. 1 root root system_u:object_r:home_cert_t:s0  1675 Apr  1 15:12 gateway.key

Could it be that rpm-ostree and SELinux are the root cause? Could there be some access control problem here?

EDIT: SELinux is unlikely because the audit log is clean and setenforce 0 does not help either.

cgwalters commented 3 years ago

It's likely https://github.com/coreos/rpm-ostree/commit/341ec7d0446a0505d5a4e1747c2283d40ca4823b

The more correct thing here is to store those keys in /etc, not /root (aka /var/roothome). You could put them in /etc/ostree/keys for example.
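Moving the keys out of /root could be scripted roughly as follows. This is a sketch under assumptions, not a tested migration: the function name migrate_mtls_keys is hypothetical, the file names client.crt and client.key are taken from the example remote config earlier in the thread, and the remote is assumed to be defined in a plain key=value config file that uses the tls-client-cert-path and tls-client-key-path keys. It relies on GNU sed and coreutils install, as shipped on Fedora.

```shell
# Hypothetical helper: copy the client cert and key into a new directory and
# rewrite the tls-client-*-path entries of a remote config file to point there.
# Usage: migrate_mtls_keys OLD_DIR NEW_DIR REMOTE_CONF
migrate_mtls_keys() {
    old_dir=$1
    new_dir=$2
    remote_conf=$3

    # Create the target directory, readable by root only.
    mkdir -p "$new_dir" || return 1
    chmod 0700 "$new_dir" || return 1

    # Copy (not move) the files so the old location stays intact until verified.
    for f in client.crt client.key; do
        install -m 0600 "$old_dir/$f" "$new_dir/$f" || return 1
    done

    # Replace the old directory prefix in both tls-client-*-path lines.
    sed -E -i "s#^(tls-client-(cert|key)-path=)$old_dir/#\1$new_dir/#" "$remote_conf"
}

# Example call (left commented out; the config file name is an assumption):
# migrate_mtls_keys /root/.cert /etc/ostree/keys /etc/ostree/remotes.d/mtls.conf
```

Copying instead of moving keeps the old files in place until a rebase against the remote has succeeded with the new paths. After that the originals under /root can be removed.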

Another short term workaround is to paste this into systemctl edit rpm-ostreed:

[Service]
ProtectHome=no

Now, I do want to avoid regressions...if you argue strongly for it we can consider reverting. But I'd really like not to :smile:

cgwalters commented 3 years ago

(We could weaken this to ProtectHome=read-only for example)
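Until such a change lands upstream, the same weakening can in principle be applied locally with a drop-in, in the same way as the ProtectHome=no workaround. The fragment below is a sketch that was not tested in this thread. The path shown is where systemctl edit rpm-ostreed normally writes its override, and whether read-only access is enough for the daemon to read keys under /root is an assumption.

```ini
# /etc/systemd/system/rpm-ostreed.service.d/override.conf
[Service]
# Assumption: keeps /home, /root and /run/user visible but read-only for the daemon.
ProtectHome=read-only
```

The daemon has to be restarted (systemctl restart rpm-ostreed) before the new setting takes effect.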

w4tsn commented 3 years ago

Ohhh. Well, that makes sense. I suppose it's a good thing to use such protective measures, and no, I don't have strong arguments against it. I'll have to update our firmware and write a migration, but that's hardly a strong argument (for anyone but me, at least :D)

Well that's that I suppose. Thanks for helping me figure this out, much appreciated.