crc-org / crc

CRC is a tool to help you run containers. It manages a local OpenShift 4.x cluster, Microshift or a Podman VM optimized for testing and development purposes
https://crc.dev
Apache License 2.0
1.26k stars 242 forks

[BUG] crc stuck on Linux machine #4342

Closed lilyLuLiu closed 1 month ago

lilyLuLiu commented 2 months ago

Machine: Red Hat Enterprise Linux release 8.10 (Ootpa). The crc command gets stuck, even for crc version or crc help. After rebooting the machine, crc is back to normal. This happens frequently, and only on this machine.

[cloud-user@rhel-crcqe ~]$ ps -elf | grep crc
0 S cloud-u+ 97926 1 0 80 0 - 55650 - Sep01 ? 00:00:00 /bin/bash crc-e2e/run.sh -targetFolder crc-e2e -junitFilename e2e-junit.xml -bundleLocation /home/cloud-user/OpenshiftLocal/bundle/4.16.7/crc_libvirt_4.16.7_amd64.crcbundle -e2eTagExpression ~@minimal && ~@story_microshift
0 S cloud-u+ 97950 97926 0 80 0 - 527961 - Sep01 ? 00:00:02 ./e2e.test --bundle-location=/home/cloud-user/OpenshiftLocal/bundle/4.16.7/crc_libvirt_4.16.7_amd64.crcbundle --pull-secret-file=/home/cloud-user/crc-e2e/pull-secret --cleanup-home=false --crc-memory= --godog.tags=linux && ~@minimal && ~@story_microshift --godog.format=junit
5 S dnsmasq 105622 1 0 80 0 - 14412 - Sep01 ? 00:00:02 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/crc.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
1 S root 105623 105622 0 80 0 - 14405 - Sep01 ? 00:00:00 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/crc.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
0 S cloud-u+ 105950 1593 0 80 0 - 528058 - Sep01 ? 00:00:07 /home/cloud-user/.crc/bin/crc daemon
6 S qemu 106370 1 71 80 0 - 4005494 - Sep01 ? 19:49:04 /usr/libexec/qemu-kvm -name guest=crc,debug-threads=on -S -object {"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-5-crc/master-key.aes"} -blockdev {"driver":"file","filename":"/usr/share/edk2/ovmf/OVMF_CODE.cc.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"} -blockdev {"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/crc_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"} -machine pc-q35-rhel8.6.0,usb=off,dump-guest-core=off,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format,memory-backend=pc.ram -accel kvm -cpu host,migratable=on -m 10752 -object {"qom-type":"memory-backend-memfd","id":"pc.ram","share":true,"x-use-canonical-path-for-ramblock-id":false,"size":11274289152} -overcommit mem-lock=off -smp 4,sockets=4,cores=1,threads=1 -uuid 0186d168-bfe2-4fa7-a2c4-fb2565bfa5b4 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=35,server=on,wait=off -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot menu=off,strict=on -device pcie-root-port,port=16,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 -device pcie-root-port,port=17,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 -device pcie-root-port,port=18,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 -device pcie-root-port,port=19,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 -device pcie-root-port,port=20,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 -device pcie-root-port,port=21,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 -device qemu-xhci,id=usb,bus=pci.3,addr=0x0 -blockdev {"driver":"file","filename":"/home/cloud-user/.crc/cache/crc_libvirt_4.16.7_amd64/crc.qcow2","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-2-format","read-only":true,"driver":"qcow2","file":"libvirt-2-storage","backing":null} -blockdev {"driver":"file","filename":"/home/cloud-user/.crc/machines/crc/crc.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":"libvirt-2-format"} -device virtio-blk-pci,bus=pci.4,addr=0x0,drive=libvirt-1-format,id=virtio-disk0,bootindex=1 -chardev socket,id=chr-vu-fs0,path=/var/lib/libvirt/qemu/domain-5-crc/fs0-fs.sock -device vhost-user-fs-pci,id=fs0,chardev=chr-vu-fs0,tag=dir0,bus=pci.1,addr=0x0 -netdev tap,fd=36,id=hostnet0,vhost=on,vhostfd=38 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:fd:fc:07:21:82,bus=pci.2,addr=0x0 -chardev stdio,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -audiodev {"id":"audio1","driver":"none"} -vnc 127.0.0.1:0,audiodev=audio1 -device cirrus-vga,id=video0,bus=pcie.0,addr=0x1 -object {"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"} -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.5,addr=0x0 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
0 S cloud-u+ 106704 106645 0 80 0 - 509433 - Sep01 ? 00:00:02 crc podman-env
0 S cloud-u+ 121100 1 0 80 0 - 55650 - Sep01 ? 00:00:00 /bin/sh crc-support/run.sh main https://crcqe-asia.s3.amazonaws.com/nightly/microshift/4.16.7/20240901/RHEL-8.10 crc-linux-amd64.tar.xz.sha256sum /home/cloud-user/OpenshiftLocal/crc/main true true crc true false
0 S cloud-u+ 121123 121100 0 80 0 - 546235 - Sep01 ? 00:00:00 crc cleanup
0 S cloud-u+ 124218 1 0 80 0 - 55650 - 02:38 ? 00:00:00 /bin/sh crc-support/run.sh main https://crcqe-asia.s3.amazonaws.com/nightly/crc/20240902/RHEL-8.10 crc-linux-amd64.tar.xz.sha256sum /home/cloud-user/OpenshiftLocal/crc/main true true crc true false
0 S cloud-u+ 124241 124218 0 80 0 - 490872 - 02:38 ? 00:00:00 crc cleanup
0 S cloud-u+ 128044 127877 0 80 0 - 55496 - 07:33 pts/0 00:00:00 grep --color=auto crc

lilyLuLiu commented 2 months ago

@anjannath investigated on the machine: crc uses the D-Bus secrets service on Linux to store the pull secret, but unlocking the secrets collection requires a password. In a GUI session you'll see a dialog asking for the password, but in an SSH session this is not possible, so crc gets stuck.

praveenkumar commented 2 months ago

@anjannath does this happen when pull-secret-file is used as part of config?

anjannath commented 2 months ago

@anjannath does this happen when pull-secret-file is used as part of config?

This happens early on, when we initialize the config object, so it happens even when pull-secret-file is set. It occurs when executing this line: https://github.com/crc-org/crc/blob/4e80c4c48c6c0c13d0399cc0436a0a38d9dda0a2/cmd/crc/cmd/root.go#L152

That call tries to determine whether the secrets store is accessible by trying to store a value. If the login secrets collection is locked and crc is being run in an SSH session, it gets stuck, because the prompt to unlock the keyring is a GUI prompt.

albfan commented 2 months ago

Agreed with @praveenkumar. Even if we want to triage and resolve this (crc getting stuck while accessing the pull secret), the fix is probably just to time out and warn about the problem (no access to the secret service is something for the user to resolve). --pull-secret-file is a valid workaround, as we don't want any prompt in a non-interactive session.
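A minimal sketch of the "time out and warn" idea, written in Python to match the test scripts later in this thread (crc itself is written in Go, so this is not crc's code). The names `call_with_timeout`, `keyring_accessible` and the `probe` callable are hypothetical; `probe` stands in for whatever call checks the keyring and may block.

```python
import sys
import threading


def call_with_timeout(func, timeout_s):
    """Run func() in a daemon thread and wait at most timeout_s seconds.

    Returns (finished, value). If func raises, value is None.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = func()
        except Exception as exc:  # keep the sketch simple: record and move on
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        # The probe is still blocked (for example on a GUI unlock prompt).
        return False, None
    return True, outcome.get("value")


def keyring_accessible(probe, timeout_s=5.0):
    """Hypothetical wrapper: warn instead of hanging when the probe blocks."""
    finished, ok = call_with_timeout(probe, timeout_s)
    if not finished:
        print(
            "warning: secret service did not answer within "
            f"{timeout_s}s; the keyring may be locked",
            file=sys.stderr,
        )
        return False
    return bool(ok)
```

The blocked thread is left behind as a daemon thread, so it does not keep the process alive; a real fix would more likely cancel the underlying D-Bus call.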

anjannath commented 2 months ago

This will be stuck even when we use the --pull-secret-file flag or set it beforehand using crc config set pull-secret-file.

From crc we should at least let the user know that the secret service is not accessible, and we should not block forever when failing to access the keyring.

We hit this blocking issue when checking whether the keyring is accessible: https://github.com/crc-org/crc/blob/4e80c4c48c6c0c13d0399cc0436a0a38d9dda0a2/pkg/crc/config/secret_config.go#L68

@albfan you mentioned using busctl instead to try to access the keyring and determine whether it's accessible. I think that will solve this issue.
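A rough sketch of what a busctl-based check could look like, in Python rather than crc's Go. It only reports whether org.freedesktop.secrets currently has an owner on the user bus; it says nothing about whether the login collection is locked. The function names are hypothetical, and the sketch assumes that rows for names without a running owner do not carry a numeric PID in the second column.

```python
import subprocess

SECRETS_NAME = "org.freedesktop.secrets"


def name_owned(busctl_output, name=SECRETS_NAME):
    """Return True if `name` is listed with a numeric PID in `busctl list` output."""
    for line in busctl_output.splitlines():
        fields = line.split()
        # Assumption: column 1 is the bus name, column 2 the owning PID.
        if len(fields) >= 2 and fields[0] == name and fields[1].isdigit():
            return True
    return False


def secret_service_running(timeout_s=5.0):
    """Run `busctl list --user` with a timeout so the check itself cannot hang."""
    try:
        proc = subprocess.run(
            ["busctl", "list", "--user"],
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # busctl missing, not executable, or too slow: treat as unavailable.
        return False
    return proc.returncode == 0 and name_owned(proc.stdout)
```

Because the subprocess call carries its own timeout, a caller can fall back to asking for the pull secret file instead of waiting on the keyring.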

albfan commented 2 months ago

I checked the documentation: creating a collection may or may not require a prompt.

https://www.freedesktop.org/wiki/Specifications/secret-storage-spec/secrets-api-0.1.html#eggdbus-method-org.freedesktop.Secrets.Service.CreateCollection

I did a quick test and it always asks me for a prompt:

$ cat keyring-create.py
#!/usr/bin/env python

from pydbus import SessionBus
from gi.repository import GLib

collection_name = "crc-test"
properties = {"org.freedesktop.Secret.Collection.Label": GLib.Variant.new_string(collection_name)}

ses_bus = SessionBus()
service_name = 'org.freedesktop.secrets'
secret_service = ses_bus.get(service_name, '/org/freedesktop/secrets')

mainloop = GLib.MainLoop()

def _received_pw(dismissed, object_path):
    print("dismissed?", dismissed, object_path)
    mainloop.quit()

def show_prompt(prompt_id):
    prompt = ses_bus.get(service_name, prompt_id)
    prompt.onCompleted = _received_pw
    prompt.Prompt("random_id_for_window")
    mainloop.run()
    print('Prompt closed')

def add_my_collection():
    result = secret_service.CreateCollection(properties, "")
    print("result from CreateCollection", result)
    if result[1] != '/':
        show_prompt(result[1])

def main():
    add_my_collection()

if __name__ == '__main__':
    main()
$ cat keyring-list.py 
#!/usr/bin/env python

from pydbus import SessionBus
from gi.repository import GLib

collection_name = "MyTestCollection"
properties = {"org.freedesktop.Secret.Collection.Label": GLib.Variant.new_string(collection_name)}

ses_bus = SessionBus()
service_name = 'org.freedesktop.secrets'
secret_service = ses_bus.get(service_name, '/org/freedesktop/secrets')

mainloop = GLib.MainLoop()

def list_collections():
    print('print collection names')
    for test_collect in secret_service.Collections:
        print(test_collect)

def main():
    list_collections()

if __name__ == '__main__':
    main()
$ cat keyring-delete.py 
#!/usr/bin/env python

from pydbus import SessionBus
from gi.repository import GLib

collection_name = "crc-test"
properties = {"org.freedesktop.Secret.Collection.Label": GLib.Variant.new_string(collection_name)}

ses_bus = SessionBus()
service_name = 'org.freedesktop.secrets'
secret_service = ses_bus.get(service_name, '/org/freedesktop/secrets')

mainloop = GLib.MainLoop()

def _received_pw(dismissed, object_path):
    print("dismissed?", dismissed, object_path)
    mainloop.quit()

def show_prompt(prompt_id):
    prompt = ses_bus.get(service_name, prompt_id)
    prompt.onCompleted = _received_pw
    prompt.Prompt("random_id_for_window")
    mainloop.run()
    print('Prompt closed')

def add_my_collection():
    result = secret_service.CreateCollection(properties, "")
    print("result from CreateCollection", result)
    if result[1] != '/':
        show_prompt(result[1])

def remove_my_collection():
    print('print collection names')
    for test_collect in secret_service.Collections:
        print(test_collect)
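        # The object path is escaped (crc-test becomes crc_2dtest), so match on
        # the Label property rather than deleting every collection.
        this_collection = ses_bus.get(service_name, test_collect)
        if this_collection.Label == collection_name:
            print('deleting collection')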
            result = this_collection.Delete()
            print(result)
            if result != '/':
                show_prompt(result)

def main():
    #add_my_collection()
    remove_my_collection()

if __name__ == '__main__':
    main()

But after creating two collections with the same label, the second one gets a different suffix:

$ ./keyring-create.py 
result from CreateCollection ('/', '/org/freedesktop/secrets/prompt/p25')
dismissed? False /org/freedesktop/secrets/collection/crc_2dtest
Prompt closed
[alberto@fedora crc]$ ./keyring-create.py 
result from CreateCollection ('/', '/org/freedesktop/secrets/prompt/p26')
dismissed? False /org/freedesktop/secrets/collection/crc_2dtest_5f1
Prompt closed
[alberto@fedora crc]$ ./keyring-list.py 
print collection names
/org/freedesktop/secrets/collection/crc_2dtest_5f1
/org/freedesktop/secrets/collection/crc_2dtest

And it always asks for a prompt, so I'm not sure how crc can access a collection without prompting, or how a locked collection can ask for a prompt and block the crc execution.

Can we patch the crc CLI to add extra traces on CreateCollection or CreateItem, to check when it needs a prompt and fail in that case?
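As an illustration of that kind of trace, here is a small Python sketch built on the result shape printed by keyring-create.py above: a (collection path, prompt path) pair where a prompt path of '/' means no prompt is needed. `PromptRequired`, `prompt_path`, `has_gui` and `check_result` are hypothetical names, and using DISPLAY or WAYLAND_DISPLAY to decide whether a GUI prompt can be shown is an assumption, not how crc detects it.

```python
import os
import sys

NO_PROMPT = "/"


class PromptRequired(RuntimeError):
    """Raised when the secret service wants a prompt that cannot be shown."""


def prompt_path(result):
    """Return the prompt object path from a (path, prompt) result, or None."""
    _path, prompt = result
    return None if prompt == NO_PROMPT else prompt


def has_gui(environ=None):
    """Assumption: a GUI prompt is only possible with a display variable set."""
    environ = os.environ if environ is None else environ
    return bool(environ.get("DISPLAY") or environ.get("WAYLAND_DISPLAY"))


def check_result(result, environ=None):
    """Trace prompt requests and fail fast instead of waiting on a prompt."""
    prompt = prompt_path(result)
    if prompt is not None:
        print("trace: secret service requested prompt", prompt, file=sys.stderr)
        if not has_gui(environ):
            raise PromptRequired(
                f"prompt {prompt} required but no graphical session is available"
            )
    return result[0]
```

In keyring-create.py this would be called on the value returned by CreateCollection before deciding whether to call show_prompt.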

anjannath commented 2 months ago

And it always ask for a prompt, so I'm not sure how crc can access a collection without prompting, or a locked collection can ask for prompt and block the crc execution

For crc we are not creating a new collection, but using the pre-existing login collection (actually this is decided for us by the Go module we are using): /org/freedesktop/secrets/collection/login or /org/freedesktop/secrets/aliases/default, depending on which exists.

Maybe my observation was wrong, but I remember from testing this earlier that in a GUI flow, where a user types in their password to log in, the login collection is automatically unlocked after a successful login.
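To check that observation, something like the following sketch could report whether the collection is locked. It reads the Locked property that the Secret Service API defines on collections, trying the two paths mentioned above in order. `first_collection_state` is a hypothetical helper; `bus` is expected to behave like the pydbus SessionBus used in the scripts above, and the broad exception handling is a simplification because the exact error for a missing path is not something this thread confirms.

```python
SERVICE_NAME = "org.freedesktop.secrets"
CANDIDATE_PATHS = (
    "/org/freedesktop/secrets/collection/login",
    "/org/freedesktop/secrets/aliases/default",
)


def first_collection_state(bus, paths=CANDIDATE_PATHS):
    """Return (path, locked) for the first path that resolves, else (None, None).

    `bus` must offer get(service_name, object_path), as pydbus buses do.
    """
    for path in paths:
        try:
            collection = bus.get(SERVICE_NAME, path)
            # Locked is a read-only boolean property on Secret Service collections.
            return path, bool(collection.Locked)
        except Exception:
            # Path missing or service unavailable: try the next candidate.
            continue
    return None, None
```

With pydbus installed it would be called as first_collection_state(SessionBus()), once from a graphical login and once from an SSH session, to compare the two results.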

cfergeau commented 2 months ago

but while on a SSH session this is not possible so its stuck.

On my headless RHEL9 machine, I get an error when it tries to access the secret:

INFO Loading bundle: crc_microshift_libvirt_4.16.0_amd64... 
DEBU Cannot load secret from configuration: empty path 
DEBU Cannot load secret from keyring: The name is not activatable 
CRC requires a pull secret to download content from Red Hat.
You can copy it from the Pull Secret section of https://console.redhat.com/openshift/create/local.
? Please enter the pull secret 

cfergeau commented 2 months ago

Similar error with keyring-list.py:

$ python3 ./keyring-list.py 
Traceback (most recent call last):
  File "/home/teuf/dev/crc/./keyring-list.py", line 11, in <module>
    secret_service = ses_bus.get(service_name, '/org/freedesktop/secrets')
  File "/home/teuf/.local/lib/python3.9/site-packages/pydbus/proxy.py", line 44, in get
    ret = self.con.call_sync(
gi.repository.GLib.GError: g-dbus-error-quark: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name is not activatable (2)

If I install a package providing /org/freedesktop/secrets maybe I'll start seeing these freezes as well.

albfan commented 2 months ago

pkg/crc/cluster/pullsecret.go
const helpMessage = `CRC requires a pull secret to download content from Red Hat.

https://github.com/crc-org/crc/blob/main/pkg/crc/cluster/pullsecret.go#L168

pkg/crc/cluster/pullsecret.go
53:21:  pullSecret, err := promptUserForSecret()

https://github.com/crc-org/crc/blob/main/pkg/crc/cluster/pullsecret.go#L53

It looks like crc detects when the session is non-interactive, so it cannot prompt and asks for pull-secret-file directly.

I suppose the hang is somewhere else.

@cfergeau can you check if you have this service available?

$ busctl list --user | grep secrets
org.freedesktop.secrets                            2965 gnome-keyring-d cloud-user       :1.6          session-4.scope           4 

albfan commented 2 months ago

Using waypipe we get access to the graphical interface from an SSH Wayland session.

After removing the current login storage and restarting it, the password is now known, and gnome-keyring-daemon --replace --unlock works.

That's still a workaround until we identify what locks the login collection.

$ is-collection-locked login
method return time=1726572308.964656 sender=:1.1013 -> destination=:1.1028 serial=51 reply_serial=2
   variant       boolean true

$ echo -n "mypassword" | gnome-keyring-daemon -r --unlock
discover_other_daemon: 0** Message: 13:25:13.345: Replacing daemon, using directory: /run/user/1000/keyring
GNOME_KEYRING_CONTROL=/run/user/1000/keyring
SSH_AUTH_SOCK=/run/user/1000/keyring/ssh

$ is-collection-locked login
method return time=1726572316.108909 sender=:1.1029 -> destination=:1.1030 serial=21 reply_serial=2
   variant       boolean false

cfergeau commented 2 months ago

@cfergeau can you check if you have this service available?

Hadn't seen this request before, but:

$ busctl list --user | grep secrets
$

albfan commented 2 months ago

After rebuilding the key storage, it has not been locked again. There's a log checking it every second.

We might close this issue, or keep it open to continue tracking it.

albfan commented 1 month ago

Closing as not reproducible anymore