89luca89 / distrobox

Use any linux distribution inside your terminal. Enable both backward and forward compatibility with software and freedom to use whatever distribution you’re more comfortable with. Mirror available at: https://gitlab.com/89luca89/distrobox
https://distrobox.it/
GNU General Public License v3.0
10.1k stars 417 forks source link

[Error] first enter fails on MicroOS (~ tumbleweed) for me, since when new logging system was introduced #286

Closed dfaggioli closed 2 years ago

dfaggioli commented 2 years ago

Since commit "enter: improve fist-start logging and show user some progress" the first enter into the distrobox, the one that initializes it, fails like this:

dario@Wayrath:~/src/toolbox/distrobox.git> ./distrobox create 
36b761c003b670ee45f7cfb78959ce56cebbdb603092cc3d73a44f9cce649c30
Distrobox 'tumbleweed' successfully created.
To enter, run:

distrobox-enter tumbleweed

dario@Wayrath:~/src/toolbox/distrobox.git> ./distrobox enter tumbleweed
Container tumbleweed is not runnung.
Starting container tumbleweed
run this command to follow along:

podman logs -f tumbleweed

 Starting container...                       [ OK ]
 Installing basic packages...               ./distrobox-enter: line 432: kill: (7048) - No such process

An error occurred

An error occurred

I'm attaching the output of both bash -x ./distrobox-create -v and bash -x ./distrobox-enter -v tumbleweed:

dfaggioli commented 2 years ago

Note that, trying to enter the container again, works:

dario@Wayrath:~/src/toolbox/distrobox.git> ./distrobox enter tumbleweed
dario@tumbleweed:~/src/toolbox/distrobox.git>

But of course the container itself is not fully and properly initialized

dfaggioli commented 2 years ago

This seems similar to what has been reported here: https://bugzilla.opensuse.org/show_bug.cgi?id=1199790

I did apply the cgroups fix to TW's podman, and that does not solve the problem (in fact, it looks to me like the "Error", although reported in the logs, is harmless)

89luca89 commented 2 years ago

mmmh it seems like it's a problem with podman logs In the logs it goes straight to the "cleanup" of killing the logs process, which does not exist, as it exited

If you get latest main it should have fixed this as it's using a --since flag to have the latest available logs

Another way to check is

distrobox create... instead of enter, go with

podman start ...

and attach to the logs with podman logs -f ... and see if the logs exits for some reasons

dfaggioli commented 2 years ago

BTW, the container is doing (at least part of) its initialization! I.e., if I enter inside it again, right after the first enter failed, I find it busy installing the packages that we install in distrobox-init:

dario@Wayrath:~/src/toolbox/distrobox.git> ./distrobox-enter tumbleweed
dario@tumbleweed:~/src/toolbox/distrobox.git> ps aux |grep zypper
65534      755  0.0  0.0   2964  1076 ?        SLs  10:23   0:00 /usr/bin/earlyoom -r 0 -m 6 -s 10 -n -p --avoid (^|/)(systemd|Xorg|X|Xwayland|xdm|sddm|kdm|gdm|lightdm|plasmashell|kwin_wayland|kwin_x11|ssh|yast|yast2|y2controlcenter|zypper|rpm|dnf)$ --prefer (^|/)(java|Web Content|firefox|chromium|chrome|opera|falkon|ffmpeg|vlc|mpv|akregator|thumbnail.so)$
root     10100  9.6  0.3 562616 116588 ?       Sl   11:56   0:10 zypper install -y bash bc curl diffutils findutils less libvte-2* ncurses pinentry procps shadow sudo systemd wget util-linux
dario    10663  0.0  0.0   3704  1916 pts/0    S+   11:58   0:00 grep --color=auto zypper
89luca89 commented 2 years ago

It should have be addressed by commit 1508e4d3a65eb91538fb438da4ffeb884cf9c623

89luca89 commented 2 years ago

BTW, the container is doing (at least part of) its initialization! I.e., if I enter inside it again, right after the first enter failed, I find it busy installing the packages that we install in distrobox-init:

dario@Wayrath:~/src/toolbox/distrobox.git> ./distrobox-enter tumbleweed
dario@tumbleweed:~/src/toolbox/distrobox.git> ps aux |grep zypper
65534      755  0.0  0.0   2964  1076 ?        SLs  10:23   0:00 /usr/bin/earlyoom -r 0 -m 6 -s 10 -n -p --avoid (^|/)(systemd|Xorg|X|Xwayland|xdm|sddm|kdm|gdm|lightdm|plasmashell|kwin_wayland|kwin_x11|ssh|yast|yast2|y2controlcenter|zypper|rpm|dnf)$ --prefer (^|/)(java|Web Content|firefox|chromium|chrome|opera|falkon|ffmpeg|vlc|mpv|akregator|thumbnail.so)$
root     10100  9.6  0.3 562616 116588 ?       Sl   11:56   0:10 zypper install -y bash bc curl diffutils findutils less libvte-2* ncurses pinentry procps shadow sudo systemd wget util-linux
dario    10663  0.0  0.0   3704  1916 pts/0    S+   11:58   0:00 grep --color=auto zypper

Yea, because the error is due to podman logs -f ... either failing or exiting So the enter process cannot follow what the container is doing

I have experience of this, prior to the commit I linked above, that should have fixed this, and looking at your logs it seems like you don't have that commit applied, as your podman logs command is missing a --since flag

89luca89 commented 2 years ago

@dfaggioli also I see the kill error is at line 432, which does not match the current code, but the code previous to that commit

dfaggioli commented 2 years ago

@dfaggioli also I see the kill error is at line 432, which does not match the current code, but the code previous to that commit

That was indeed coming from within the bisect. Now I'm on main, I have that commit, and I see this:

dario@Wayrath:~> distrobox-create --version
distrobox: 1.2.16
dario@Wayrath:~> distrobox-enter tumbleweed
Container tumbleweed is not running.
Starting container tumbleweed
run this command to follow along:

 podman logs -f tumbleweed

 Starting container...                       [ OK ]
 Installing basic packages...               /home/dario/bin/distrobox-enter: line 439: kill: (15501) - No such process

An error occurred
dfaggioli commented 2 years ago

I've also added a ps aux | grep $container_log_pid, inside of the while in distrobox-enter, and the logs command seems coorect to me:

dario@Wayrath:~/src/toolbox/distrobox.git> ./distrobox-enter tumbleweed
Container tumbleweed is not running.
Starting container tumbleweed
run this command to follow along:

 podman logs -f tumbleweed

 Starting container...                      dario    16247  150  0.1 1498812 60832 pts/1   Sl+  12:19   0:00 podman logs --since 2022-05-23T12:19:26.839028336+02:00 -f tumbleweed
dario    16263  0.0  0.0   6640  2308 pts/1    S+   12:19   0:00 grep 16247
 [ OK ]
 Installing basic packages...               dario    16247  128  0.1 1498812 60832 pts/1   Sl+  12:19   0:00 podman logs --since 2022-05-23T12:19:26.839028336+02:00 -f tumbleweed
dario    16270  0.0  0.0   6640  2340 pts/1    S+   12:19   0:00 grep 16247
dario    16247  112  0.1 1498812 60832 pts/1   Sl+  12:19   0:00 podman logs --since 2022-05-23T12:19:26.839028336+02:00 -f tumbleweed
dario    16272  0.0  0.0   6640  2408 pts/1    S+   12:19   0:00 grep 16247
./distrobox-enter: line 439: kill: (16247) - No such process

An error occurred
89luca89 commented 2 years ago

@dfaggioli also I see the kill error is at line 432, which does not match the current code, but the code previous to that commit

That was indeed coming from within the bisect. Now I'm on main, I have that commit, and I see this:

dario@Wayrath:~> distrobox-create --version
distrobox: 1.2.16
dario@Wayrath:~> distrobox-enter tumbleweed
Container tumbleweed is not running.
Starting container tumbleweed
run this command to follow along:

 podman logs -f tumbleweed

 Starting container...                     [ OK ]
 Installing basic packages...             /home/dario/bin/distrobox-enter: line 439: kill: (15501) - No such process

An error occurred

OK can you run it with --verbose ? That way we see better what is happening

dfaggioli commented 2 years ago

OK can you run it with --verbose ? That way we see better what is happening

fisrt-enter.log

89luca89 commented 2 years ago

OK can you run it with --verbose ? That way we see better what is happening

fisrt-enter.log

Mmmh seems Like the log exits, fifo is closed, kill tries go kill but could not find the pid

I cannot reproduce this I'll try to debug other ways

dfaggioli commented 2 years ago

@89luca89 what should happen if I just do something like podman logs --since 5m -f tumbleweed ? Should the logging process stay active, and continue feeding the FIFO? Because, here, it just prints and exits, right after the last available line of output...

89luca89 commented 2 years ago

@89luca89 what should happen if I just do something like podman logs --since 5m -f tumbleweed ? Should the logging process stay active, and continue feeding the FIFO? Because, here, it just prints and exits, right after the last available line of output...

Yea should remain as we specify -f which is --follow, and also it should NOT have a last line because now in init there is a sort of reconciliation-loop to keep some files in sync between host and guest

89luca89 commented 2 years ago

Probably you're using distrobox-enter from new, but distrobox-init is older?

dfaggioli commented 2 years ago

@89luca89 what should happen if I just do something like podman logs --since 5m -f tumbleweed ? Should the logging process stay active, and continue feeding the FIFO? Because, here, it just prints and exits, right after the last available line of output...

Yea should remain as we specify -f which is --follow, and also it should NOT have a last line because now in init there is a sort of reconciliation-loop to keep some files in sync between host and guest

Yep, I do see the HOST_WATCH thing (in the log itself)...

dfaggioli commented 2 years ago

@89luca89 new data point: --root works :-O

dfaggioli commented 2 years ago

In fact, sudo podman logs --since 2022-05-23T13:00:25.402224027+02:00 -f tumbleweed does not exit and keep printing the various log lines (e.g., during the container initialization)

89luca89 commented 2 years ago

Ok can reproduce on an already running container

Eg: I have a container running for 10 minutes (so initialization finished 10mins ago)

If I podman logs --since 1m -f distrobox it will exit

So it seems in your case, no new logs since the timestamp, which sounds wrong, if the container is starting just now it should do new logs everytime...

I'll check if I can have something a bit more robust set up :+1:

dfaggioli commented 2 years ago

Ok can reproduce on an already running container

Eg: I have a container running for 10 minutes (so initialization finished 10mins ago)

If I podman logs --since 1m -f distrobox it will exit

So it seems in your case, no new logs since the timestamp, which sounds wrong, if the container is starting just now it should do new logs everytime...

Basing on my experiments, I think that it does log, and some data is sent to the FIFO, but then it exits. Then, the while read consume the stuff that it finds in the FIFO and, since there's no longer any logging process putting more data into it, the loop ends.

I'll check if I can have something a bit more robust set up +1

Maybe it could work if we invoke podman logs multiple time, inside of the loop, and without -f... I'll see about trying something like that myself too...

89luca89 commented 2 years ago

Ok can reproduce on an already running container Eg: I have a container running for 10 minutes (so initialization finished 10mins ago) If I podman logs --since 1m -f distrobox it will exit So it seems in your case, no new logs since the timestamp, which sounds wrong, if the container is starting just now it should do new logs everytime...

Basing on my experiments, I think that it does log, and some data is sent to the FIFO, but then it exits. Then, the while read consume the stuff that it finds in the FIFO and, since there's no longer any logging process putting more data into it, the loop ends.

I'll check if I can have something a bit more robust set up +1

Maybe it could work if we invoke podman logs multiple time, inside of the loop, and without -f... I'll see about trying something like that myself too...

I thought of it, but problem is that you could have the same log lines multiple times, making it very difficult to have some type of "progress"

That's what I was doing previously, with the indefinite dots progress

89luca89 commented 2 years ago

Hey @dfaggioli can you test the fix i've pushed to branch fix/286 ?

It's now polling a new log every 1s so it should work better for this case

massimo-zaniboni commented 2 years ago

Only as additional data point: I cannot replicate the bug on Tumbleweed. I tested on current installation, and also after an upgrade and reboot.

This the version of OS and podman

vmi880359:~ # cat /etc/os-release 
NAME="openSUSE Tumbleweed"
# VERSION="20220519"
ID="opensuse-tumbleweed"
ID_LIKE="opensuse suse"
VERSION_ID="20220519"
PRETTY_NAME="openSUSE Tumbleweed"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:tumbleweed:20220519"
BUG_REPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://www.opensuse.org/"
DOCUMENTATION_URL="https://en.opensuse.org/Portal:Tumbleweed"
LOGO="distributor-logo-Tumbleweed"

vmi880359:~ # podman --version
podman version 4.0.3

This is podman info podman-info.txt

This is the log of distrobox-create --verbose -n test distrobox-create-on-podman-4.0.3.log

89luca89 commented 2 years ago

Thanks @massimo-zaniboni could you also test if branch fix/286 works for you also?

Also which version of distrobox you're using? This bug should be triggered by a commit still not released on the package version of distrobox, only on latest ~git

massimo-zaniboni commented 2 years ago

Testing new version of distrobox

could you also test if branch fix/286 works for you also?

I created a temporary RPM with the fix286 branch. Here, there is the RPM https://build.opensuse.org/package/show/home:mzan-it:branches:Virtualization:containers/distrobox

I installed it

distrobox --version
distrobox: 1.2.16

The error is still not present.

Error on "fifo"

Regarding the error like

+ '[' 127 -ne 0 ']'
+ printf '\nAn error occurred\n'
+ rm -f '/tmp/.*.fifo'

signaled by @dfaggioli, probably only this error, is a instead false error. In fact I can replicate it also on correctly initializated containers. If enter with

distrobox-enter --verbose -n test 2> error.log

error.log will contains no errors. If I exit from the container with exit, no error is generated. If I exit from the container with Ctrl-d, this error is generated. But it does not affect container initialization at all. It is only at the moment of the exit with Ctr-d.

Other errors

I'm sincere: I know very few of podman and distrobox, and I skipped many parts of the discussion on this issue. I will read in next days, when I have time, and maybe I will do some test.

massimo-zaniboni commented 2 years ago

The error is still not present.

On SUSE tumbleweed, with podman version 4.0.3, with my configurations.

I will test the new version of distrobox also on Leap.

massimo-zaniboni commented 2 years ago

I will test the new version of distrobox also on Leap.

I tested on SUSE Leap 15.3, podman version 3.4.4, both with and without my patch, and with distrobox version: 1.2.16 with the fix286, but in this case the container is not initializated correctly.

89luca89 commented 2 years ago

I will test the new version of distrobox also on Leap.

I tested on SUSE Leap 15.3, podman version 3.4.4, both with and without my patch, and with distrobox version: 1.2.16 with the fix286, but in this case the container is not initializated correctly.

Hi @massimo-zaniboni so with the fix286 does not work? can you provide some logs?

dfaggioli commented 2 years ago

Hey @dfaggioli can you test the fix i've pushed to branch fix/286 ?

It's now polling a new log every 1s so it should work better for this case

Hey... cool, thanks for working on this!

The problem I was having is fixed, but distrobox-enter is now stuck:

dario@Wayrath:~/src/toolbox/distrobox.git> ./distrobox-enter tumbleweed
Container tumbleweed is not running.
Starting container tumbleweed
run this command to follow along:

 podman logs -f tumbleweed

 Starting container...                       [ OK ]
 Installing basic packages...                [ OK ]
 Setting up read-only mounts...              [ OK ]
 Setting up read-write mounts...             [ OK ]
 Setting up host's sockets integration...    [ OK ]
 Integrating host's themes, icons, fonts...  [ OK ]
 Setting up package manager exceptions...    [ OK ]
 Setting up sudo...                          [ OK ]
 Setting up groups...                        [ OK ]
 Setting up users...                        

Let me fetch some logs...

dfaggioli commented 2 years ago

Error on "fifo"

Regarding the error like

+ '[' 127 -ne 0 ']'
+ printf '\nAn error occurred\n'
+ rm -f '/tmp/.*.fifo'

signaled by @dfaggioli, probably only this error, is a instead false error. In fact I can replicate it also on correctly initializated containers.

Yes, but this does not have much to do with this issue. The problem here is that podman logs -f terminates immediately, after some circumnstances, and this is not compatible with how distrobox uses it (in the main branch). TBH, this seems a podman bug to me, but it's probably easier/better if we make distrobox robust against something like this anyway...

massimo-zaniboni commented 2 years ago

Hi @massimo-zaniboni so with the fix286 does not work? can you provide some logs?

Yes. This is the log of distrobox: 1.2.16 with the fix286, running on podman version 3.4.4 with the fix about the empty line log (but the problem is present also on plain podman 3.4.4), on OpenSUSE Leap 15.3.

cat /etc/os-release 
NAME="openSUSE Leap"
VERSION="15.3"
ID="opensuse-leap"
ID_LIKE="suse opensuse"
VERSION_ID="15.3"
PRETTY_NAME="openSUSE Leap 15.3"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:leap:15.3"
BUG_REPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://www.opensuse.org/"
podman info
host:
  arch: amd64
  buildahVersion: 1.23.1
  cgroupControllers: []
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: conmon-2.0.30-150300.8.3.1.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.30, commit: unknown'
  cpus: 2
  distribution:
    distribution: '"opensuse-leap"'
    version: "15.3"
  eventLogger: journald
  hostname: "70"
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 100
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.3.18-150300.59.68-default
  linkmode: dynamic
  logDriver: journald
  memFree: 2062200832
  memTotal: 4116885504
  ociRuntime:
    name: runc
    package: runc-1.0.3-27.1.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.3
      spec: 1.0.2-dev
      go: go1.16.10
      libseccomp: 2.5.3
  os: linux
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_AUDIT_WRITE,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_MKNOD,CAP_NET_BIND_SERVICE,CAP_NET_RAW,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-0.4.7-3.15.1.x86_64
    version: |-
      slirp4netns version 0.4.7
      commit: unknown
      libslirp: 4.3.1-git
  swapFree: 2148507648
  swapTotal: 2148507648
  uptime: 6m 49.77s
plugins:
  log:
  - k8s-file
  - none
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.opensuse.org
  - docker.io
store:
  configFile: /home/mzan/.config/containers/storage.conf
  containerStore:
    number: 3
    paused: 0
    running: 1
    stopped: 2
  graphDriverName: btrfs
  graphOptions: {}
  graphRoot: /home/mzan/.local/share/containers/storage
  graphStatus:
    Build Version: 'Btrfs v4.19.1 '
    Library Version: "102"
  imageStore:
    number: 1
  runRoot: /run/user/1000/containers
  volumePath: /home/mzan/.local/share/containers/storage/volumes
version:
  APIVersion: 3.4.4
  Built: 1653134400
  BuiltTime: Sat May 21 14:00:00 2022
  GitCommit: ""
  GoVersion: go1.13.15
  OsArch: linux/amd64
  Version: 3.4.4
distrobox-create -i registry.opensuse.org/opensuse/tumbleweed:latest -n test3
distrobox-enter --verbose -n test3 2> test3.log

test3.log

dfaggioli commented 2 years ago

I tested on SUSE Leap 15.3, podman version 3.4.4, both with and without my patch, and with distrobox version: 1.2.16 with the fix286, but in this case the container is not initializated correctly.

Hi @massimo-zaniboni so with the fix286 does not work? can you provide some logs?

I think he means (but correct me if I'm wrong!) that, on Tumbleweed, everything works for him. Which probably means this problem I'm having and the problem he's having on Leap, are two different issues.

89luca89 commented 2 years ago

So both @dfaggioli and @massimo-zaniboni confirm that reverting back to stock 1.2.15 does not present any problem regarding initialization process?

If this is becoming too much of an hassle (podman logs strange behavior and such) I might just revert to the old "dots" progress and call it a day

dfaggioli commented 2 years ago

So both @dfaggioli and @massimo-zaniboni confirm that reverting back to stock 1.2.15 does not present any problem regarding initialization process?

For me, yes.

If this is becoming too much of an hassle (podman logs strange behavior and such) I might just revert to the old "dots" progress and call it a day

Well, sure, but I think the approach you're using in the branch can work... And the new logging it cooler! :-)

Since, AFAICT, it's basically only me that am seeing this problem, I can continue to debug/work on it myself. Just let me know.

89luca89 commented 2 years ago

Hey @dfaggioli can you test the fix i've pushed to branch fix/286 ? It's now polling a new log every 1s so it should work better for this case

Hey... cool, thanks for working on this!

The problem I was having is fixed, but distrobox-enter is now stuck:

dario@Wayrath:~/src/toolbox/distrobox.git> ./distrobox-enter tumbleweed
Container tumbleweed is not running.
Starting container tumbleweed
run this command to follow along:

 podman logs -f tumbleweed

 Starting container...                     [ OK ]
 Installing basic packages...              [ OK ]
 Setting up read-only mounts...            [ OK ]
 Setting up read-write mounts...           [ OK ]
 Setting up host's sockets integration...  [ OK ]
 Integrating host's themes, icons, fonts...    [ OK ]
 Setting up package manager exceptions...  [ OK ]
 Setting up sudo...                        [ OK ]
 Setting up groups...                      [ OK ]
 Setting up users...                      

Let me fetch some logs...

I think we are near a solution maybe, a log from this situation could be helpful to understand why it didn't catch the container_setup_done string from the logs (that's why it is stuck)

Both a podman logs, and an execution with --verbose of the enter should help

massimo-zaniboni commented 2 years ago

So both @dfaggioli and @massimo-zaniboni confirm that reverting back to stock 1.2.15 does not present any problem regarding initialization process?

No. If I install on OpenSUSE Leap following the instruction of the web-site, I have the initialization problems. So for me, Distrobox or Podman, is broken.

If there is safer version to use, and you provide a link, I will test it.

If this is becoming too much of an hassle

I propose to split this issue in two issues:

89luca89 commented 2 years ago

So both @dfaggioli and @massimo-zaniboni confirm that reverting back to stock 1.2.15 does not present any problem regarding initialization process?

No. If I install on OpenSUSE Leap following the instruction of the web-site, I have the initialization problems. So for me, Distrobox or Podman, is broken.

If there is safer version to use, and you provide a link, I will test it.

If this is becoming too much of an hassle

I propose to split this issue in two issues:

* one for OpenSUSE Leap and Podman v3.4.4

* another for OpenSUSE Tumbleweed  and Podman v4

@massimo-zaniboni ok let's open a new issue with Leap+Podman 3.4.4, with all the logs provided please :+1:

dfaggioli commented 2 years ago

I think we are near a solution maybe, a log from this situation could be helpful to understand why it didn't catch the container_setup_done string from the logs (that's why it is stuck)

Both a podman logs, and an execution with --verbose of the enter should help

Yep, was already doing that. But when enabling logs, it works.

I've also caught it both working and non-working, when trying a few times, so it's a race.

I believe the problem is this:

                # ITERATION N, TIME=T1
                ${container_manager} logs --since "${log_timestamp}" "${container_name}" 2> /dev/null > "/tmp/.${container_name}.fifo"
                while IFS= read -r line; do
                        case "${line}" in
                        ...
                        ...
                        ...
                        # ITERATION N, TIME=T2, "container_setup_done" IS PRINTED NOW
                        ...
                        ...
                        ...
                        esac
                done < "/tmp/.${container_name}.fifo"
                # ITERATION N, TIME=T3
                log_timestamp="$(date +%FT%T.%N%:z)"
                sleep 1
        done

Now, at ITERATION N+1, we log what has been printed since T3, but "container_setup_done" has been printed at T2 which was before T3, and hence we've missed it.

Something like this:

        while true; do
                ${container_manager} logs --since "${log_timestamp}" "${container_name}" 2> /dev/null > "/tmp/.${container_name}.fifo"
                log_timestamp="$(date +%FT%T.%N%:z)"

Reduces the race-window substantially (and in fact, it's working reliably in my tests), but it's still possible that we print "container_setup_done after ${container_manager} logs ... and before log_timestamp="$(date...)

89luca89 commented 2 years ago

I think we are near a solution maybe, a log from this situation could be helpful to understand why it didn't catch the container_setup_done string from the logs (that's why it is stuck) Both a podman logs, and an execution with --verbose of the enter should help

Yep, was already doing that. But when enabling logs, it works.

I've also caught it both working and non-working, when trying a few times, so it's a race.

I believe the problem is this:

                # ITERATION N, TIME=T1
                ${container_manager} logs --since "${log_timestamp}" "${container_name}" 2> /dev/null > "/tmp/.${container_name}.fifo"
                while IFS= read -r line; do
                        case "${line}" in
                        ...
                        ...
                        ...
                        # ITERATION N, TIME=T2, "container_setup_done" IS PRINTED NOW
                        ...
                        ...
                        ...
                        esac
                done < "/tmp/.${container_name}.fifo"
                # ITERATION N, TIME=T3
                log_timestamp="$(date +%FT%T.%N%:z)"
                sleep 1
        done

Now, at ITERATION N+1, we log what has been printed since T3, but "container_setup_done" has been printed at T2 which was before T3, and hence we've missed it.

Something like this:

        while true; do
                ${container_manager} logs --since "${log_timestamp}" "${container_name}" 2> /dev/null > "/tmp/.${container_name}.fifo"
                log_timestamp="$(date +%FT%T.%N%:z)"

Reduces the race-window substantially (and in fact, it's working reliably in my tests), but it's still possible that we print "container_setup_done after ${container_manager} logs ... and before log_timestamp="$(date...)

Great! Yes I thought of a race condition

Another solution would be to use a pivot value, we register the new log_timestamp before the podman logs command, like

        while true; do
                log_timestamp_new="$(date +%FT%T.%N%:z)"
                ${container_manager} logs --since "${log_timestamp}" "${container_name}" 2> /dev/null > "/tmp/.${container_name}.fifo"

                ...
                ...
                log_timestamp=$log_timestamp_new
                sleep1
          done

this should leave no holes in the logs, maybe a little little overlap, but that's better than a hole

dfaggioli commented 2 years ago

Another solution would be to use a pivot value, we register the new log_timestamp before the podman logs command, like

        while true; do
                log_timestamp_new="$(date +%FT%T.%N%:z)"
                ${container_manager} logs --since "${log_timestamp}" "${container_name}" 2> /dev/null > "/tmp/.${container_name}.fifo"

                ...
                ...
                log_timestamp=$log_timestamp_new
                sleep1
          done

this should leave no holes in the logs, maybe a little little overlap, but that's better than a hole

Yes, that's how I'd do it in general. What I don't know/haven't checked, is whether it prints duplicated lines, and if that could be a problem. E.g., if the fifo ends up containing "distrobox: Setting up groups" twice, do we print [ OK ] twice as well?

89luca89 commented 2 years ago

@dfaggioli good thing is I can reproduce your "stuck" behavior with fix286!

I've implemented the change in the previous comment, and it seems it's working as intended, at worst we get the same message twice, which is still better than a "stuck" startup

Can you check if it is working as intended for you? both for existing and for new containers?

Yes, that's how I'd do it in general. What I don't know/haven't checked, is whether it prints duplicated lines, and if that could be a problem. E.g., if the fifo ends up containing "distrobox: Setting up groups" twice, do we print [ OK ] twice as well?

We will print twice the message:

[luca-linux@microos ~]$ distrobox stop tw; distrobox create -n tw; distrobox enter tw
Do you really want to stop tw? [Y/n]: 
tw
Distrobox named 'tw' already exists.
To enter, run:

distrobox-enter tw

Container tw is not running.
Starting container tw
run this command to follow along:

 podman logs -f tw

 Starting container...                       [ OK ]
 Installing basic packages...                [ OK ]
 Setting up read-only mounts...              [ OK ]
 Setting up read-write mounts...             [ OK ]
 Setting up host's sockets integration...    [ OK ]
 Integrating host's themes, icons, fonts...  [ OK ]
 Setting up package manager exceptions...    [ OK ]
 Setting up sudo...                          [ OK ]
 Setting up groups...                        [ OK ]
 Setting up users...                         [ OK ]
 Setting up package manager exceptions...    [ OK ]
 Setting up sudo...                          [ OK ]
 Setting up groups...                        [ OK ]
 Setting up users...                         [ OK ]
 Executing init hooks...                     [ OK ]

Container Setup Complete!
luca-linux@tw:~$
dfaggioli commented 2 years ago

Can you check if it is working as intended for you? both for existing and for new containers?

Works in both scenarios! :-)

89luca89 commented 2 years ago

Perfect I'll merge this and should be solved thanks for helping me! @dfaggioli