Open 10b14224cc opened 4 months ago
Thank you for the report. I haven't had the problem in my environment, so I need to ask questions to reproduce the problem in my environment or identify the cause.
The terminal (kitty tab) closes when all the file descriptors of the TTY slave, which are held by the processes in the session, are closed. There are two possibilities. The Bash process hangs, or other processes started from Bash and connected to the TTY are alive.
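To make the first sentence concrete, here is a minimal sketch (Linux only, relying on /proc) that lists which processes still hold a given TTY device open. The helper name `list_tty_holders` is hypothetical and is not part of ble.sh or kitty. It only sees processes whose /proc/PID/fd directory is readable by the current user.

```bash
# Sketch (Linux only): print the PIDs that still hold the given device or file open.
# `list_tty_holders` is a hypothetical helper name used for illustration.
list_tty_holders() {
  local target=$1 fd pid
  for fd in /proc/[0-9]*/fd/*; do
    # Each entry under /proc/PID/fd is a symlink to the opened file.
    if [ "$(readlink "$fd" 2>/dev/null)" = "$target" ]; then
      pid=${fd#/proc/}
      printf '%s\n' "${pid%%/*}"
    fi
  done | sort -un
}

# Example (run in another tab while the problematic tab is still open):
#   list_tty_holders /dev/pts/2
```

If this prints nothing for the TTY of the hanging tab, no process of the current user holds it open, and processes of other users would have to be checked with elevated privileges.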
When I type exit and press Enter, sometimes (not always)
How easy is it to reproduce the problem in your environment? For example, can you reproduce the problem within a minute by repeatedly opening a new tab and closing it by running exit? Or does it happen only once per hour or day?
I'd like to ask you to do experiments on your side because I cannot reproduce the behavior, but depending on the frequency of the problem, I need to reconsider the set of experiments. For example, if that only happens once per hour even when you repeat opening and closing the session, it would be very hard to do many experiments, so I would need to prepare a configuration that captures as much as possible in a single occurrence.
ble.sh just hangs here:
Did you confirm that what is hanging is actually ble.sh? Or does it just mean the terminal tab with ble.sh doesn't close? For example, have you checked the process list to confirm that the Bash process hangs? As mentioned above, there are other possibilities.
It is possible to hook a function on exit that would print the processes before actually exiting?
It's possible. You can do something like
ble/function#advice before exit 'your-shell-function'
But I'm not sure if that's useful because it is equivalent to just running
$ your-shell-function; exit
If you want to hook to blesh's exit processing, you can instead set a hook as
blehook EXIT+='your-shell-function'
But even in that case, it's not sufficient to check whether there are remaining processes or not because other processes can be started after the EXIT hook has finished. To confirm whether there are remaining processes, after the hanging happens, you can check the process list (e.g. by running ps uxf) in another kitty tab. It would be useful to record the tty name before exiting (or you may log the tty name in the EXIT hook). For example,
# blerc
blehook EXIT+='{ date; echo "pid $$"; tty; } >> ~/debug.txt'
then after the hanging happens, while keeping the hanging window, you can run the following commands in another terminal window or tab:
$ date
$ tail -n 3 ~/debug.txt
$ ps uxf
If you could manage to do the above, could you give me the output?
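As a sketch of how the logged TTY name could be used, the helper below pulls the most recent TTY path out of ~/debug.txt so that the process list can be limited to that terminal. The name `last_logged_tty` is hypothetical. It assumes the log format produced by the EXIT hook above (a date line, a "pid N" line, and the output of tty).

```bash
# Hypothetical helper: print the TTY path recorded by the most recent EXIT-hook entry.
last_logged_tty() {
  local log=${1:-$HOME/debug.txt}
  # `tty` prints a device path such as /dev/pts/2; keep only such lines.
  grep -E '^/dev/[[:alnum:]/]+$' "$log" | tail -n 1
}

# Example (run in another tab while the hanging tab is still open):
#   t=$(last_logged_tty) && ps -o pid,ppid,stat,tty,args -t "${t#/dev/}"
```

Note that ps reports only processes whose controlling terminal is that TTY, so a process that has detached from the terminal but still holds its file descriptors would not show up here.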
To come back to the original question, how often do you face this problem? For example, roughly how many times does this problem happen when you exit the ble.sh session 10 or 100 times? In short, what is the rough probability?
I've put this:
blehook EXIT+='{ date; echo "pid $$"; tty; } >> ~/debug.txt'
in $HOME/.config/blesh/init.sh
To come back to the original question, how often do you face this problem? For example, roughly how many times does this problem happen when you exit the ble.sh session 10 or 100 times? In short, what is the rough probability?
4 every 10?
blehook EXIT+='{ date; echo "pid $$"; tty; } >> ~/debug.txt'
in
$HOME/.config/blesh/init.sh
OK, let's wait for the next hanging. Once the hanging happens, please follow the instructions in https://github.com/akinomyoga/ble.sh/issues/402#issuecomment-1926017866. I mean this part:
after the hanging happens, while keeping the hanging window, you can run the following command in another terminal window or tab:
$ date
$ tail -n 3 ~/debug.txt
$ ps uxf
To come back to the original question, how often do you face this problem? For example, roughly how many times does this problem happen when you exit the ble.sh session 10 or 100 times? In short, what is the rough probability?
4 every 10?
Great! Then it shouldn't be difficult to collect the information.
Thanks.
USERNAME 18916 0.6 1.5 1135664 255920 tty1 Sl+ 07:12 0:01 kitty
USERNAME 18923 5.5 0.2 41016 37240 pts/1 Ss+ 07:12 0:17 \_ /usr/bin/bash --posix
USERNAME 23814 16.8 0.2 39400 35780 pts/2 Ss 07:17 0:03 \_ /usr/bin/bash --posix
USERNAME 24438 0.0 0.0 12504 5424 pts/2 R+ 07:17 0:00 \_ ps uxf
This part is related. Hmm, what I can tell from the result is that the Bash process 21265 doesn't exist anymore. The remaining Bash processes are 18923 and 23814, but the latter is the terminal window where you ran ps uxf, and the former seems to be a different tab. Also, there are no processes associated with the original TTY session (pts/0).
To make sure, you had three tabs when running ps uxf, right? One is the hanging tab, another is the tab where you ran ps uxf, and there was another tab.
To make sure, did you have three tabs when running ps uxf, right?
Yes. The second one hung.
One is the hanging tab, another is the tab where you run ps uxf, and there was another tab.
Yes
Thank you for the confirmation. Hmm, it is a puzzle that a tab remains even though there are no remaining processes. I don't have an idea now. I'll think about other possibilities. When I come up with one, I'll again ask you to test it.
I thought it should be an issue of remaining processes, which are common to all the terminals, but maybe this could be something specific to kitty. I think I also have to check the behavior in kitty, although I haven't experienced it when I used ble.sh in kitty before.
I tried kitty in my environment, but I don't seem to be able to reproduce the problem.
~/.bashrc
I have to find a way to consistently reproduce it first.
Do you have any updates on this? Do you still experience the problem?
I messed up my kitty installation.
I can't type characters like : in neovim anymore.
So I'm trying other terminals.
I messed up my kitty installation.
I can't type characters like : in neovim anymore.
Since yesterday, many Atuin users have been reporting this: https://github.com/atuinsh/atuin/issues/1693
I also experience this problem, came here to report a bug. I am also on Kitty. I was going to run the instructions in the comment and post the output as well but then after a few minutes the process did actually exit :sweat_smile: So maybe that information is of some use
Thank you for the information. Does that mean it doesn't reproduce frequently enough that you can run the instructions in the next occurrence?
@ribru17 Can you provide the information about your environment? What is the result of the following command?
$ ble/widget/display-shell-version
after a few minutes the process did actually exit 😅
Did you see the actual process in the ps output or another process monitor? If so, do you remember which process remained (e.g. Bash)?
I can try, I will keep this tab open. Also only happens maybe 1/3 times for me
Can you provide the information about your environment? What is the result of the following command?
GNU bash, version 5.2.26(1)-release (x86_64-pc-linux-gnu) [Arch Linux]
ble.sh, version 0.4.0-devel4+27e6309 (noarch) [git 2.43.0, GNU Make 4.4.1, GNU Awk 5.3.0, API 4.0, PMA Avon 8-g1, (GNU MPFR 4.2.1, GNU MP 6.3.0)]
bash-completion, version 2.11 (hash:2d02f73e803daa87a06e94e33b2a7b3e672a2c0c, 76338 bytes) (noarch)
locale: LANG=en_US.UTF-8
terminal: TERM=xterm-kitty wcwidth=15.0-west/15.1-2+ri, kitty:0 (1;4000;31)
Did you see the actual process in the ps output or another process monitor? If so, do you remember which process remained (e.g. Bash)?
No, sorry, I didn't get a chance to see it: I will try next time it happens to grep through the processes to find something related to bash or blesh!
I will try next time it happens to grep through the processes to find something related to bash or blesh!
Thank you. If there is a chance to check the processes, I'd like to check the process tree and related TTY/PTY information (but not just the existence of the process) as quoted from https://github.com/akinomyoga/ble.sh/issues/402#issuecomment-1926017866:
For example,
# blerc
blehook EXIT+='{ date; echo "pid $$"; tty; } >> ~/debug.txt'
then after the hanging happens, while keeping the hanging window, you can run the following command in another terminal window or tab:
$ date
$ tail -n 3 ~/debug.txt
$ ps uxf
If you could manage to do the above, could you give me the output?
Here is the output of one that just happened:
Edit: my config if needed
#!/bin/bash
# suppress error output
bleopt complete_ambiguous=
bleopt complete_auto_history=
bleopt exec_errexit_mark=''
bleopt prompt_eol_mark=''
bleopt term_index_colors=auto
bleopt exec_elapsed_mark=''
ble-bind -f 'C-SP' 'complete show_menu'
ble-bind -m auto_complete -f 'C-e' auto_complete/cancel
ble-bind -m isearch -f 'RET' isearch/accept-line
ble-bind -m isearch -f 'C-m' isearch/accept-line
ble-bind -m vi_imap -f 'C-c' discard-line
ble-bind -m vi_nmap -f 'C-c' discard-line
ble-bind -m vi_imap -f 'C-RET' accept-line
ble-bind -m vi_imap -f 'S-RET' newline
ble-bind -m vi_nmap -f 'S-RET' accept-line
ble-bind -m vi_nmap -f 'H' vi-command/beginning-of-line
ble-bind -m vi_omap -f 'H' vi-command/beginning-of-line
ble-bind -m vi_nmap -f 'L' vi-command/forward-eol
ble-bind -m vi_omap -f 'L' vi-command/forward-eol
ble-bind -m emacs -f 'S-RET' newline
# for kitty
ble-bind -m auto_complete -f C-i auto_complete/insert
ble-bind -m emacs -f 'M-DEL' kill-backward-fword
ble-bind -m vi_imap -f 'M-DEL' kill-backward-fword
# for wezterm
ble-bind -m auto_complete -f TAB auto_complete/insert
ble-bind -m emacs -f 'M-C-?' kill-backward-fword
ble-bind -m vi_imap -f 'M-C-?' kill-backward-fword
# colors
ble-face -s argument_error bg=red
ble-face -s argument_option fg=#f08080,italic
ble-face -s auto_complete fg=#5b5e5a,italic
ble-face -s cmdinfo_cd_cdpath fg=#96c7ef,bg=black,italic
ble-face -s command_alias fg=blue
ble-face -s command_builtin fg=#ff9966
ble-face -s command_directory fg=#96c7ef
ble-face -s command_file fg=blue
ble-face -s command_function fg=blue
ble-face -s command_keyword fg=purple
ble-face -s disabled fg=#5b5e5a
ble-face -s filename_directory fg=#96c7ef
ble-face -s filename_directory_sticky fg=black,bg=green
ble-face -s filename_executable fg=green,bold
ble-face -s filename_ls_colors none
ble-face -s filename_orphan fg=cyan,bold
ble-face -s filename_other none
ble-face -s filename_setgid fg=black,bg=yellow,underline
ble-face -s filename_setuid fg=black,bg=#ff9966,underline
ble-face -s menu_filter_input fg=black,bg=#e2c792
ble-face -s overwrite_mode fg=black,bg=cyan
ble-face -s prompt_status_line bg=#5b5e5a
ble-face -s region bg=#3a3d37
ble-face -s region_insert bg=#3a3d37
ble-face -s region_match fg=black,bg=#e2c792
ble-face -s region_target fg=black,bg=purple
ble-face -s syntax_brace fg=#838781
ble-face -s syntax_command fg=blue
ble-face -s syntax_comment fg=#e2c792
ble-face -s syntax_delimiter fg=#838781
ble-face -s syntax_document fg=cyan,bold
ble-face -s syntax_document_begin fg=cyan,bold
ble-face -s syntax_error bg=red
ble-face -s syntax_escape fg=#f08080
ble-face -s syntax_expr fg=#c5c2ee
ble-face -s syntax_function_name fg=blue
ble-face -s syntax_glob fg=#ff9966
ble-face -s syntax_history_expansion fg=blue,italic
ble-face -s syntax_param_expansion fg=red
ble-face -s syntax_quotation fg=green
ble-face -s syntax_tilde fg=#c5c2ee
ble-face -s syntax_varname fg=none
ble-face -s varname_array fg=#ff9966
ble-face -s varname_empty fg=#ff9966
ble-face -s varname_export fg=#ff9966
ble-face -s varname_expr fg=#ff9966
ble-face -s varname_hash fg=#ff9966
ble-face -s varname_number fg=none
ble-face -s varname_readonly fg=#ff9966
ble-face -s varname_transform fg=#ff9966
ble-face -s varname_unset bg=red
ble-face -s vbell_erase bg=#3a3d37
# debugging
blehook EXIT+='{ date; echo "pid $$"; tty; } >> ~/debug.txt'
Thank you for the results. Now you can remove the debugging setting.
The result is consistent with @10b14224cc's. There do not seem to exist any processes associated with the hanging tab (whose TTY/PTY is pts/2). Also, the process 24586 doesn't exist. I'm not sure what causes the delay in closing the kitty tab.
Maybe some settings of TTY cause an issue. In the latest push, there is a fix for Bash 5.2 that might affect the final state of the TTY. You seem to use the second-to-last push (ble-0.4.0-devel4+27e6309), so could you try the latest version? You can update ble.sh by running ble-update. Then you can close all the tabs, open the tabs again, and try to see if the situation changes.
Thank you, I will do this and report if the issue stops
Thank you, I will do this and report if the issue stops
Are you by any chance using nnn as a file manager?
No, sorry. Also I am still getting the issue; but I realized it is less frequent than I thought, usually only happening with tabs that have been open for quite a while
Thank you for the information. Hmm, so it would be harder to test it. Maybe some specific command that was run in the session is related.
Of course. My hunch says perhaps it is related to the elapsed time feature: I have this disabled in my personal config, and maybe that causes some weird behavior sometimes? I have seen that this usually only happens when the tab that has been open for a while also has had some process running for a while (e.g. I have a vim instance in it for a while and then forget it's there, close it and exit the tab, and the hanging starts)
My hunch says perhaps it is related to the elapsed time feature: I have this disabled in my personal config, and maybe that causes some weird behavior sometimes?
The config is just to turn off outputting the elapsed time. The measurement of the elapsed time is always performed by ble.sh even if you turn off bleopt exec_elapsed_mark. But I'm not sure if we can rule out the elapsed time feature as the culprit.
(e.g. I have a vim instance in it for a while and then forget it's there, close it and exit the tab, and the hanging starts)
Thank you. That should be a hint, yet I don't have an idea now.
I strongly suspect there is a subprocess of a shell subprocess still hanging around.
But I cannot reproduce it consistently still.
Thank you for your reply. Yeah, that is my first suspicion, so I asked for the result of the ps command. However, there do not seem to be any processes in the results, https://github.com/akinomyoga/ble.sh/issues/402#issuecomment-1926302385 and https://github.com/akinomyoga/ble.sh/issues/402#issuecomment-1937401565.
There is still a possibility that some system processes (whose USER is not the current user) started in the session are alive (because ps uxf doesn't print the system processes), but I'm not sure how that could happen. In case you are still interested in this possibility, the system processes can be listed by ps uaxf (instead of ps uxf in the instructions https://github.com/akinomyoga/ble.sh/issues/402#issuecomment-1937354634).
I just had it happen again: here is the result of ps uaxf while hanging:
And just in case, I ran it again after the hanging tab exited:
note that there were some 4 tabs open in Kitty at the time, only one hanging. It seems that the kitty processes are the exact same.
This may be because of this Kitty option
Ah, I think I found it. I took the diff of your ps uaxf results before and after the hanging tab closes, where the essential differences are found to be
@@ -187,6 +187,7 @@
root 189057 0.0 0.0 0 0 ? I< 07:54 0:00 \_ [kworker/u17:2-rb_allocator]
root 189286 0.0 0.0 0 0 ? I 07:56 0:00 \_ [kworker/6:0-i915-unordered]
root 189359 0.0 0.0 0 0 ? I 07:57 0:00 \_ [kworker/5:0]
+root 189372 0.0 0.0 0 0 ? I 07:59 0:00 \_ [kworker/3:2]
root 1 0.0 0.1 22792 9080 ? Ss Feb09 0:09 /sbin/init
root 972 0.0 0.2 75620 18916 ? Ss Feb09 0:03 /usr/lib/systemd/systemd-journald
root 1015 0.0 0.0 32624 6612 ? Ss Feb09 0:02 /usr/lib/systemd/systemd-udevd
@@ -251,7 +252,7 @@
rileyb 187777 0.0 0.6 284180 53136 ? Ssl Feb13 0:00 | | \_ /home/rileyb/.local/share/nvim/ma
rileyb 187791 0.0 0.3 37244 28584 ? S Feb13 0:00 | | \_ /home/rileyb/.local/share/nvi
rileyb 187088 0.0 0.4 39784 36476 pts/3 Ss Feb13 0:02 | \_ /bin/bash --posix
-rileyb 189363 0.0 0.0 11156 4608 pts/3 R+ 07:57 0:00 | \_ ps uaxf
+rileyb 189379 0.0 0.0 11156 4608 pts/3 R+ 07:59 0:00 | \_ ps uaxf
rileyb 1973 0.0 0.1 229896 10880 ? Ssl Feb09 0:07 \_ /usr/bin/gmenudbusmenuproxy
rileyb 1978 0.0 0.1 985972 14264 ? Ssl Feb09 0:00 \_ /usr/lib/polkit-kde-authentication-agent-1
rileyb 1979 0.0 0.3 981344 24804 ? Ssl Feb09 0:18 \_ /usr/lib/org_kde_powerdevil
@@ -270,7 +271,6 @@
root 111511 0.0 0.0 154872 2080 ? Ss Feb12 0:01 \_ gpg-agent --homedir /etc/pacman.d/gnupg --use-standar
rileyb 131721 0.0 0.0 33575736 4648 ? Sl Feb12 0:00 \_ /opt/brave-bin/chrome_crashpad_handler --monitor-self
rileyb 131723 0.0 0.0 33567524 2304 ? Sl Feb12 0:00 \_ /opt/brave-bin/chrome_crashpad_handler --no-periodic-
-rileyb 187085 0.0 0.0 2632 640 ? S Feb13 0:00 \_ /usr/bin/wl-copy --type text/plain
rileyb 1774 0.0 0.1 988548 14492 ? Sl Feb09 0:00 /usr/bin/kwalletd5 --pam-login 12 14
root 1809 0.0 0.1 469656 9920 ? Ssl Feb09 0:02 /usr/lib/udisks2/udisksd
polkitd 1828 0.0 0.1 386088 14512 ? Ssl Feb09 0:05 /usr/lib/polkit-1/polkitd --no-debug
Then I searched for wl-copy and found this issue: https://github.com/bugaevc/wl-clipboard/pull/154, where the same problem inside kitty is reported.
Here, many questions come to my mind: What does that wl-copy do? The start time of wl-copy seems to be the previous day, but the time of ps uaxf is 7:57. This means that the wl-copy process has been alive for eight hours at least. A question is why wl-copy survives for such a long time. Another thing I noticed is that the problematic wl-copy process doesn't seem to have an associated controlling TTY. Maybe that wl-copy process creates a new session with setsid(2)? But why? I'll look at wl-copy.
edit: I'm not sure how wl-copy could be related to ble.sh. ble.sh doesn't call wl-copy, so the wl-copy process should have been started through some hooks or settings or maybe manually executed.
edit2: Your first report on ps uxf also contains wl-copy, which is consistent. However, @10b14224cc's result https://github.com/akinomyoga/ble.sh/issues/402#issuecomment-1926302385 doesn't contain wl-copy, so I think the cause can be different. Maybe another utility is remaining in @10b14224cc's environment.
This may be because of this Kitty option
Thanks for the information. Yeah, that option should be related. However, the behavior with close_on_child_death no (i.e., the terminal doesn't close, even if the child dies, until all the file descriptors of the TTY are closed) is a typical one in terminals, so it's not specific to kitty.
Ah, I see. Thanks very much for this information!
Edit: wl-copy is used by the wl-clipboard package to copy text to the system clipboard, used by things like Neovim
The author of wl-copy seems to refuse to close the TTY. The author's thought is explained in https://github.com/bugaevc/wl-clipboard/pull/110#issuecomment-785153532. That explanation contains the following phrase:
including after forking into background.
This means that wl-copy has a background process, which partly answers my question. Then, the author tries to keep stderr of the background process connected to the TTY for error logging, but it seems a bit strange to use the TTY for error logging of a background process. A background process should save its error, if any, to a log file instead of the TTY where another application might be in the foreground.
Edit: wl-copy is used by the wl-clipboard package to copy text to the system clipboard, used by things like Neovim
Ah. I see. So it's not necessarily used as a command-line tool but can be a background worker for another program.
I think the culprit in @10b14224cc's environment is hyprpaper. First, there are only two processes that started after the hanging session started (21265) and before running ps uxf (24438):
[murase@letsnote2019 0 a]$ awk '21265 <= $2 && $2 < 24438' c.txt
USERNAME 23271 0.1 0.2 123496 42436 ? Sl 07:16 0:00 hyprpaper
USERNAME 23814 16.8 0.2 39400 35780 pts/2 Ss 07:17 0:03 \_ /usr/bin/bash --posix
The Bash process on the second line of the output is another tab where ps uxf was run. Then, the only remaining candidate is hyprpaper.
Second, hyprpaper is also a Wayland application similar to wl-copy. They probably lose their controlling terminal on wl_display_connect. Both programs call this function to initialize their Wayland sessions.
Third, hyprpaper has a so-called IPC mode (which I guess stands for the interprocess communication mode?), which is enough to suspect that it launches a background worker. In fact, there seems to be a process of hyprpaper running in the background in @10b14224cc's ps uxf result.
Also, I think now I can explain why the problem comes out when ble.sh is used. ble.sh internally backs up the file descriptor of the TTY to different numbers, so even if the standard streams are closed by the background process, the backed-up descriptors remain open in it. What is needed is to set O_CLOEXEC on the backed-up file descriptors so that they are automatically closed in the child processes on their startup, but there is no obvious way to set O_CLOEXEC at the shell script level.
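The mechanism described here can be illustrated with a small sketch. It is not ble.sh's code: a temporary file stands in for the TTY, and fd 9 is an arbitrary number chosen for illustration. It shows that a background child that only detaches its standard streams still inherits the extra descriptor, unless that descriptor is closed explicitly. It assumes Linux (/proc) and a sleep that accepts fractional seconds.

```bash
# Sketch: a duplicated descriptor outlives the standard streams in a background child.
# A temporary file stands in for the TTY; fd 9 is an arbitrary illustrative number.
demo_file=$(mktemp)
exec 9>"$demo_file"        # the shell keeps an extra ("backup") descriptor

# Child 1 detaches its standard streams only, as a daemon-like program might.
( exec </dev/null >/dev/null 2>&1; sleep 1 ) &
leaky=$!

# Child 2 additionally closes the inherited backup descriptor.
( exec </dev/null >/dev/null 2>&1 9>&-; sleep 1 ) &
clean=$!

sleep 0.3                  # crude wait so both children have applied their redirections

# On Linux, /proc/PID/fd shows which descriptors each child still holds.
leaky_holds=no; clean_holds=no
[ -L "/proc/$leaky/fd/9" ] && leaky_holds=yes
[ -L "/proc/$clean/fd/9" ] && clean_holds=yes
echo "streams closed only: fd 9 still open? $leaky_holds"
echo "fd 9 closed as well: fd 9 still open? $clean_holds"

wait "$leaky" "$clean"
exec 9>&-
rm -f "$demo_file"
```

On Linux the first child is expected to still hold fd 9 and the second one not. Closing the descriptor by hand for every child is impractical in an interactive shell, which is why a close-on-exec flag on the backup descriptors is the more natural fix.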
I discovered that it is possible to add O_CLOEXEC to a specified file descriptor in pure Bash. The trick hacks the file descriptor that Bash internally uses to back up the original file descriptor for redirection. I implemented it in commit 785267e18c81ada61fb26a346339613a8a8b04e6. The trick only works in Bash >= 4.0, but I think it should be fine for most users. Now I think the hanging tab should happen less often.
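The following sketch does not reproduce the trick from the commit. It only shows one way to check, on Linux, whether a descriptor of a running process carries the close-on-exec flag, by reading the flags field of /proc/PID/fdinfo/FD. The helper names are hypothetical, and the octal value 02000000 for O_CLOEXEC is an assumption that holds on x86 and most other Linux architectures but not all.

```bash
# O_CLOEXEC is 02000000 (octal) on x86 and most Linux architectures; this value is
# an assumption of the sketch and differs on a few architectures.
flags_have_cloexec() {
  [ $(( $1 & 02000000 )) -ne 0 ]
}

# Hypothetical helper: succeed if descriptor FD of process PID is close-on-exec.
# Returns 2 when the fdinfo entry cannot be read.
fd_is_cloexec() {
  local flags
  flags=$(awk '$1 == "flags:" { print $2 }' "/proc/$1/fdinfo/$2" 2>/dev/null) || return 2
  [ -n "$flags" ] || return 2
  flags_have_cloexec "$flags"
}

# Example: check a descriptor of the current interactive shell, with FD replaced
# by the descriptor number in question:
#   fd_is_cloexec "$$" FD && echo "close-on-exec is set"
```

A descriptor opened with a plain exec redirection is normally not close-on-exec, which matches the inheritance problem described above.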
@10b14224cc @ribru17 Could you update ble.sh by running ble-update and use it for a while to see if the situation changes?
Just updated, I will let you know if this fixes it. Thank you!!!
Thanks!
I believe this has fixed the issue for me :rocket:
Great! Thank you for the testing and confirmation!
The issue is here once again.
Please re-open.
@10b14224cc
$ ble/widget/display-shell-version
To confirm whether there are remaining processes, after the hanging happens, you can check the process list (e.g. by running ps uxf) in another kitty tab. It would be useful to record the tty name before exiting (or you may log the tty name in the EXIT hook). For example,
# blerc
blehook EXIT+='{ date; echo "pid $$"; tty; } >> ~/debug.txt'
then after the hanging happens, while keeping the hanging window, you can run the following command in another terminal window or tab:
$ date
$ tail -n 3 ~/debug.txt
$ ps uxf
If you could manage to do the above, could you give me the output?
In addition, if the hanging kitty tab is spontaneously closed after waiting for some time, could you run ps uxf again? If possible, I'd like to compare the results of ps uxf before and after the kitty tab closes.
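One possible way to do the comparison is sketched below. The helper name `ps_snapshot_diff` and the snapshot file names are hypothetical. It drops the line of the ps command itself, whose PID changes on every run, and diffs the rest.

```bash
# Hypothetical helper: diff two saved `ps uxf` / `ps uaxf` listings, ignoring the
# line of the `ps` command itself.
ps_snapshot_diff() {
  local a b status
  a=$(mktemp) b=$(mktemp)
  grep -v ' ps ua*xf$' "$1" > "$a"
  grep -v ' ps ua*xf$' "$2" > "$b"
  diff "$a" "$b"
  status=$?
  rm -f "$a" "$b"
  return "$status"
}

# Example:
#   ps uxf > ~/ps-before.txt   # while the tab is hanging
#   ps uxf > ~/ps-after.txt    # after the tab has closed
#   ps_snapshot_diff ~/ps-before.txt ~/ps-after.txt
```

Lines that appear only in the first snapshot are candidates for the process that kept the TTY open, although short-lived processes and changing CPU or time columns will also show up as differences.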
Q1: What is the current version of ble.sh and Bash in your environment? Could you copy and paste the result of the following command?
❯ ble/widget/display-shell-version
GNU bash, version 5.2.26(1)-release (x86_64-pc-linux-gnu) [Arch Linux]
ble.sh, version 0.4.0-devel4+b8d63b7e (noarch) [git 2.45.1, GNU Make 4.4.1, GNU Awk 5.3.0, API 4.0, PMA Avon 8-g1, (GNU MPFR 4.2.1, GNU MP 6.3.0)]
starship, version 1.19.0 (rustc 1.78.0 (9b00956e5 2024-04-29) (Arch Linux rust 1:1.78.0-1), 2024-05-15 19:07:12 +00:00)
zoxide, version 0.9.4 (/usr/bin/zoxide)
atuin, version 18.2.0 (/usr/bin/atuin)
locale: LANG=en_US.UTF-8 LC_TIME=en_DK.UTF-8
terminal: TERM=xterm-kitty wcwidth=15.0-west/15.1-2+ri, kitty:0 (1;4000;35)
Q2: Can you retry the following setup [which I asked about before in https://github.com/akinomyoga/ble.sh/issues/402#issuecomment-1926017866] again?
ok will post when it happens again
ble version: Bash version:
When I type exit and press Enter, sometimes (not always) ble.sh just hangs here:
It displays this and then nothing happens.
The kitty tab is not closed.