gnunn1 / tilix

A tiling terminal emulator for Linux using GTK+ 3
https://gnunn1.github.io/tilix-web
Mozilla Public License 2.0
5.41k stars 293 forks source link

TTY always created with size 24 x 80 and then resized #1777

Open romkatv opened 5 years ago

romkatv commented 5 years ago

When a new terminal tab is created by clicking "Add terminal right" or "Create new session", it often reports incorrect TTY size of 24 x 80. When this happens, SIGWINCH is delivered a few milliseconds later. After SIGWINCH the TTY size is reported correctly.

I'm writing a piece of code that prints formatted text to the TTY during shell initialization. To print it well, I need to know the terminal width. The problem is that I cannot figure out how to get it. If the TTY size is reported as 24 x 80, should I wait for SIGWINCH or not? Most of the time 24 x 80 is fake data but sometimes it's real (people do have terminals of this size). When it's real, no SIGWINCH will come, so I shouldn't wait for it. If it's fake, I must not print anything until I find out the terminal size, so I must wait for SIGWINCH.

It would be perfect if the TTY was always created with the right size to begin with. I will also be happy with a workaround if you can offer one.

$ cat ~/.zshenv
unsetopt globalrcs

$ cat ~/.zshrc
trap 'echo WINCH' WINCH
stty size
PROMPT='> '
RPROMPT='< '
$ uname -isrvmop
Linux 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

$ tilix --version
Versions
    Tilix version: 1.7.7
    VTE version: 0.52
    GTK Version: 3.22.30

Tilix Special Features
    Notifications enabled=0
    Triggers enabled=0
    Badges enabled=0

I'm querying TTY size through builtin COLUMNS and LINES parameters in zsh. However, stty size also sometimes reports 24 x 80, although it happens rarely simply because it takes a long time to execute. Even running /bin/true is often enough of a delay to get correct TTY size.

egmontkob commented 5 years ago

There's a brand new relevant VTE bugreport at https://gitlab.gnome.org/GNOME/vte/issues/188. I'm wondering if 80x24 is perhaps Tilix's workaround against 0x0...?

romkatv commented 5 years ago

There's a brand new relevant VTE bugreport at https://gitlab.gnome.org/GNOME/vte/issues/188.

Thanks for the link!

If I were getting 0x0, that would be fantastic! I would know that I have to wait for the real data to come through. Either with busy waiting or by listening to SIGWINCH.

I'm wondering if 80x24 is perhaps Tilix's workaround against 0x0...?

As I mentioned above, /bin/stty size also gives the fake data of 24x80. Can Tilix really do this?

Is it possibly that some kernel versions give 0x0 while others give the fake 24x80? I've only tested it on 4.15.0-65, which is pretty old.

egmontkob commented 5 years ago

Can Tilix really do this?

I honestly don't know if it can somehow override the temporary default.

Is it possibly that some kernel versions give 0x0 while others give the fake 24x80?

It's also possible, but I wouldn't bet on this, I'd guess the kernel never had a hardwired 24x80.

It could also be that we're facing a slightly different issue. It's all about to be investigated.

romkatv commented 5 years ago

I should've mentioned that this affects not just my personal code. It breaks zsh prompt if it's printed fast enough. To reproduce:

  1. Ensure that Tilix starts zsh when creating a terminal. For example, make zsh your login shell and use Tilix with default options.
  2. Add this at the very top of your ~/.zshrc.
    typeset COLUMNS LINES
    PROMPT='left> '
    RPROMPT='<right'
    return
  3. Start tilix.
  4. Click "Add terminal right".

Here's what I've got:

tilix-split

The left window is the original. Notice that it reports correct dimensions (10 x 90 is what I set as the default size in Tilix preferences). The right window reports incorrect size of 24 x 80. It also has broken prompt. That inverted % sign is the result of prompt_sp option. It's visible because zsh thinks the terminal is 80 columns wide while it is in fact only 45 columns wide. You can also see the remains of the first rendered prompt, which wasn't completely erased when zsh received SIGWINCH. This is also because zsh was given wrong terminal width when it started.

egmontkob commented 5 years ago

I've seen the zsh % bugreport here and there, and – incorrectly, my bad – assumed it was a zsh setup issue and didn't investigate further. Good to know what it's actually about!

romkatv commented 5 years ago

Here's a bit of extra information about that %.

When options prompt_cr and prompt_sp are enabled (this is the default), zsh prints the following sequence before every prompt.

  1. Bold % on inverted background. This is controlled by PROMPT_EOL_MARK. You can put PROMPT_EOL_MARK=X in ~/.zshrc to use the plain X character instead.
  2. N - M spaces, where N is the terminal width and M is the width of PROMPT_EOL_MARK (1 by default).
  3. \r (carriage return).
  4. Clear to the end of line (ce termcap).

If this sequence is printed when the cursor is at the very first column, it will simply clear the current line. If the cursor is not on the first column, it'll append % to the current line and move the cursor to the beginning of the next line. Here's how it looks in PROMPT='> ' zsh -f:

zsh-prompt_sp

Now let's see what happens when we create a new terminal in Tilix that has real width of 45 but incorrectly reports it as 80. Instead of printing % followed by 44 spaces, \r and ce (which would have no effect), zsh will print % and 79 spaces. This will wrap around to the next line. The subsequent \r and ce will clear the second line but % on the first line will remain.

romkatv commented 5 years ago

I was wrong when I stated that whenever the initial size is fake there will always be SIGWINCH afterwards. This is true when clicking "Add terminal window right" but false when clicking "Create new session". To test this, I've create ~/.zshrc with the following content:

setopt trapsasync
trap 'echo WINCH ${LINES}x${COLUMNS}' WINCH
echo BEGIN ${LINES}x${COLUMNS}
sleep 1
echo END ${LINES}x${COLUMNS}

With default terminal size set to 80x24 in Preferences:

With default terminal size set to 90x10 in Preferences:

The very last output looks like an independent bug.

Based on these test cases I've written the following workaround for zsh. When executed upon zsh startup, it will block until the correct terminal size is determined.

() {
  emulate -L zsh
  if [[ $LINES == (24|0) && $COLUMNS == (80|0) ]]; then
    setopt monitor trapsasync
    zmodload zsh/datetime
    zmodload zsh/system
    local -F deadline=$((EPOCHREALTIME+0.03))
    local -i fd pid
    trap 'kill -- -$pid' WINCH
    exec {fd}< <(
      echo $sysparams[pid]
      while [[ $LINES == (24|0) && $COLUMNS == (80|0) && $EPOCHREALTIME -lt $deadline &&
              "$(/bin/stty size 2>/dev/null)" == (24|0)\ (80|0) ]]; do
      done)
    IFS= read -u $fd pid
    ( read -u $fd )
    exec {fd}>&-
    trap - WINCH
  fi
}

The biggest problem is that it blocks for 30 milliseconds (configurable timeout) when the terminal is in fact 24x80. In all other cases it's faster.

egmontkob commented 5 years ago

Could you please test the patch from the VTE bug?

romkatv commented 5 years ago

Could you please test the patch from the VTE bug?

I've never built VTE (I only learned about its existence today), so I went to https://gitlab.gnome.org/GNOME/vte and started following the instructions.

This is where I stopped. How deep is the rabbit hole?

If you are able to build VTE with your patch, perhaps you can check whether it fixes Tilix? It might be easier that way.

  1. Ensure that Tilix starts zsh when creating a terminal. For example, make zsh your login shell and use Tilix with default options.
  2. Set the default terminal size in Tilix to something 10x90 (anything else will do as long as it's not 24x80).
  3. Add this at the very top of your ~/.zshrc.
    echo ${LINES}x${COLUMNS}
    return
  4. Start tilix. Verify that it prints 10x90.
  5. Click "Add terminal right". Verify that it prints 10x90.
  6. Click "Create new session". Verify that it prints 10x90.
egmontkob commented 5 years ago

The rabbit hole is not deep at all. Just get new meson (e.g. from a newer version of your distro, or manually), or VTE version 0.56 which uses the autotools build system.

I tried with tilix but I couldn't reproduce the problem (the one that I posted in the xfce bugtracker – not the zsh one). I could only reproduce that problem with xfce4-terminal and gnome-terminal, and the patch fixes them for me.

bilelmoussaoui commented 5 years ago

Meson is python only. You can update to the latest release using pip3 install meson --upgrade

romkatv commented 5 years ago

VTE version 0.56 which uses the autotools build system

Went through the whack-a-mole of errors for some time and gave up on this:


No package 'pango' found
No package 'gtk+-3.0' found
No package 'libpcre2-8' found
No package 'gnutls' found
``
romkatv commented 5 years ago

Sorry, closed prematurely. I managed to built vte after the usual whack-a-mole with errors. Here's what worked:

sudo apt install python3-pip meson gtk-doc-tools intltool libgtk-3-dev libfribidi-dev libpcre2-dev libpango1.0-dev libgnutls28-dev
pip3 install meson --upgrade
git clone https://gitlab.gnome.org/GNOME/vte.git/
cd vte
git checkout vte-0-56
curl -fsSLO https://gitlab.gnome.org/GNOME/vte/uploads/52f7c8febfc94b3e8e2c3e7a7a7d2abf/vte-188-initial-size.patch
git apply vte-188-initial-size.patch
./autogen.sh --disable-introspection --disable-vala
make -j 20
sudo cp src/.libs/libvte-2.91.so.0.5600.5 /usr/lib/x86_64-linux-gnu/ 
sudo mv /usr/lib/x86_64-linux-gnu/libvte-2.91.so.0 /usr/lib/x86_64-linux-gnu/libvte-2.91.so.0.bak
sudo ln -s libvte-2.91.so.0.5600.5 /usr/lib/x86_64-linux-gnu/libvte-2.91.so.0

My original libvte-2.91.so.0 was pointing to libvte-2.91.so.0.5200.2.

I then restarted my machine for good measure and verified that echo $VTE_VERSION prints 5605 in GNOME Terminal and in Tilix, which seems like a good sign. However, zsh didn't get fixed in Tilix. It still incorrectly gets 24x80.

Then I reverted the patch and rebuilt vte to check that the patch actually makes a difference. It's a good thing that I did, because there is no effect on libvte-2.91.so.0.5600.5 -- the only file that I used from the build artifacts. There are no other so files, so I'm lost.

egmontkob commented 5 years ago

However, zsh didn't get fixed in Tilix. It still incorrectly gets 24x80.

That sounds bad.

there is no effect on libvte-2.91.so.0.5600.5

Do you mean that the exact same binary (same md5sum) is produced, no matter if you apply the patch or not? I seriously doubt that (and just verified with the vte-0-56 branch that the generated file becomes different for me).

romkatv commented 5 years ago

However, zsh didn't get fixed in Tilix. It still incorrectly gets 24x80.

That sounds bad.

This is explained by the fact that the binary I built is the same with the patch as without.

there is no effect on libvte-2.91.so.0.5600.5

Do you mean that the exact same binary (same md5sum) is produced, no matter if you apply the patch or not?

Correct. diff says the files are identical.

I seriously doubt that (and just verified with the vte-0-56 branch that the generated file becomes different for me).

Thanks for confirming that your patch is expected to have an effect on these files and that it's indeed these files that I need to install to fix Tilix. I was uncertain about both of these points.

I'll get back to this later today or tomorrow and will once again build vte with and without the patch. I must have messed something up in my first attempt.

romkatv commented 5 years ago

I must have messed something up in my first attempt.

Indeed I have. Here's what I did now:

Built two versions of vte: regular and patched.

function build-vte() (
  set -e
  git clone https://gitlab.gnome.org/GNOME/vte.git/ vte"$1"
  cd vte"$1"
  git checkout vte-0-56
  if [[ -n "$1" ]]; then
    curl -fsSLO 'https://gitlab.gnome.org/GNOME/vte/uploads/52f7c8febfc94b3e8e2c3e7a7a7d2abf/vte-188-initial-size.patch'
    git apply vte-188-initial-size.patch
  fi
  ./autogen.sh --disable-introspection --disable-vala
  make -j 20
)

cd "$(mktemp -d)"
build-vte
build-vte -patched

Installed the patched libvte-2.91.so.0.5600.5.

sudo cp vte-patched/src/.libs/libvte-2.91.so.0.5600.5 /usr/lib/x86_64-linux-gnu/ 
sudo mv /usr/lib/x86_64-linux-gnu/libvte-2.91.so.0{,.bak}
sudo ln -s libvte-2.91.so.0.5600.5 /usr/lib/x86_64-linux-gnu/libvte-2.91.so.0

Verified that the patched version is installed:

$ ls -l /usr/lib/x86_64-linux-gnu/libvte-2.91.so.0                                
lrwxrwxrwx 1 root root 23 Oct 21 14:45 /usr/lib/x86_64-linux-gnu/libvte-2.91.so.0 -> libvte-2.91.so.0.5600.5

$ diff -s {vte-patched/src/.libs,/usr/lib/x86_64-linux-gnu}/libvte-2.91.so.0.5600.5 /usr/lib/x86_64-linux-gnu/libvte-2.91.so.0.5600.5
Files vte-patched/src/.libs/libvte-2.91.so.0.5600.5 and /usr/lib/x86_64-linux-gnu/libvte-2.91.so.0.5600.5 are identical

Verified that the patched and the regular versions are different.

$ diff -s vte{,-patched}/src/vtegtk.cc
2443a2444,2445
>         vte_pty_set_size(pty, IMPL(terminal)->m_column_count, IMPL(terminal)->m_row_count, NULL);
> 

$ diff -s vte{,-patched}/src/.libs/libvte-2.91.so.0.5600.5
Binary files vte/src/.libs/libvte-2.91.so.0.5600.5 and vte-patched/src/.libs/libvte-2.91.so.0.5600.5 differ

Rebooted.

Populated my ~/.zshrc with the following content:

setopt trapsasync
trap 'echo WINCH ${LINES}x${COLUMNS}' WINCH
echo BEGIN ${LINES}x${COLUMNS}
sleep 1
echo END ${LINES}x${COLUMNS}

Launched tilix. It opened a 24 x 80 terminal as expected (this is my current default). I then clicked "Add terminal right". Got this:

BEGIN 24x80
WINCH 24x39
END 24x39

This exhibits the same bug as before: the initial terminal size is reported as 24x80 even though it's 24x39. Soon after the terminal is created, zsh receives WINCH. After that the size is reported correctly. This is the same behavior as I observed without the patch.

Did I do anything wrong? Should I try something else? Do you need any information from me?

P.S.

Just in case this matters.

$ ldd =tilix
    linux-vdso.so.1 (0x00007ffe6a98c000)
    libvted-3.so.0 => /usr/lib/x86_64-linux-gnu/libvted-3.so.0 (0x00007f8f7fbbd000)
    libgtkd-3.so.0 => /usr/lib/x86_64-linux-gnu/libgtkd-3.so.0 (0x00007f8f7eab0000)
    libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f8f7e778000)
    libphobos2-ldc-shared.so.78 => /usr/lib/x86_64-linux-gnu/libphobos2-ldc-shared.so.78 (0x00007f8f7e13a000)
    libdruntime-ldc-shared.so.78 => /usr/lib/x86_64-linux-gnu/libdruntime-ldc-shared.so.78 (0x00007f8f7de3a000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f8f7dc32000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f8f7da13000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f8f7d622000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f8f80313000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f8f7d41e000)
    libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f8f7d1f6000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f8f7ce58000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f8f7cc3b000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f8f7ca23000)
    libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007f8f7c81f000)
    libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f8f7c619000)
    libbsd.so.0 => /lib/x86_64-linux-gnu/libbsd.so.0 (0x00007f8f7c404000)

Tilix doesn't have a dependency on libvte but does have a dependency on libvted. Is this expected? Should it still be affected by the patched libvte?

egmontkob commented 5 years ago

So, Tilix indeed doesn't work correctly for you. However, it's unclear to me if it's VTE's fault or Tilix's, we could be seeing a different bug.

Did I do anything wrong? Should I try something else? Do you need any information from me?

You did everything great. Let's do the following, both to double check that you're indeed using a patched VTE, and to gather further information:

Place this line right after (or before) the one added by the patch:

fprintf(stderr, "tilix-1777 debug %ld %ld\n", IMPL(terminal)->m_column_count, IMPL(terminal)->m_row_count);

These will obviously be printed in the terminal where you start Tilix from (presumably a different terminal emulator).

Oh, maybe you don't even have to do it. I've tried it and I get 80x24 all the time, even when splitting. This is bad.

These numbers come from VTE itself, near the top of src/vtedefines.hh. I mean, change them there, compile+install VTE, and Tilix will give you these numbers.

You might want to repeat these steps to confirm that you get the same results as me, but it's not necessary, as you wish.

PS. Don't waste your time rebooting. Just make sure to quit all Tilix instances at once, in order to pick up the freshly installed VTE.

Note for myself: The next for me to try is to place the D equivalent of while (gtk_events_pending()) gtk_main_iteration(); in front of the spawn_sync command in Tilix. I'll get back to it later.

romkatv commented 5 years ago

These will obviously be printed in the terminal where you start Tilix from (presumably a different terminal emulator).

Confirmed.

These numbers come from VTE itself, near the top of src/vtedefines.hh. I mean, change them there, compile+install VTE, and Tilix will give you these numbers.

Also confirmed.

I'll get back to it later.

Thanks for looking into this!

romkatv commented 5 years ago

Seems like there is another bug in Tilix or underlying infrastructure. From the Tilix user's point of view, there are 3 bugs related to the terminal size reporting. They may or may not have a single culprit.

Bug 1

When creating a new terminal of size 40x100 (or any other size as long as it's not 24x80), /bin/stty size may print 24 80. Expected: /bin/stty size prints 40 100.

(But we can just wait for SIGWINCH that will arrive a few milliseconds after the terminal is created and get the correct terminal size, right? Unfortunately, no, because there is another bug.)

But 2

When creating a new terminal of size 40x100 (or any other size as long as it's not 24x80), /bin/stty size may print 24 80 on the first N calls and 40 100 on all subsequent calls (this is bug 1) but SIGWINCH may never arrive (this is bug 2). Expected: SIGWINCH arrives. In general, the expected behavior is that every change in the output of /bin/stty size causes SIGWINCH.

(But if we do receive SIGWINCH, /bin/stty size will give us the correct terminal size, right? No, there is another bug.)

Bug 3

When creating a new terminal of size 40x100 (or any other size as long as it's not 24x80), /bin/stty size may print 24 80 when called after receiving SIGWINCH. Expected: /bin/stty size prints 40 100 when called after receiving SIGWINCH.

(To reproduce this bug it may be necessary to create very high number of terminals quickly.)

The practical outcome of these bugs is that SIGWINCH is meaningless when one's goal is get the correct terminal size on shell startup. It won't arrive if the real terminal size is 24x80. It may not arrive if the real terminal size is not 24x80. When it arrives, /bin/stty size may still print 24 80 despite the real size being different.

The only way to get the correct terminal size appears to be this:

  1. Call /bin/stty size in a loop.
  2. If it prints anything other than 24 80, this is the real terminal size.
  3. If the loop takes longer than T (arbitrary timeout value), abort. The real terminal size is unknown.

The presence of step (3) implies that we cannot always get the correct terminal size, no matter what we do and how long we wait. Without this step the loop will never terminate when the real terminal size is 24 80, so it's necessary.

If /bin/stty size gave an indication that it doesn't know the real terminal size (by printing 0 0, or by any other means) instead of producing fake output, it would be possible to reliably detect terminal size. As far as the problem of determining the correct terminal size is concerned, the choice of 24 80 as fake output is the worst possible choice because this is the most common real terminal size in Tilix (I suppose; at least it's the default terminal size). I realize that 24 80 was chosen precisely for these reasons (being the most likely correct terminal size), as it was meant to mitigate the damage of bug 1. Unfortunately, it also made it impossible for code that is aware of bug 1 to employ a robust workaround.

egmontkob commented 5 years ago

Thanks for your input!

I suspect that there are two race condition we're hunting, one (between the parent and child after forking, nothing to do with GTK) is already fixed in the VTE bug, and the other one (around the GTK event loop) is yet to be fixed. I'd really like to get to the bottom of these.

I suspect that your bugs 2 & 3 are bash+zsh bugs (limitations) and not VTE/Tilix ones. Try this:

trap 'stty size; sleep 5' WINCH

You'll notice that during those 5 seconds, subsequently arriving WINCHes (while the handler of a previous one is running) are swallowed by bash and zsh, that is, you can't reliably handle quickly arriving WINCHes in shells. setopt trapsasync doesn't help. Other software, such as vim are also known to have this bug. Yet others, e.g. mc (especially after some recent pre-4.8.24 fixes) always resize to the final expected size. We've debugged this area quite extensively, and I'm almost certain that VTE behaves correctly here, triggering a WINCH whenever necessary. There are newly discovered problems during its startup, but I don't think there are any problems later on. Anyway, if it's suspected that there still are problems here in VTE, they should be debugged using a tiny WINCH-handler C code rather than a bloated shell.

The fish shell seems to do it correctly, could you test the behavior with that? (In my opinion it launches the handler too many times, once for every signal received, rather than combining multiple signals received while running the previous signal handler into one. But this should be harmless, it just waits too long if you combine with sleep in the handler for debugging purposes.)

I've written the following workaround for zsh [followed by 20 lines of code]

Just my 2 cents: It looks overengineering for me, at least for such a workaround that shouldn't be necessary once we track down and fix the VTE problem. I'd personally just go with a sleep 0.01 or so and not waste more effort on this.

egmontkob commented 5 years ago

To amend my previous post:

bash and zsh seem to correctly track multiple quickly arriving WINCHes for their own bookkeeping purposes (e.g. updating $COLUMNS and the line editing behavior). They just forget to re-execute the trap.

romkatv commented 5 years ago

You'll notice that during those 5 seconds, subsequently arriving WINCHes (while the handler of a previous one is running) are swallowed by bash and zsh, that is, you can't reliably handle quickly arriving WINCHes in shells.

I'm not sure I understand how this explains bug (2). In that bug I'm receiving zero WINCH signals when I would expect to receive one.

I'm also not sure how this explains bug (3). In that bug I'm calling /bin/stty size after receiving WINCH and am unexpectedly seeing 24 80 as its output. There is no resizing happening in this test case, so there must be only one WINCH. Are you suggesting that Tilix can send an extra WINCH when nothing changes? If yes, then I see how your explanation makes sense.

setopt trapsasync doesn't help.

I wasn't expecting it to. I set this option so that my signal handler gets called while the top-level zsh is waiting for the foreground child to terminate. This point is moot because I no longer handle WINCH in my workaround code.

Anyway, if it's suspected that there still are problems here in VTE, they should be debugged using a tiny WINCH-handler C code rather than a bloated shell.

That's a good idea. I wrote this:

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define CHECK(...)        \
  if (!(__VA_ARGS__)) {   \
    perror(#__VA_ARGS__); \
    exit(1);              \
  } else (void)0

static void print_tty_size(const char* when) {
  struct winsize w;
  CHECK(ioctl(STDOUT_FILENO, TIOCGWINSZ, &w) != -1);
  CHECK(printf("%s: %dx%d\n", when, w.ws_row, w.ws_col) >= 0);
}

int main(void) {
  sigset_t sigset;
  siginfo_t siginfo;

  CHECK(sigemptyset(&sigset) == 0);
  CHECK(sigaddset(&sigset, SIGWINCH) == 0);
  CHECK(sigprocmask(SIG_BLOCK, &sigset, NULL) == 0);

  print_tty_size("Initial");
  CHECK(sigwaitinfo(&sigset, &siginfo) == SIGWINCH);
  print_tty_size("After WINCH");

  while (1) sleep(1);
}

I can confirm that bug (2) was indeed a bug in zsh as it never happens with this simple program.

I've written the following test case to demonstrate the bug in zsh. In some cases the following piece of code can print tty size changed and nothing else.

() {
  emulate -L zsh
  setopt trapsasync
  TRAPWINCH() { echo WINCH }
  local size=${LINES}x${COLUMNS}
  while /bin/true; do
    if [[ $size != ${LINES}x${COLUMNS} ]]; then
      echo 'tty size changed'
      size=${LINES}x${COLUMNS}
    fi
  done
}

I haven't found a way to reproduce it consistently but the following procedure will sometimes trigger the bug:

  1. Run the code above.
  2. Resize your terminal a little bit.
  3. If (un)lucky, you'll see tty size changed message but no WINCH. This is unexpected. The expected behavior is that either the output should be empty, or it should have WINCH in it.

I think, In theory, bug (3) must still trigger on my C program because I don't see how zsh can break that one. I cannot verify it though because I cannot trigger (3) on my machine even with zsh. My bug report for it was based on the results of testing done by my friend on a slower machine. It's possible that I messed something up but I'd rather not spend time investigating it (WINCH is useless to me either way).

Just my 2 cents: It looks overengineering for me, at least for such a workaround that shouldn't be necessary once we track down and fix the VTE problem. I'd personally just go with a sleep 0.01 or so and not waste more effort on this.

On a reasonable laptop with idle CPU the median time to acquire correct tty size is around 60ms, which means that sleep 0.01 is way too short. I would need sleep 0.1 or so to get decent success rate. This would add 100ms startup latency to zsh, which would defeat the whole purpose of what I'm trying to achieve. I want zsh prompt to appear immediately after I open a tab but I also don't want it to look broken. It works really great, except when using VTE, so I'm trying to do the best I can there.

Note that I wrote that workaround code before I realized that SIGWINCH is useless. My new workaround is much simpler.

() {
  emulate zsh
  if [[ -z $VTE_VERSION || $LINES != 24 || $COLUMNS != 80 ]]; then
    echo "Real terminal size: $LINES $COLUMNS"
    return
  fi
  zmodload zsh/datetime
  local -F deadline=$((EPOCHREALTIME+0.1))
  local size
  while (( EPOCHREALTIME < deadline )) && size="$(/bin/stty size 2>/dev/null)"; do
    if [[ $size != "24 80" ]]; then
      echo "Real terminal size: $size"
      return
    fi
  done
  echo "Unknown terminal size."
}

(Those echo calls are for exposition only.)

The complexity of this code doesn't bother me at all. What does bother me is that it's slow when the real terminal size is 24x80 (the loop terminates by timeout in this case). And when that happens, I have to do something custom because I don't know if the timeout was hit due to the machine being slow or the real terminal size being 24x80.

egmontkob commented 5 years ago

At this point I can't answer your questions. Please be patient until I find time to thoroughly investigate the case and come up with actual observations and answers, rather than wild assumptions. It might take a couple of days.

egmontkob commented 5 years ago

Could you please test this Tilix patch, together with an updated (v2) VTE patch from VTE issue 188?

tilix-1777-vte-188-delayed-resize-workaround.patch.txt

romkatv commented 5 years ago

Could you please test this Tilix patch, together with an updated (v2) VTE patch from VTE issue 188?

tilix-1777-vte-188-delayed-resize-workaround.patch.txt

I've never built Tilix. Given that it uses a built toolchain I'm unfamiliar with, it would require a non-trivial time commitment (it took me over an hour to test the VTE patch). Since you've implemented a path (thank you!), I suppose you are able to build Tilix on your machine, in which case it should be relatively straightforward for you to check whether the bug is fixed either by starting my C program (https://github.com/gnunn1/tilix/issues/1777#issuecomment-545885982) through Tilix, or by using zsh with typeset LINES COLUMNS at the top of ~/.zshrc.

egmontkob commented 5 years ago

I couldn't reproduce any of the zsh issues on my computer (with unpatched VTE and Tilix), I guess the timings are different. My own methods of debugging suggest that the bug is fixed in Tilix with these two patches, but it's always good to have some confirmation. Hopefully someone else will provide it, then. Thanks anyways for the efforts you put into chasing down this issue (in other terminals as well)!

romkatv commented 5 years ago

I couldn't reproduce any of the zsh issues on my computer (with unpatched VTE and Tilix)

I was under the impression that you've reproduced it, based on this comment of yours:

I've tried it and I get 80x24 all the time, even when splitting. This is bad.

I must have misunderstood what you meant there. Since I'm apparently the only one who can reproduce the bug I've reported, this changes things. I went ahead and attempted to verify your patch.

First, I built and installed VTE 0.56 with your patch (the same as before).

Then I built Tilix from HEAD (without your patch).

curl -fsSLo /tmp/dmd.deb http://downloads.dlang.org/releases/2.x/2.088.1/dmd_2.088.1-0_amd64.deb
sudo dpkg -i /tmp/dmd.deb
git clone https://github.com/gnunn1/tilix.git /tmp/tilix
cd /tmp/tilix
dub build --build=release
sudo ./install.sh /usr

(Side note: I sent https://github.com/gnunn1/tilix/pull/1783 to fix a bug in install.zsh.)

I verified that:

  1. The first terminal that gets created when starting Tilix reports correct TTY size.
  2. Clicking "Create new session" creates a terminal that reports incorrect TTY size (24x80).
  3. Clicking "Add terminal right" creates a terminal that reports incorrect TTY size (24x80).

Then I've applied your Tilix patch and rebuilt Tilix.

curl -fsSL 'https://github.com/gnunn1/tilix/files/3784639/tilix-1777-vte-188-delayed-resize-workaround.patch.txt' | git apply
dub build --build=release
sudo ./install.sh /usr

I verified that:

  1. as before (still correct)
  2. as before (still broken)
  3. fixed (was broken but now it's correct)

So the patch is an improvement but not a complete fix.

By the way, it was apparent even before your patch that Tilix does something different when creating a new terminal with "Create new session" vs "Add terminal right". If I busy-wait for correct terminal dimensions, it takes 10 times longer for the terminal created with "Create new session" (10ms vs 1ms).

romkatv commented 4 years ago

@egmontkob I've bumped into another bug. When clicking "New Tab" in GNOME Terminal, on rare occasions a newly created terminal can report transposed TTY dimensions for a short time.

Here's how I can reproduce this:

  1. Set /bin/stty size as the command that GNOME Terminal runs.
  2. Click "New Tab" in GNOME Terminal.

Expected: Prints "45 175" (the real dimensions of my GNOME Terminal). Actual (rarely happens): Prints "175 45".

I've tried replacing /bin/stty size with zsh -dfc 'typeset LINES COLUMNS' and with my own C program. The bug reproduces this way, too.

Is this a known issue?

egmontkob commented 4 years ago

transposed TTY dimensions

Yes I messed it up in the original patch in vte/188, please use the updated patch from there.

romkatv commented 4 years ago

Yes I messed it up in the original patch in vte/188, please use the updated patch from there.

Thanks for the quick response. The updated patch (v2) fixes the problem indeed.

egmontkob commented 4 years ago

So, regarding the pending issue ("create new session", bullet point 2 a few comments ago): I can confirm that this is still broken. In the mean time, there's still a bug with gnome-terminal, too.

The first ideas of the main VTE developer plus me didn't fix it. Seems we'll have to dig deeper to understand what is going on. I have no idea when we'll have time for this, so I can't promise anything.

romkatv commented 4 years ago

Understood. Thanks for the update. I appreciate your work on this tricky issue.