facebook / zstd

Zstandard - Fast real-time compression algorithm
http://www.zstd.net
Other
22.77k stars 2.03k forks source link

tests/cli-tests/cltools/zstdless.sh fails with newer version of less #4057

Open nc7s opened 1 month ago

nc7s commented 1 month ago

Describe the bug

Specifically, starting with git tag v611, or according to git bisect, commit b17b0802.

The result is as follows:

$ZSTD_SYMLINK_DIR='/home/runner/work/zstd-zstdless-test/zstd-zstdless-test/zstd/tests/cli-tests/bin/symlinks'
$ZSTD_REPO_DIR='/home/runner/work/zstd-zstdless-test/zstd-zstdless-test/zstd'
$DATAGEN_BIN='/home/runner/work/zstd-zstdless-test/zstd-zstdless-test/zstd/tests/datagen'
$ZSTDGREP_BIN='/home/runner/work/zstd-zstdless-test/zstd-zstdless-test/zstd/programs/zstdgrep'
$ZSTDLESS_BIN='/home/runner/work/zstd-zstdless-test/zstd-zstdless-test/zstd/programs/zstdless'
$COMMON='/home/runner/work/zstd-zstdless-test/zstd-zstdless-test/zstd/tests/cli-tests/common'
$PATH='/home/runner/work/zstd-zstdless-test/zstd-zstdless-test/zstd/tests/cli-tests/bin:/snap/bin:/home/runner/.local/bin:/opt/pipx_bin:/home/runner/.cargo/bin:/home/runner/.config/composer/vendor/bin:/usr/local/.ghcup/bin:/home/runner/.dotnet/tools:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin'
$LC_ALL='C'
FAIL: cltools/zstdless.sh
FAIL: cltools/zstdless.sh.check_exit
Exit code mismatch! Expected 0 but got 1
FAIL: cltools/zstdless.sh.check_stderr
stderr does not match!
> diff expected actual
1,2d0
< zstd: can't stat bad.zst : No such file or directory -- ignored 
< bad.zst: No such file or directory

FAIL: cltools/zstdless.sh.check_stdout
stdout does not match!
> diff expected actual
---
< + pass parameters
< 1234
< + bad path

----------------------------------------
FAIL: cltools/zstdless.sh
FAILED 1 / 1 tests!

The problem is that the first invocation of zstdless exited with 1.

With strace and a bit of manual debugging (aka inserting printf all over the place), the code doing the exit is located here, reproduced below:

        if (!ignore_eoi)
        {
            if (n == 0)
                consecutive_nulls++;
            else
                consecutive_nulls = 0;
            if (consecutive_nulls > 20)
                quit(QUIT_ERROR);
        }

To Reproduce

Steps to reproduce the behavior:

  1. Get a version of less newer than v610 (i.e. v611 and later) or aforementioned commit
  2. Replace exec less in programs/zstdless with exec $NEWER_LESS
  3. Run tests/cli-tests/run.py --verbose cltools/zstdless.sh

There is also an actions run here for your convenience.

Expected behavior

The test should succeed.

Cyan4973 commented 1 month ago

Thanks for the detailed report.

Reading the less code:

#if 1
    /*
     * This is a kludge to workaround a problem on some systems
     * where terminating a remote tty connection causes read() to
     * start returning 0 forever, instead of -1.
     */
(...) (code that exits) (...)
#endif

I have to wonder: is it a less problem ? Does the change of behavior affect other programs ?

nc7s commented 1 month ago

That "kludge" dates back to 2007, so it's probably unlikely to be (by itself) the culprit.

Unsure how much programmatical use less (and zstdless) gets... to have a significant impact on the programs that use it, if any. I don't even know how to call this "behavior", and outside of issues of less itself, it's rather hard to search for such reports (less is too common a word).

Do you think this issue should be upstreamed to less?

Cyan4973 commented 1 month ago

The situation is not clear, so it's difficult to attribute a clear root cause at this point.

What seems clear from your investigation is that something changed in less v611, dating Nov 2022, and that change started to break the test's expectation.

Since everything is possible at this stage, it may be that the test was doing something silly that was working by accident. But in such circumstances, the onus should rather be on the part that changed and broke the (so far working) test. Then, an investigation might end up justifying that the issue is on the user side, but it feels logical to start the investigation where the change happened.