att / ast

AST - AT&T Software Technology
Eclipse Public License 1.0
557 stars 152 forks source link

"read -v" truncates value to width of 80 character(bytes) punching card style #1294

Open sneyx123 opened 5 years ago

sneyx123 commented 5 years ago

Description of problem: Information lost without warning.

Ksh version:

# print -v .sh.version
Version AJM 93u+ 2012-08-01

How reproducible: Interactive from controlling terminal.

Steps to reproduce:

1. # typeset -Z90 W="123"
2. # X="${W}"
3. # print -vn X|xxd
    00000000: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
    00000010: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
    00000020: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
    00000030: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
    00000040: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
    00000050: 3030 3030 3030 3031 3233                 0000000123
4. # print "${#X}"
    90
5. # read -v "X?X:"
X:00000000000000000000000000000000000000000000000000000000000000000000000000000000
6. NOTE: Careful understand what is "input" and "output" for step 5 and 6.
    After hitting <enter> for the command of step 5 the cursor is placed right after a bunch of
    zeros ... now to edit just hit the "E" key ... then go to the begin of the line (CTRL-A 
    for emacs mode) ...
    cursor is now placed left to the bunch of zeros ... now just hit "A" key and hit <enter> to
    complete the read command. 
    (Edit session is: from: "000...000" => to: "A000...000E"
    # read -v "X?X:" 
X:A00000000000000000000000000000000000000000000000000000000000000000000000000000000E
7. # print -vn X|xxd
    00000000: 4130 3030 3030 3030 3030 3030 3030 3030  A000000000000000
    00000010: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
    00000020: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
    00000030: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
    00000040: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
    00000050: 3045                                     0E
8. # print "${#X}"
    82

Actual results: Length of value of variable X is 82

Expected results: Length of 92

Additional info:

# env | grep TERM=
TERM=vt100-am
# env | grep COLUMNS=
COLUMNS=211
# tput cols
211
# infocmp | grep cols
        cols#80, it#8, lines#24, vt#3,
krader1961 commented 5 years ago

What is your locale? That is the output of locale?

I'm not going to investigate this because it involves print -v which, as previously discussed, involves outputting the binary value of the var. Which is only meaningful for scalar strkings, ints and floats. For anything else it emits a symbolic representation of the var. And that representation is undefined by the documentation.

@sneyx123 You still have not explained why you expect print -v to emit anything other than machine specific output given the documentation of this feature as being equivalent to printf '%B'. And that means the current output for anything other than a scalar var is acceptable.

sneyx123 commented 5 years ago
# locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=
# env|egrep "(LANG|LC_)"
LANG=en_US.UTF-8
LC_ALL=
LC_COLLATE=
LC_CTYPE=
LC_MESSAGES=
LC_MONETARY=
LC_NUMERIC=
LC_TIME=
sneyx123 commented 5 years ago

@krader1961 In the case the "print -v" shall remain in the machine specific output form for ints and floats: then why is the index array, associative array, compound, subvariable and namespace not also in the machine specific form? Are these covered in the documentation? In the case the interface changes: a new documentation and a new specification for a standard is required. How have David K. promoted the transition form ksh88 to ksh93 ?

krader1961 commented 5 years ago

then why is the index array, associative array, compound, subvariable and namespace not also in the machine specific form?

Because there is no "machine specific form" for those data types; at least not in the sense you are using that term. A compound var is not like a C struct. Heck, even writing a binary representation of a C struct may not make any sense. Consider this example:

struct {
    int i;
    char *s;
} x;
x.i = 123;
x.s = "hello";
write(1, &x, sizeof(x));

That is legal and has a well defined meaning. However, it means the address of the string "hello" is written to stdout after the binary representation of integer123. And that address has no meaning outside of that process.

Please open a new issue if you want to propose that print -v output a symbolic rather than binary representation of simple scalar vars. But note that is a backward incompatible change and thus is unlikely to be implemented.

sneyx123 commented 5 years ago

You like to discuss in which ISO layer the shell has to be assigned? The question is if it below layer 6 or above? In layer 6 there exists a "abstract syntax", "transfer syntax" and a "concrete syntax" ... your C code example is more or less the "concrete syntax". In a other model the kernel is in the center -- surrounded by the shell(s) -- and the shell is surrounded by the apps. If the shell has such a machine, processor, vendor or compiler specific form forget any interoperability. AT&T want to process international bills related to communication crossing borders? They go out of business with such C structs ...

# cat t.c

#include <stdio.h>

#include <sys/types.h>
#include <netinet/in.h>
#include <inttypes.h>

#include <unistd.h>
#include <strings.h>

struct {
        int             i;
        char           *s;
}               x;

struct {
        long int        i;
        char            s[80];
}               y;

char            marker[6] = "marker";

int
main()
{

        x.i = 123;
        x.s = "hello";
        write(1, &x, sizeof(x));

        write(1, marker, sizeof(marker));

        y.i = htonl(123);
        strcpy(y.s, "hello");
        write(1, &y, sizeof(y));

        return (0);
}
# ./t|xxd
00000000: 7b00 0000 e008 0508 6d61 726b 6572 0000  {.......marker..
00000010: 007b 6865 6c6c 6f00 0000 0000 0000 0000  .{hello.........
00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000060: 0000                                     ..
krader1961 commented 5 years ago

The truncation is due to this symbol, #define LOOKAHEAD 80, defined in src/cmd/ksh93/include/edit.h. Which causes the default value to be truncated to 80 chars by this block of code:

https://github.com/att/ast/blob/1cf4ed06b5cfa06d71577799acdb0d966595a0b4/src/cmd/ksh93/edit/edit.c#L476-L482

The intent is to ensure the default value doesn't overflow the fixed size buffer defined here in struct edit:

https://github.com/att/ast/blob/1cf4ed06b5cfa06d71577799acdb0d966595a0b4/src/cmd/ksh93/include/edit.h#L101

Obviously we can increase the size of this buffer. But any practical limit could still cause truncation. Let's say we change LOOKAHEAD from 80 to 100. That fixes the specific scenario in the original problem statement but obviously will still truncate default values greater than 100 chars in length. So this isn't a bug in the usual sense in that the code is working as intended. It is, however, undocumented behavior. Theoretically the lookahead buffer could be dynamically sized to avoid truncation for any non-pathological scenario. Whether the added code complexity of doing so can be justified is questionable. The simplest solution is to simply increase the limit to something more reflective of today's computing environments (e.g., 250 or 500) and document the limit in the read command.

sneyx123 commented 5 years ago

Please keep in mind that if "CTRL-P" (in emacs mode -- which is same as "CURSOR-UP" -- in documentation it is called "^P" ) is hit several times before you hit the final "ENTER" you dive into your shell history -- which can potential contain multiline lines and line continuation lines (literal "\n\") that can easy bring you beyond 500 -- so please unlimited.

krader1961 commented 5 years ago

@sneyx123 Did you test the [ctrl-P] scenario? I did and it does not truncate the history text. Why that behaves differently from read with a default var is unknown at this time. Patches to make the two cases behave the same are welcomed. But this is a pretty low priority issue in my opinion. Especially since the trivial "fix" of increasing the size of the lookahead buffer is probably good enough.

sneyx1234 commented 5 years ago

The truncation must take place when the value of the shell-variable is copied into the edit buffer. In case of a history selection via cursor-keys and its edition it will of course not be truncated because then the hist facility would be useless. Isn't the read builtin also used in the "main event loop" to interact with the user?