ThomasDickey / original-mawk

bug-reports for mawk (originally on GoogleCode)
http://invisible-island.net/mawk/mawk.html

getline waits for EOF if RS != "\n" #64

Open lemnos opened 2 years ago

lemnos commented 2 years ago

It seems that getline won't return until EOF unless RS is \n. This effectively means that long-running programs won't produce output until they are terminated.

E.g.:

{ echo "test|one";sleep 2s; }|awk -vRS="|" 'NR == 1 { print; exit; }'

This won't print 'test' for two seconds. Is this intentional? Or have I missed something? All the other awks I tried (busybox, gawk, macOS awk) produce the expected result.

lemnos commented 2 years ago

Ah, it looks like this happens independently of RS and has to do with an internal buffer. The following won't print anything when mawk is used (but will under the other implementations). Is there a POSIX-compliant way to disable this buffering? Is this permitted under the spec?

E.g.:

perl -e '
    $|++;

    for($i=0;$i<2047;$i++) {
        print "a\n" 
    }

    while(1) {}
'|mawk 'BEGIN{getline;print}'

ThomasDickey commented 2 years ago

This appears to be a variation of issue #12 (no one's indicated that POSIX specifies a particular behavior).

lemnos commented 2 years ago

I see, this is quite unfortunate, as it forces me to add -Wi and hope that the other awk implementations which may run my script don't complain.
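One way to avoid passing -Wi to other implementations is to probe the installed awk first. This is only a minimal sketch: the helper name awk_interactive_flag is hypothetical, and it assumes that `awk -W version` prints a banner starting with "mawk" when awk is mawk. Note also the report later in this comment that RS appears to be ignored under -Wi, so this may not help scripts that set RS.

```shell
# Hypothetical helper: print "-Wi" only when the installed awk looks like mawk.
# Assumption: `awk -W version` prints a line starting with "mawk" under mawk.
# stdin is redirected from /dev/null so an awk that treats "version" as
# program text does not block waiting for input.
awk_interactive_flag() {
    if awk -W version 2>/dev/null </dev/null | grep -q '^mawk'; then
        printf '%s\n' '-Wi'
    fi
}

# Usage: the expansion is left unquoted on purpose so that an empty
# result adds no argument at all.
# some_producer | awk $(awk_interactive_flag) 'NR == 1 { print; exit }'
```

On an awk that is not mawk the helper prints nothing, so the script runs with the implementation's default behaviour.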

I'm sure you have considered it at length, but I would like to add my voice to the chorus of others who favour making -Wi the default behaviour and making buffering explicit.

I have no hard data, but my suspicion is that the vast majority of awk invocations are run within scripts on relatively small amounts of data which expect real-time output. For the cases in which larger data sets are parsed, and mawk is a deliberate choice because of its efficiency, it may make more sense to allow the user to explicitly specify the buffer size.

On another note, it seems that there is a related bug.

When -Wi is used, RS is ignored.

E.g.:

 echo "record,record"|mawk -Wi -vRS="," 'NR==1{print}'

prints "record,record" instead of "record".

I appreciate the quick response and greatly admire the work you do.

nick87720z commented 2 years ago

gawk doesn't complain about -Wi itself, but the option breaks its argument parsing: the last part is then taken as a file, rather than as the program text that precedes the file. Interestingly, stdbuf is expected to work for most tools, such as grep, tr and sed, without needing their own unbuffering options (which sed and grep have). It also worked for gawk, making it produce output promptly even when not started from a terminal, e.g.

free -h -s 1 | stdbuf -o0 grep 'Mem' | stdbuf -oL awk '{print}' | cat

In the case of mawk, if the issue were about input buffering, then I would expect stdbuf -i0 to fix it, but it doesn't (I tried stdbuf -i0 -o0). Of course, gawk still has its own problems, such as not reacting to SIGPIPE under stdbuf -oL when, for example, its output is piped to head -n1 (although it does react under stdbuf -o0, which is strange).

I must also note that unbuffered mode has worse performance, so making it the default would not be good. It would be great if mawk interacted better with stdbuf. When I experimented with sed and gawk earlier, using stdbuf -oL gave better performance than calling fflush() in gawk or using grep's --line-buffered option (I can't compare with sed, which only has an option for unbuffered mode).
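For reference, the fflush() approach mentioned above looks roughly like the sketch below. The function name filter_mem is made up for illustration. This only flushes awk's output after each record; it does not change how mawk buffers its input, which is the subject of this issue.

```shell
# Hypothetical filter: print lines matching "Mem" and flush stdout after
# each one, so a downstream reader sees them without waiting for the
# output buffer to fill.
filter_mem() {
    awk '/Mem/ { print; fflush() }'
}

# Usage, mirroring the pipeline above:
# free -h -s 1 | filter_mem | cat
```

Flushing per record costs a write call for every line, which fits the observation that it can be slower than line buffering set up via stdbuf -oL.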