benhoyt / goawk

A POSIX-compliant AWK interpreter written in Go, with CSV support
https://benhoyt.com/writings/goawk/
MIT License
1.95k stars 84 forks source link

Fix wrong precedence of 'expr | getline' expressions #216

Closed fioriandrea closed 8 months ago

fioriandrea commented 1 year ago

This pull request modifies getLine() inside parser.go so that it lets binary operators come after 'expr | getline' expressions.

I also uncommented two tests that were in TODO inside interp_test.go and added two new tests.

There were two other tests under the same TODO: (BEGIN { print("echo foo" | getline ($0+1)); print } and BEGIN { print("echo foo" | getline foo()); print } function foo() { print "z" }), but those address a separate issue.

Fixes #189

benhoyt commented 1 year ago

Hmm, now that I see this and have a play with it, I'm not sure about this. It seems to me a bit too ad-hoc and special-casey. Perhaps that's part of the reason it doesn't yet work with the two commented-out tests (and with the Gawk getline and getline3 tests skipped in goawk_test.go). What are we missing that would make this more general?

fioriandrea commented 1 year ago

@benhoyt I reverted the changes I made with the matches() function and renamed the PR. After thinking more carefully about the problem, I realized that:

What I mean by this is that expr can be any expression with precedence higher than assignments, but the complete expression expr | getline has the same precedence as a primary expression. So, in a sense, expr | getline has both lower and higher precedence than all other expressions (except assignments, which always have lower precedence).

Indeed, "echo " "date" | getline + 1 gets parsed as (("echo " "date") | getline) + 1 by all other awk implementations.

Although the code I just committed is a bit hacky, it fixes all broken test cases. Basically, I save the lhs of the expr | getline expression (in the example above, "echo " "date") in the parser structure and then I handle the pending pipe operator inside the primary function, where I finish building the expr | getline expression. By doing this, the lhs can be any expression with precedence higher than assignments, but, since I construct the complete expr | getline expression inside primary, the complete expression gets the same precedence as the other primary expressions (as the bullet list above explains).

I would like to hear your opinion on this (especially if you can think of a less hacky way to handle this)

fioriandrea commented 8 months ago

I will look more closely at the code in the coming days, but for now I can say this:

  1. Two possible (incorrect) programs that trigger the condition are: BEGIN { | } and BEGIN { 1 + 1 - | }. I can add these two programs in the tests
  2. I don't think this can happen. In order for pendingGetlineLeft to be overwritten there would have to be a situation in which pendingGetlineLeft is set and, while it's set, someone calls Parser.getLine(). But pendingGetlineLeft is set only after we are sure that there is a pipe immediately after what we just parsed. When there is a pipe immediately after, we know that we will land straight on the PIPE case in the Parser.primary function, where pendingGetlineLeft is immediately unset and its old value is secured in a temporary variable. It may happen that there is another getline that will be parsed briefly after Parser.primary returns (or just before it returns, in case we have something like "ls" | getline $("ls" | getline), but at that point, there can be no conflict, as pendingGetlineLeft has already been unset and its value has been secured in the left variable. So there isn't any opportunity for pendingGetlineLeft to be overwritten.
fioriandrea commented 8 months ago

I made some modifications to my code in the hope that the intent is more clear. In any case, the logic is the exact same as what I wrote in the last commit.

First of all, I removed the if condition with the "expected expression instead of %s" error, as the situation can be handled in the default branch of the switch statement in primary. In order to do this, I moved the code I had before in its own if at the top of the primary function. The intent of having pendingGetlineLeft looks more clear with this changes (at least to me). Nevertheless, I added the tests for BEGIN { | } and BEGIN { 1 + 1 - | }.

For the second point, I think the reasoning I did in my last comment was sound. We parse an arbitrary expression with precedence <= than cond, then, if we encounter a PIPE, we know we are inside a getline expression, so we save the lhs in a variable, then we go straight to primary, where we finish constructing the rhs of getline and where we unset pendingGetlineLeft, leaving no chance of overwriting.

I also thought of putting p.expect(GETLINE) inside if p.tok == PIPE { in the getline function. If we want, we can do that. It doesn't change the behavior and it might make things more clear.

benhoyt commented 8 months ago

That looks a bit nicer, and thanks for the additional test cases. Really appreciate you working through this, @fioriandrea! Merging now.