benhoyt / goawk

A POSIX-compliant AWK interpreter written in Go, with CSV support
https://benhoyt.com/writings/goawk/
MIT License
1.95k stars 84 forks source link

Mutually-recursive functions without type info cause compiler panic #186

Closed benhoyt closed 1 year ago

benhoyt commented 1 year ago

Per @xonixx's comment, GoAWK can still panic on mutually-recursive functions that don't have clear type information (pass their args onto new functions without using them as a scalar or array). This doesn't happen often in real code, but it's relatively simple to create an example that breaks the compiler. Copying @xonixx's message below:

Well, but topological sort can sort nodes only when there are no loops in the call graph. Since in the presence of mutual recursion there are loops in the graph I'm not sure that topoSort can help at all. I've played a bit with your example:

function f1(a) { if (0) f5(z1); f2(a) }
function f2(b) { if (0) f4(z2); f3(b) }
function f3(c) { if (0) f3(z3); f4(c) }
function f4(d) { if (0) f2(z4); f5(d) }
function f5(i) { if (0) f1(z5); i[1]=42 }
BEGIN { x[1]=3; f5(x); print x[1] }

Still fails:

$ ../../proj/gron.awk/soft/goawk1.23.2 -f delme.awk 
42

$ ../../proj/gron.awk/soft/goawk1.23.2 -f delme.awk 
42

$ ../../proj/gron.awk/soft/goawk1.23.2 -f delme.awk 
panic: internal error: found scalar when expecting array "z3" [recovered]
        panic: interface conversion: interface {} is string, not *compiler.compileError [recovered]
        panic: interface conversion: interface {} is *runtime.TypeAssertionError, not *ast.PositionError

goroutine 1 [running]:
github.com/benhoyt/goawk/parser.ParseProgram.func1()
        /home/ben/h/goawk/parser/parser.go:70 +0xf4
panic({0x54b7c0, 0xc0000a09f0})
        /home/ben/sdk/go1.20.4/src/runtime/panic.go:884 +0x213
github.com/benhoyt/goawk/internal/compiler.Compile.func1()
        /home/ben/h/goawk/internal/compiler/compiler.go:66 +0x78
panic({0x544880, 0xc00009e680})
        /home/ben/sdk/go1.20.4/src/runtime/panic.go:884 +0x213
github.com/benhoyt/goawk/internal/compiler.(*compiler).arrayInfo(0x54a2c0?, {0xc0000b409a, 0x2})
        /home/ben/h/goawk/internal/compiler/compiler.go:206 +0xd4
github.com/benhoyt/goawk/internal/compiler.(*compiler).expr(0xc0000e5640, {0x59e9f8?, 0xc0000b21c0?})
        /home/ben/h/goawk/internal/compiler/compiler.go:895 +0x356a
github.com/benhoyt/goawk/internal/compiler.(*compiler).stmt(0xc0000e5640, {0x59ed00?, 0xc0000a04b0?})
        /home/ben/h/goawk/internal/compiler/compiler.go:310 +0xd9f
github.com/benhoyt/goawk/internal/compiler.(*compiler).stmts(...)
        /home/ben/h/goawk/internal/compiler/compiler.go:221
github.com/benhoyt/goawk/internal/compiler.(*compiler).stmt(0xc0000e5640, {0x59edc0?, 0xc0000cc700})
        /home/ben/h/goawk/internal/compiler/compiler.go:335 +0x31c9
github.com/benhoyt/goawk/internal/compiler.(*compiler).stmts(...)
        /home/ben/h/goawk/internal/compiler/compiler.go:221
github.com/benhoyt/goawk/internal/compiler.Compile(0xc0000cc930)
        /home/ben/h/goawk/internal/compiler/compiler.go:104 +0x1709
github.com/benhoyt/goawk/parser.ParseProgram({0xc0000dc000?, 0x7ffc83d26ea8?, 0x9?}, 0xc0000e5dd0)
        /home/ben/h/goawk/parser/parser.go:91 +0x19b
main.main()
        /home/ben/h/goawk/goawk.go:269 +0x1fd3

$ gawk -f delme.awk 
42

$ mawk -f delme.awk 
42

Also note, how it passes or fails sporadically with GoAWK. I think this is due to randomized map iteration in Go.

benhoyt commented 1 year ago

FWIW, GoAWK had issues with this prior to the resolver rewrite in v1.23.0, though it was an error rather than a panic:

$ git checkout v1.22.0
$ go run . -f t.awk
t.awk:1:33: can't pass scalar "a" as array param
function f1(a) { if (0) f5(z1); f2(a) }
                                ^
exit status 1
$ go run . -f t.awk  # it's intermittent there as well
42
xonixx commented 1 year ago

Old enough versions, like 1.13.0 or 1.9.2 working well on this one.

benhoyt commented 1 year ago

Note that this broke between v1.18.0 and v1.19.0 in commit https://github.com/benhoyt/goawk/commit/0bc9499481ee7775f1cc5fcca3bcac1443811bf4, which introduces topological sorting. Previously we tried to resolved vars in a loop, up to 500 max iterations while we were still "progressing" (i.e., assigning one or more unknown types), which was potentially O(N^2) but at least did the job.

benhoyt commented 1 year ago

@xonixx It's a bit of a brute-force solution, but here's my fix: https://github.com/benhoyt/goawk/pull/187 -- let me know if you have better ideas.

xonixx commented 1 year ago

Looks good to me. Looks like not much else can be done with the current architecture. Also I guess that topo sort is needed strictly as an optimization, right?

benhoyt commented 1 year ago

Also I guess that topo sort is needed strictly as an optimization, right?

Yes, that's right. It's on optimization for "normal" tree-like call graphs to only take one pass.

I think I'll merge this fix, and contemplate this further. I'm sure there's a more elegant way to solve this, but it's not urgent or particularly necessary.