benhoyt / goawk

A POSIX-compliant AWK interpreter written in Go, with CSV support
https://benhoyt.com/writings/goawk/
MIT License

parser: Discriminate between constant and dynamic regexps #221

Closed: ypdn closed this issue 9 months ago

ypdn commented 9 months ago

I've been using the github.com/benhoyt/goawk/parser package to format my awk scripts, but there is a small issue: after parsing and printing a program (with Program.String), "constant" regexps become "dynamic" regexps.

For example:

{sub(/regexp/, "", $0)}

becomes:

{
    sub("regexp", "", $0)
}

Arguably it doesn't matter but it would be nice if it didn't change the regexps. Thanks for your work.

benhoyt commented 9 months ago

Thanks for the report. Program.String() isn't actually intended to be a proper AWK code formatter, but I'm glad that you're able to use it that way. :-)

I'd be in favour of adding this. Currently the AST doesn't distinguish between /regexp/ and "regexp" for these cases; both are just parsed as *ast.StrExpr. There is an *ast.RegExpr already, but it's currently only used for a stand-alone regexp, so I'm not sure we can re-use that. We'd probably need a new ast.FooExpr type, or we could keep track of which StrExprs are actually /regexes/.

benhoyt commented 9 months ago

Would you be interested in coming up with a PR by any chance?

ypdn commented 9 months ago

... or we could keep track of which StrExprs are actually /regexes/

This seems like the better way to do it, since we're only interested in its String method. I've submitted the PR.
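
For readers following along, here is a minimal sketch of the "keep track of which StrExprs are actually /regexes/" idea. It is not the code from the PR: the `StrExpr` type below is a self-contained stand-in, and the `Regex` field name is an assumption made for illustration. It also assumes `Value` holds the pattern with any `\/` already unescaped, and it uses Go's `strconv.Quote` for string literals, which only approximates AWK string escaping.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// StrExpr is a hypothetical stand-in for an AST string-literal node.
// It is NOT goawk's actual ast.StrExpr definition.
type StrExpr struct {
	Value string
	// Regex is an assumed flag: true when the literal was written
	// as /regexp/ in the source rather than as "regexp".
	Regex bool
}

// String prints the literal back in the form it was written in.
func (e *StrExpr) String() string {
	if e.Regex {
		// Re-escape slashes so the printed pattern stays delimited correctly.
		return "/" + strings.ReplaceAll(e.Value, "/", `\/`) + "/"
	}
	// Go-style quoting; close to, but not identical to, AWK escaping.
	return strconv.Quote(e.Value)
}

func main() {
	constant := &StrExpr{Value: "regexp", Regex: true}
	dynamic := &StrExpr{Value: "regexp"}
	empty := &StrExpr{Value: ""}

	fmt.Printf("sub(%s, %s, $0)\n", constant, empty)
	fmt.Printf("sub(%s, %s, $0)\n", dynamic, empty)
}
```

Because only the printing path consults the flag, evaluation can keep treating both forms as plain strings, which is why this is less invasive than adding a new node type.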