blynn / nex

Lexer for Go
http://cs.stanford.edu/~blynn/nex/
GNU General Public License v3.0
416 stars · 47 forks

Do we really need channels and goroutines? #38

Open dragonsinth opened 7 years ago

dragonsinth commented 7 years ago

I haven't done super serious profiling yet, but in a rather large application we run, our lexer keeps coming up in goroutine dumps at what seems to be a disproportionately high rate, especially given that we lex very small strings as a tiny part of everything our app does. I found a bunch of routines awaiting chan receive in the generated lexer, which kind of immediately raised my eyebrow. Given what the lexer does, does it really need to use channels and goroutines to accomplish its job? I'm not familiar with the design constraints or exactly what kind of guarantees we want to make for the user code that gets embedded in the generator, but it seems like this could all be done more efficiently single-threaded.

Thoughts?

blynn commented 7 years ago

There are no design constraints or guarantees: nex started as a toy weekend project to help me get acquainted with Go. My understanding was that channels and goroutines are cheap, but it's certainly possible I used them badly. I keep meaning to revisit the code one day, but heaps of other toy projects get in the way.

I'd be happy to accept patches that improve efficiency.

dragonsinth commented 7 years ago

Thanks for the info, I'll take a look and see if I can reformulate it as a callback instead of a channel.
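For context, the two shapes under discussion can be sketched like this. These are hypothetical, heavily simplified stand-ins for nex's generated code, just to show where the goroutine handoff happens in the channel version and why the callback version avoids it:

```go
package main

import "fmt"

// frame mirrors the rough shape of nex's token frame (simplified sketch).
type frame struct {
	matchi int
	text   string
}

// Channel style: a producer goroutine pushes frames; the consumer blocks
// on channel receive. Every token crosses a goroutine boundary.
func lexViaChannel(tokens []string) []frame {
	ch := make(chan frame)
	go func() {
		for i, t := range tokens {
			ch <- frame{i, t}
		}
		close(ch)
	}()
	var out []frame
	for fr := range ch {
		out = append(out, fr)
	}
	return out
}

// Callback style: the scanner calls f synchronously for each token.
// No goroutine, no channel handoff.
func lexViaCallback(tokens []string, f func(frame)) {
	for i, t := range tokens {
		f(frame{i, t})
	}
}

func main() {
	toks := []string{"a", "b", "c"}
	fmt.Println(len(lexViaChannel(toks)))
	n := 0
	lexViaCallback(toks, func(frame) { n++ })
	fmt.Println(n)
}
```

Both deliver the same frames in the same order; the callback version simply stays on the caller's stack.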

dragonsinth commented 7 years ago

I have a proof of concept of how the generated code can change. This works for my particular project (tests pass) and it's about 40% faster in an isolated benchmark I ran. I'm not sure how changes like this would impact other users. Take a look and tell me if you think this is a good direction; I went for minimal changes rather than a big overhaul.

@@ -13,7 +13,7 @@ type frame struct {
 }
 type Lexer struct {
    // The lexer runs in its own goroutine, and communicates via channel 'ch'.
-   ch chan frame
+   scan func(func(frame))
    // We record the level of nesting because the action could return, and a
    // subsequent call expects to pick up where it left off. In other words,
    // we're simulating a coroutine.
@@ -46,9 +46,8 @@ func NewLexerWithInit(in io.Reader, initFun func(*Lexer)) *Lexer {
    if initFun != nil {
        initFun(yylex)
    }
-   yylex.ch = make(chan frame)
-   var scan func(in *bufio.Reader, ch chan frame, family []dfa, line, column int)
-   scan = func(in *bufio.Reader, ch chan frame, family []dfa, line, column int) {
+   var scan func(in *bufio.Reader, f func(frame), family []dfa, line, column int)
+   scan = func(in *bufio.Reader, f func(frame), family []dfa, line, column int) {
        // Index of DFA and length of highest-precedence match so far.
        matchi, matchn := 0, -1
        var buf []rune
@@ -143,9 +142,9 @@ func NewLexerWithInit(in io.Reader, initFun func(*Lexer)) *Lexer {
                    text := string(buf[:matchn])
                    buf = buf[matchn:]
                    matchn = -1
-                   ch <- frame{matchi, text, line, column}
+                   f(frame{matchi, text, line, column})
                    if len(family[matchi].nest) > 0 {
-                       scan(bufio.NewReader(strings.NewReader(text)), ch, family[matchi].nest, line, column)
+                       scan(bufio.NewReader(strings.NewReader(text)), f, family[matchi].nest, line, column)
                    }
                    if atEOF {
                        break
@@ -160,9 +159,10 @@ func NewLexerWithInit(in io.Reader, initFun func(*Lexer)) *Lexer {
                }
            }
        }
-       ch <- frame{-1, "", line, column}
    }
-   go scan(bufio.NewReader(in), yylex.ch, []dfa{
+
+   yylex.scan = func(f func(frame)) {
+       scan(bufio.NewReader(in), f, []dfa{
            // CrOS
            {[]bool{false, false, false, false, true}, []func(rune) int{ // Transitions
                func(r rune) int {
@@ -4401,6 +4401,8 @@ func NewLexerWithInit(in io.Reader, initFun func(*Lexer)) *Lexer {
                },
            }, []int{ /* Start-of-input transitions */ -1, -1}, []int{ /* End-of-input transitions */ -1, -1}, nil},
        }, 0, 0)
+   }
+
    return yylex
 }

@@ -4431,7 +4433,7 @@ func (yylex *Lexer) Column() int {
    return yylex.stack[len(yylex.stack)-1].column
 }

-func (yylex *Lexer) next(lvl int) int {
+func (yylex *Lexer) next(lvl int, fr frame) int {
    if lvl == len(yylex.stack) {
        l, c := 0, 0
        if lvl > 0 {
@@ -4441,7 +4443,7 @@ func (yylex *Lexer) next(lvl int) int {
    }
    if lvl == len(yylex.stack)-1 {
        p := &yylex.stack[lvl]
-       *p = <-yylex.ch
+       *p = fr
        yylex.stale = false
    } else {
        yylex.stale = true
@@ -4453,9 +4455,8 @@ func (yylex *Lexer) pop() {
 }
 func findTokens(ua string) (bits uint64) {
    func(yylex *Lexer) {
-   OUTER0:
-       for {
-           switch yylex.next(0) {
+       yylex.scan(func(fr frame) {
+           switch yylex.next(0, fr) {
            case 0:
                {
                    // user code
@@ -4567,12 +4568,10 @@ func findTokens(ua string) (bits uint64) {
                { /* ignore */
                }
            default:
-               break OUTER0
-           }
-           continue
+               panic("shouldn't get here")
            }
            yylex.pop()
-
+       })
    }(NewLexer(strings.NewReader(ua)))
    return
 }
blynn commented 7 years ago

Looks good. Replacing a channel with a callback should be harmless. I'd be happy to merge a commit with this change.

dragonsinth commented 7 years ago

OK, I'll put a PR together. Anything I should know about early exit? Previously the user code could probably break out of the loop early (although that would leak the goroutine). With this formulation there's no way to exit early short of a panic. Is that OK, or should I provide some way to break out?

blynn commented 7 years ago

I can't be sure anymore, but I have a feeling that the break is needed when there are nested patterns. Can you build a patched version of nex then run "go test" in the test/ subdirectory?

dragonsinth commented 7 years ago

Yeah, just did that. In fact, the first test example (rp.nex) expects to be able to return an int from the main Lex() method, and another one hit the panic() I threw in for the default case, so it definitely expects to be able to break out.

To be fair, both of those examples aren't truly correct on master since they'll leak the background goroutine and basically pin everything, but it looks like I need a way to propagate an exit condition up.

blynn commented 7 years ago

If it's easiest, panic and recover are fine. Firstly, this is generated code, and secondly, I have no qualms about using panic and recover for control flow. In fact, I vaguely remember seeing this in the regexp library code back in the day.
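The panic/recover control-flow idea can be sketched like this. Names such as earlyExit and lexUntil are hypothetical illustrations, not nex's actual API; the point is that a callback-driven scan can be aborted mid-stream and the sentinel recovered at the top:

```go
package main

import "fmt"

// earlyExit is a sentinel type used purely for control flow
// (hypothetical name, not part of nex).
type earlyExit struct{ result int }

// scan calls f for each token; f may abort the whole scan by
// panicking with earlyExit.
func scan(tokens []string, f func(string)) {
	for _, t := range tokens {
		f(t)
	}
}

// lexUntil returns the index of the first token equal to stop, or -1.
// It uses panic/recover to break out of the callback-driven scan.
func lexUntil(tokens []string, stop string) (idx int) {
	idx = -1
	defer func() {
		if r := recover(); r != nil {
			if e, ok := r.(earlyExit); ok {
				idx = e.result
				return
			}
			panic(r) // not our sentinel: re-panic
		}
	}()
	i := 0
	scan(tokens, func(t string) {
		if t == stop {
			panic(earlyExit{i})
		}
		i++
	})
	return
}

func main() {
	fmt.Println(lexUntil([]string{"a", "b", "c"}, "b"))
}
```

Re-panicking on foreign values keeps genuine bugs visible instead of swallowing them in the recover.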


dragonsinth commented 7 years ago

After digging into this more, I'm pretty sure the naive approach isn't going to work. The yacc example expects to be able to call Lex() repeatedly, get one token at a time, and have each call pick up in the same state. But all the DFA states, plus line and column, are local to the scan routine, which means exiting and starting over resets everything.

Supporting that kind of operation means a more invasive refactor to move all the scan() local state into the Lexer itself.
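A rough sketch of what that refactor could look like, with the scan-local state hoisted into the Lexer struct so repeated calls resume where they left off. This is an illustrative simplification (whitespace tokenizer, made-up field names); the real generated code would also need to carry the per-DFA match state:

```go
package main

import "fmt"

// Lexer holds all scanning state explicitly, so Next can return
// mid-stream and a later call resumes exactly where it left off.
// (Illustrative sketch: nex's real state also includes DFA positions.)
type Lexer struct {
	input        []rune
	pos          int
	line, column int
}

// Next returns one whitespace-delimited token per call; ok is false
// once input is exhausted. No goroutine is involved.
func (l *Lexer) Next() (tok string, ok bool) {
	for l.pos < len(l.input) && l.input[l.pos] == ' ' {
		l.pos++
		l.column++
	}
	start := l.pos
	for l.pos < len(l.input) && l.input[l.pos] != ' ' {
		l.pos++
		l.column++
	}
	if start == l.pos {
		return "", false
	}
	return string(l.input[start:l.pos]), true
}

func main() {
	l := &Lexer{input: []rune("foo bar baz")}
	for t, ok := l.Next(); ok; t, ok = l.Next() {
		fmt.Println(t)
	}
}
```

Because pos lives on the struct rather than the stack, a yacc-style caller can interleave Next calls with its own work with no coroutine machinery at all.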

Does all that gel with your understanding? Because I'm only like 90% sure that what I wrote is accurate.

blynn commented 7 years ago

This sounds familiar. I recall enjoying the challenge of tracking the states of each DFA.

I wonder if the slowness is caused by leaking goroutines rather than the overhead of goroutines. Is it easy to plug the leaks?


dragonsinth commented 7 years ago

Plugging the leaks is doable, basically you'd need a doneChan channel that runs in the opposite direction, from the consumer routine to the producer routine. You'd setup a defer in the consumer to close doneChan, and you'd turn frame writes into the frame channel into a select statement that will either write to the frame channel or read from the doneChan, and if you ever read from doneChan you exit immediately.
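The shutdown pattern described above can be sketched as follows. This is a minimal stand-alone illustration, not nex's actual generated code; the key piece is that every frame send is a select against the done channel, so a departed consumer unblocks the producer:

```go
package main

import "fmt"

type frame struct {
	matchi int
	text   string
}

// produce sends frames on ch until input runs out or done is closed.
// Every send selects against done, so a consumer that stops early can
// unblock the producer by closing done instead of leaking it forever.
func produce(tokens []string, ch chan<- frame, done <-chan struct{}) {
	defer close(ch)
	for i, t := range tokens {
		select {
		case ch <- frame{i, t}:
		case <-done:
			return // consumer went away: exit instead of leaking
		}
	}
}

func main() {
	ch := make(chan frame)
	done := make(chan struct{})
	go produce([]string{"a", "b", "c", "d"}, ch, done)

	// Consumer reads two frames, then closes done to release the producer.
	var got []string
	for fr := range ch {
		got = append(got, fr.text)
		if len(got) == 2 {
			close(done)
			break
		}
	}
	fmt.Println(got)
}
```

Closing done (rather than sending on it) is the idiomatic broadcast: it unblocks the producer no matter which select it is parked in.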

That said, I don't think our application is actually leaking goroutines here, and it's still slowish for us. On a non-leaking, single-threaded benchmark I wrote it went from 5.4s -> 3.1s just changing channels to callbacks.

I also got a (smaller) speedup by moving the static DFA transition tables out to package-level "constants" and sharing them between instances instead of recreating the static tables each time, but I was going to submit a separate PR for that.

dragonsinth commented 7 years ago

Hmm, actually when I dug in a bit, it's unclear when you would actually close the doneCh to signal the background routine to exit. For NN_FUN() type usages you could do it when the NN_FUN() returns, but for yacc-type use it's way less clear.

In the following code, when would you signal the background loop to exit?

// Lex runs the lexer. Always returns 0.
// When the -s option is given, this function is not generated;
// instead, the NN_FUN macro runs the lexer.
func (yylex *Lexer) Lex(lval *yySymType) int {
OUTER0:
    for {
        switch yylex.next(0) {
        case 0:
            { /* Skip blanks and tabs. */
            }
        case 1:
            {
                lval.n, _ = strconv.Atoi(yylex.Text())
                return NUM
            }
        case 2:
            {
                return int(yylex.Text()[0])
            }
        default:
            break OUTER0
        }
        continue
    }
    yylex.pop()

    return 0
}
func main() {
    yyParse(NewLexer(os.Stdin))
}
blynn commented 7 years ago

I had a brief look, and it seems all goroutines return so long as end of input is reached. They send one last frame with i = -1 down the channel, which triggers the default cases of the switch statements, causing them to break to those "OUTER" labels.

On a couple of tests, I put in a couple of fmt.Println(runtime.NumGoroutine()) statements to check this (with a delay before printing the post-lexing count so there's enough time for all the goroutines to finish). The goroutines appear to be cleaned up just fine.

Is your application stopping lexing before end of input? I didn't anticipate this. If so, one workaround is to extend your grammar to read but ignore everything past the last token to the end of input, though this might be costly.
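The clean-shutdown-at-EOF behavior described here can be sketched minimally (this is an illustration, not the actual generated code): the producer sends a final sentinel frame with matchi == -1 and returns, and the consumer's default case breaks out when it sees it.

```go
package main

import "fmt"

type frame struct {
	matchi int
	text   string
}

// produce sends one frame per token, then a sentinel frame with
// matchi == -1, and returns — once the consumer drains to EOF,
// the goroutine is gone.
func produce(tokens []string, ch chan<- frame) {
	for i, t := range tokens {
		ch <- frame{i, t}
	}
	ch <- frame{-1, ""} // sentinel: triggers the consumer's default case
}

// consume counts tokens until the sentinel arrives, mirroring the
// generated switch's "break OUTER" on the default case.
func consume(ch <-chan frame) int {
	n := 0
	for {
		fr := <-ch
		if fr.matchi < 0 {
			return n
		}
		n++
	}
}

func main() {
	ch := make(chan frame)
	go produce([]string{"x", "y"}, ch)
	fmt.Println(consume(ch))
}
```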


dragonsinth commented 7 years ago

Yeah, I think you're right. As long as all the input is ultimately consumed the background routine should exit.

marsjupiter1 commented 4 years ago

I have been running into awful performance when re-invoking the lexer and have resorted to a "patch". The patch below never seems to fail and yields a 1000-fold or greater speed increase.

--- generated_fxnn.go 2019-10-14 10:37:46.835681718 +0000
+++ generated_fxnn.go_works 2019-10-14 10:36:27.672678048 +0000
@@ -145,25 +145,10 @@
     text := string(buf[:matchn])
     buf = buf[matchn:]
     matchn = -1

purpleidea commented 4 years ago

@marsjupiter1 If you'd like to suggest a patch, submit it as a pull request please. The current form is not something we can read.

marsjupiter1 commented 4 years ago

It's in unix patch format and simply strips out the select, leaving just the channel read. I have not delved into your code to find a true fix; I just figured out what might work, and it seems to: it turns a test that previously ran for an hour before crashing Go into one that finishes in seconds.

regards

Martin
