go-cmd / cmd

Non-blocking external commands in Go with streaming output
MIT License
918 stars 128 forks source link

[streaming] Line breaks and buffer overflows (broken pipes) #104

Open 13k opened 4 months ago

13k commented 4 months ago

This is similar to #4, but it's more related to child process output content rather than output/buffer sizes.

Child processes that output a lot of data, or run long enough, but don't emit line breaks will eventually cause buffer overflows and therefore broken pipe errors.

This is easily reproducible:

package main

import (
    "fmt"
    "log"
    "math/rand/v2"
    "os"
    "time"

    "github.com/go-cmd/cmd"
)

func main() {
    if len(os.Args) > 1 {
        server()
    } else {
        client()
    }
}

func server() {
    ttl := time.Second * 10
    timeout := time.After(ttl)

    fmt.Printf("[server] start (running for %s)\n", ttl)

    for {
        n := time.Duration(rand.Int64N(10))
        step := time.After(time.Millisecond * n)

        select {
        case <-step:
            fmt.Printf("\033[2K\r[server] %d", n)
        case <-timeout:
            fmt.Println("\033[2K\r[server] exit")
            return
        }
    }

}

func client() {
    selfExe, err := os.Executable()

    if err != nil {
        panic(err)
    }

    svCmd := []string{selfExe, "server"}

    log.Printf("[client] launching server: %q", svCmd)

    cmd := cmd.NewCmdOptions(cmd.Options{Streaming: true}, selfExe, "server")

    go func() {
        for output := range cmd.Stdout {
            fmt.Fprintln(os.Stdout, output)
        }
    }()

    go func() {
        for output := range cmd.Stderr {
            fmt.Fprintln(os.Stderr, output)
        }
    }()

    log.Println("[client] wait server")

    status := <-cmd.Start()

    if status.Error != nil {
        log.Fatal(status.Error)
    }

    log.Printf("[client] server status: %v", status)
    log.Println("[client] exit")
}

You can notice that in each server loop iteration, it doesn't emit a line break to the output, it uses a ANSI control sequence (\033[2K) to clear the line and return the cursor to the start of the line (\r).

go-cmd will keep accumulating output, because it doesn't see a \n and will never send a message to the output channel. This causes two issues.

The overflow issue

Eventually go-cmd's internal buffer overflows:

> go run streaming.go
2024/05/31 14:17:18 [client] launching server: ["/tmp/go-build653253379/b001/exe/streaming" "server"]
2024/05/31 14:17:18 [client] wait server
[server] start (running for 10s)
2024/05/31 14:17:24 signal: broken pipe
exit status 1

I believe there are no buffer size adjustments that could solve this.

The issue with buffering based on assumptions about the data format

go-cmd assumes the output will be line oriented and sends channel messages for each line.

I imagine this is done for convenience, but it breaks streaming flows that could be important, like showing progress status (e.g.: rclone, aria2c, wget, curl, etc).

Even for simpler cases (ANSI control sequences are not used) or smaller cases (all of the output would fit in the buffer), for example when a process prints progress on the same line (like a series of dots or percentages), go-cmd's implementation breaks those flows (e.g.: some mkfs implementations do this if I remember correctly).

Another broken use case would be processes that emit very long lines. It could be the case for minifiers/uglyfiers/obfuscators, or language servers (LSPs) that are known to emit long JSON lines to stdio, for example. go-cmd would have to keep increasing buffer size to accomodate for this.

Proposed solution

I believe the solution to both issues is to just do byte oriented streaming with buffering: don't make assumptions about data format and emit chunks downstream whenever the buffer is filled. The client is responsible to reassemble the data if needed.

daniel-nichter commented 3 months ago

Right, go-cmd is inherently line-oriented/buffered. It's original (and still primary) use case is standard Linux command output.

A byte-oriented streaming buffer would be a nice addition. Off top of my head, the design probably needs both a buffer size and a flush timeout/interval. Let's say buff size is 1k but only 900B are received and upstream pauses for a long time. Should go-cmd wait for the remaining 100B to flush? Or flush what it's go so far (900B) after some time?

You or someone else want to write a PR? Or, if others want this new capability, 👍 the original post.