containerd / nerdctl

contaiNERD CTL - Docker-compatible CLI for containerd, with support for Compose, Rootless, eStargz, OCIcrypt, IPFS, ...
Apache License 2.0
8.09k stars 600 forks source link

follow-up to #3343: review uses of bufio.Scanner #3370

Open CerberusQc opened 2 months ago

CerberusQc commented 2 months ago

What is the problem you're trying to solve

As discussed in https://github.com/containerd/nerdctl/issues/3343, the usage of bufio.Scanner when we do not have the control over the length of the message processed by the scanner can put the Scanner in a I/O error state failing silently and blocking the process sending a message.

@apostasie recommend that the usage of bufio.Scanner across the project is revisited.

Describe the solution you'd like

A solution is suggested with https://github.com/containerd/nerdctl/pull/3366 but might not be a "one size fit all solution".

Additional context

Documentation

The documentation state the following:

Scanning stops unrecoverably at EOF, the first I/O error, or a token too large to fit in the Scanner.Buffer. When a scan stops, the reader may have advanced arbitrarily far past the last token. Programs that need more control over error handling or large tokens, or must run sequential scans on a reader, should use bufio.Reader instead.

apostasie commented 2 months ago

Note:

For a cursory review: bufio.NewScanner is used for:

  1. processing soci command IO (processSociIO)
  2. processing notation command IO (processNotationIO)
  3. processing cosign command IO (processCosignIO)
  4. read /proc/net/* files (ReadStatsFileData)
  5. read /etc/os-release (DistroName)
  6. process env vars files (parseEnvVars) provided via --env-file
  7. compose log pipe tagger (PipeTagger.Run)
  8. process /etc/hosts files (parseHostsButSkipMarkedRegion)

I believe that:

That leaves 7, which is certainly a problem

Hey @CerberusQc

May I interest you in patching https://github.com/containerd/nerdctl/blob/main/pkg/composer/pipetagger/pipetagger.go#L97-L110 ? :-)

CerberusQc commented 2 months ago

Sorry for the late response, I am currently moving into a new house, I won't be able to work on this before 1-2 weeks.

I am wondering if it would be better to move the function into it's own struct to be more generic and allow to stay DRY in that case. If so, I am not really familiar with GO folders structure, where should I put the generic class ?

I was thinking about something like

type LogProcessor struct {
    wg       *sync.WaitGroup
    reader   io.Reader
    dataChan chan string
}

func NewLogProcessor(wg *sync.WaitGroup, reader io.Reader, dataChan chan string) *LogProcessor {
    return &LogProcessor{
        wg:       wg,
        reader:   reader,
        dataChan: dataChan,
    }
}

func (p *LogProcessor) Process() {
    defer p.wg.Done()
    defer close(p.dataChan)
    r := bufio.NewReader(p.reader)

    var err error

    for err == nil {
        var s string
        s, err = r.ReadString('\n')

        if len(s) > 0 {
            p.dataChan <- strings.TrimSuffix(s, "\n")
        }

        if err != nil && err != io.EOF {
            log.L.WithError(err).Error("failed to read log")
        }
    }
}

This is just a draft and did not run it, naming need improvement.

apostasie commented 2 months ago

No problem. Congrats on the new house!

On your code suggestion - let's chat on your (future) PR when you get there. On the location, what about under pkg/logging/processor.go? (and then let see what maintainers think)

See you in a few weeks then ;).