containers / virtcontainers

A Go package for building hardware virtualized container runtimes
Apache License 2.0
139 stars 43 forks source link

[RFC] Generate enriched errors #657

Closed jodh-intel closed 6 years ago

jodh-intel commented 6 years ago

Context

All components of both Clear Containers and Kata Containers now produce structured logs. This is extremely useful as the log fields provide a lot of context on key stages in the runtime flows.

Current virtcontainers error example

However, errors currently suffer as they are wholly "free-format".

For example, here's an example of a structured log message logged by cc-runtime:

time="2018-02-28T21:18:55.292102004+01:00" level=error msg="Container not ready or running, impossible to signal the container" source=runtime pid=1234

Or formatted with the log-parser on https://github.com/kata-containers/tests/pull/98 for easier reading:

- count: 1
  timedelta: 0
  filename: /home/james/tmp/vc-old.txt
  line: 1
  time: !!timestamp 2018-02-28T21:18:55.292102004+01:00
  pid: 1234
  level: error
  msg: Container not ready or running, impossible to signal the container
  source: runtime
  name: cc-runtime
  data: {}

Notes:

Error based on this PR

This PR is just an idea for how we could improve our error handling. It adds a new implementation of Errorf() so that rather than the above, with this PR you'd get the following:

time="2018-03-02T14:40:30.311729832Z" level=error arch=amd64 error="Container not ready or running, impossible to signal the container" error-time="2018-03-02T14:40:30.311705732Z" file="\"/home/james/go/src/github.com/clearcontainers/runtime/vendor/github.com/containers/virtcontainers/container.go\"" function="github.com/clearcontainers/runtime/vendor/github.com/containers/virtcontainers.(*Container).kill" hello=world line=628 name=cc-runtime pid=22469 source=virtcontainers

... and here's the log-parser reformatted version of that line:

- count: 1
  timedelta: 0
  filename: /home/james/tmp/vc.log
  line: 1
  time: !!timestamp 2018-03-02T14:40:30.311729832Z
  pid: 22469
  level: error
  msg: ""
  source: virtcontainers
  name: cc-runtime
  data:
    arch: amd64
    error: Container not ready or running, impossible to signal the container                                                                                                                 
    error-time: "2018-03-02T14:40:30.311705732Z"
    file: '"/home/james/go/src/github.com/clearcontainers/runtime/vendor/github.com/containers/virtcontainers/container.go"'
    function: github.com/clearcontainers/runtime/vendor/github.com/containers/virtcontainers.(*Container).kill
    hello: world
    line: "628"

Notes:

Caveats

This PR is just an example. To be generally useful, we'd need this Errorf() function (and an equivalent to errors.New()) to be available to all components of Clear Containers and Kata Containers. That implies we'd need a new package for this functionality in a separate repo that we'd "vendor in" to all the components.

The other issue is how we handle the logger itself. Ideally, we'd find a way to register the logger with the new package and then have Errorf() use that logger object. But worst case, we'd need to change all the fmt.Errorf() and errors.New() calls to pass in the logger as an extra arg.

/cc @sameo, @sboeuf, @grahamwhaley.

sboeuf commented 6 years ago

@jodh-intel PR looks good, and I agree this should be in a new subpackage of virtcontainers. This way, this could be reused accross repos.

jodh-intel commented 6 years ago

Hi @sboeuf - regarding the separate package, what about Kata? If we agree to adopt "enriched errors", we'd want them used everywhere so wouldn't it be simpler to create a completely separate package (https://github.com/kata-containers/errors or https://github.com/kata-containers/utils or something) that all components could vendor to avoid vendoring (a sub-package of) virtcontainers?

sboeuf commented 6 years ago

Well given the fact that virtcontainers might very likely move under kata-containers/runtime, there is no need to move this to its own repo.

jodh-intel commented 6 years ago

A summary of current thinking...

The errors package

@markdryan pointed me at https://github.com/pkg/errors. This looks interesting, but although it improves the quality of the errors by adding a stack trace...

I'm after a solution that doesn't change the error interface and which generates "structured" errors.

Original API Idea

I have an almost complete implementation whose API is:

package errors

// SetLogger sets the error logger to be used when generating error
// objects.
func SetLogger(logger *logrus.Entry)

// New is equivalent to errors.New().
func New(msg string) error

// Errorf is equivalent to fmt.Errorf().
func Errorf(format string, args ...interface{}) error

// Error takes a "normal" error and converts it into an enriched error
func Error(err error) error

This works in most scenarios but gets tripped up due to the flexible nature of logrus. Specifically, the following idiom -- where a more specialised logger is created (with extra fields) -- is problematic:

// "register" logger with the errors package
errors.SetLogger(virtLog)

// ...

logger := virtLog.WithField("foo", "bar")

err := doSomething()
if err != nil {
    // log the error
    logger.WithError(err).Error("doSomething() failed")

    // return an enriched error.
    //
    // BUG: the Error() function is still using the original logger (virtLog)
    // which doesn't have the "foo" field.
    return errors.Error(err)
}

The ways to resolve this issue (atleast the ones I came up with) aren't particularly palatable:

A simpler API idea

I had a further think and came up with the following:

// LogError takes a logger which is assumed to contain an error and returns a
// new structured error which includes details of the error call site.
func LogError(entry *logrus.Entry) error

With this, we could then do magic like:

func f() error {
    logger := virtLog.WithField("foo", "bar")

    err := doSomething()
    if err != nil {
        // this logs 
        return LogError(logger.WithError(err))
    }

    value := doSomethingElse()
    if value == "" {
        return LogError(logger.Error("doSomethingElse() failed"))
    }

    return nil
}

The advantages of this approach:

An example of that API:

origErr := errors.New("hello world")
err := LogError(logger.WithError(origErr))
fmt.Println(err)

which generates an error something like:

time="2018-03-06T17:29:13.06774561Z" level=error error="hello world" error-file=/home/james/src/go/test_logrus_as_error2.go error-function=main.testLogError error-line=156 error-time="2018-03-06T17:29:13.067665955Z" name=test_logrus_as_error2 pid=25206 source=error-logger

... or re-formatted with the log-parser:

- count: 1
  timedelta: 0
  filename: /home/james/tmp/kata-logs/error-logger.txt
  line: 1
  time: !!timestamp 2018-03-06T17:29:13.06774561Z
  pid: 25206
  level: error
  msg: ""
  source: error-logger
  name: test_logrus_as_error2
  data:
    error: hello world
    error-file: /home/james/src/go/test_logrus_as_error2.go
    error-function: main.testLogError
    error-line: "156"
    error-time: "2018-03-06T17:29:13.067665955Z"
jodh-intel commented 6 years ago

Any thoughts on the above @sameo, @sboeuf, @grahamwhaley?

grahamwhaley commented 6 years ago

Most of that sounds sane to me, I think.

mostly unrelated - query - is the !! deliberate in the timestamps:

 time: !!timestamp 2018-03-06T17:29:13.06774561Z

and if so, what does it mean, and why do they not also appear in the error time?

jodh-intel commented 6 years ago

Yes, LogError() is logrus-specific. If we ever wanted to move to another logger, I'm assuming we'd find a way to rework that function in similar ways to extract the log fields.

Ignore those timestamps - you get the weird prefix when running kata-log-parser --output-format yaml .... See: http://yaml.org/type/timestamp.html.

sboeuf commented 6 years ago

@jodh-intel I am all for simplicity ! And your simpler API idea looks good to me. Especially for logging, it's important not to be confused with too many options.

sboeuf commented 6 years ago

@jodh-intel Need to be closed and moved to https://github.com/kata-containers/runtime/virtcontainers