jodh-intel commented 6 years ago

Improve the generated error objects.

jodh-intel commented 6 years ago

This is proving to be difficult: not the new API code, but how to use it.

Rationale for structured errors

They make it easy to establish where an error was created (or first detected).
They include log meta-data at the point where they are generated/detected to provide further debug information.
They allow the code to be simplified

Rather than making a log call and then returning an error, a single operation can be used to get the benefit of both operations (consistently).
They are machine parsable.

The API

A slightly modified version of the simple API shown on #657...

package e

func Log(entry *logrus.Entry) error

... works fine:

Description	current call	new call
Handle existing error	`return err`	`return e.Log(virtLog.WithError(err))`
Create static error	`return errrors.New(...)`	`return e.Log(virtLog.WithField("error", "..."))`
Create formatted error	`return fmt.Errorf(...)`	`return e.Log(virtLog.WithField("error", fmt.Sprintf(...))`

That's a little verbose, but bearable (I think).

The complication

The problem is now more about where the calls to e.Log() are placed. There are hundreds of code locations that return errors, but it doesn't make sense to change all of them to use e.Log(): if we do that, we'll end up with errors containing multiple sets of fields. The code handles this fine, but we'll essentially get "too much" detail when all we actually want is the full stacktrace for the location where the error was first detected.

In summary:

If we generate the structured errors only at the highest level, the additional fields (stacktrace and logger fields) won't be very useful.
If we generate the structured errors at all levels (the "search'n'replace" approach to adding e.Log() calls), we'll end up with overly "large" errors (errors with duplicate fields). Although the code supports creating a structured error from an existing structured error, it results in larger and larger error objects which is going to make the returned error less clear. It could also have a performance impact.

The solution therefore would seem to be to only create structured errors at the lowest possible layers:

In pseudo-leaf functions

Functions that do not call any functions in "lower" virtcontainers packages).
In the pkg/* packages

Problem: these don't have access to a logger, so the next-best approach is...
At package boundaries

In functions where virtcontainers calls code from other packages.
At plugin boundaries (see #667).

Layers

virtcontainers is structured at the file level, something like the following:

Level	files
high	`api.go`, `pkg/oci.go`
	`pod.go`, `container.go`
	`hypervisor.go`, `agent.go`, `shim.go`, `proxy.go`
medium	`qemu.go`, `agent_.go`, `shim_.go`, `proxy_.go`
	`asset.go`, `bridge.go`, `cni.go`, `cnm.go`, `device.go`, `filesystem.go`, `hook.go`, `mount.go`, `network.go`, `syscall.go`, `utils.go`
low	`pkg//.go` (excluding `pkg/oci.go`)

Application approaches

Although the code can be updated manually to introduce the structured error API call:

Adding the API to a subset of code locations (a subset of functions in a subset of files) is going to be error-prone.
It will also be error-prone to update the API calls when code when changes are required.

The best approach might be to find/write a tool to identify such code locations and error if there is no call to e.Log().

What might help is if we split the main package into sub-packages as suggested on #631.

containers / virtcontainers

Produce enriched errors #656

Rationale for structured errors

The API

The complication

Layers

Application approaches