MatrixAI / js-errors

Error handling with error chaining
https://polykey.com
Apache License 2.0
0 stars 0 forks source link

Release error chaining and formalise custom errors #1

Closed CMCDragonkai closed 2 years ago

CMCDragonkai commented 2 years ago

Description

Issues Fixed

Tasks

Final checklist

CMCDragonkai commented 2 years ago

In doing this, I discovered that our toJSON method is actually buggy in PK. It is now fixed here and will be used.

Maybe this impacts GRPC error serialisation.

Furthermore I'm playing around with error chaining types to see if it's possible to have type-guided error chains which will help for debugging.

CMCDragonkai commented 2 years ago

This will require js-errors onto the gitlab CI/CD.

CMCDragonkai commented 2 years ago

This has made the ErrorChain into a generic type instead. This enables some nice way of allowing TS to "guide" the usage of the cause. Thus a deep exception chain can result in a deep type. Like AbstractError<AbstractError<Error>>.

CMCDragonkai commented 2 years ago

One of the important functions would be a static fromJSON method.

This would follow our js-id style where we take any and convert it back to JSON as per the format of toJSON.

Note that when using toJSON we actually produced something like:

function toJSON(id: IdInternal): { type: string; data: Array<number> } {
  return {
    type: IdInternal.name,
    data: [...id],
  };
}

Thus producing a very "structured" JSON version of the exception. Unlike our current toJSON which just produces a POJO of all the properties.

Since we may expect to serialise exceptions over the network, we would want something to do this.

However there's a little challenge here. Because the cause itself may contain almost anything, there's no guarantee that when calling fromJSON, that we could "reconstruct" the original object graph.

This is discussed in this comment https://github.com/MatrixAI/js-polykey/issues/304#issuecomment-1028561787

CMCDragonkai commented 2 years ago

It was explained that GRPC metadata doesn't have "nested" data.

However all the fields are "multivalues". Thus it is possible to do:

const errorName = e.metadata.get('name')[0] as string;
const errorMessage = e.metadata.get('message')[0] as string;
const errorData = e.metadata.get('data')[0] as string;

Deserialising from this would not be using the fromJSON as this would a custom format just for GRPC metadata.

However an alternative is to avoid relying on the metadata format, and directly set the stringified JSON in just one property. Which gives us some flexibility at the cost of some lower transparency in the GRPC metadata (any reaction to errors, has to involve potentially parsing the JSON first).

The grpcUtils.toError is already used by all the generator wrapper utilities in grpc/utils. This throws any GRPC ServiceError as a domain specific error in Polykey.

During serialisation, we can produce a structured "{ type, data }" format that is used by js-id, so that we have a similar kind of "JSON encoded object" format where the type is always the class name, and the data is the the data required for the fromJSON call.

Our GRPC metadata properties can mirror some of the important data from these exception objects. But there should be an error property that actually contains the JSON exception information. e.metadata.get('error').

The current GRPC properties we are using is:

  metadata.set('name', error.name);
  metadata.set('message', error.message);
  metadata.set('data', JSON.stringify((error as errors.ErrorPolykey).data));

We know that GRPC also sets additional properties whenever a ServerStatusResponse object is emitted on an error event.

export interface StatusObject {
    code: Status;
    details: string;
    metadata: Metadata;
}

export declare enum Status {
    OK = 0,
    CANCELLED = 1,
    UNKNOWN = 2,
    INVALID_ARGUMENT = 3,
    DEADLINE_EXCEEDED = 4,
    NOT_FOUND = 5,
    ALREADY_EXISTS = 6,
    PERMISSION_DENIED = 7,
    RESOURCE_EXHAUSTED = 8,
    FAILED_PRECONDITION = 9,
    ABORTED = 10,
    OUT_OF_RANGE = 11,
    UNIMPLEMENTED = 12,
    INTERNAL = 13,
    UNAVAILABLE = 14,
    DATA_LOSS = 15,
    UNAUTHENTICATED = 16
}

When we are throwing our application exceptions, we are using the UNKNOWN code, this is just because that's the default code to be used when additional details aren't used in the toError(): ServerStatusResponse. And according to https://grpc.github.io/grpc/core/md_doc_statuscodes.html it should be used for application errors. But there may be some overlap between application errors and these traditional errors. We may want to make use of some of these status codes like NOT_FOUND when that's what it means.

Suppose all 3 metadata properties was reduced to just:

  class ErrorSomething extends AbstractError {}
  const e = new ErrorSomething();
  metadata.set('error', JSON.stringify(e));
  // we know the data will look like this
  // {
  //   type: 'ErrorSomething',
  //   data: { 
  //     ...,
  //     cause: { 
  //       type: 'AnotherError',
  //       data: { ... }
  //     }, 
  //     ... 
  //   }
  // } 

On the other side, we acquire the error property like this:

// e.code === grpc.status['UNKNOWN']
// e.details === ??? - not sure what this would be, maybe an associated message for `UNKNOWN` status
errorsUtils.fromJSON(e.metadata.get('error')[0] as string)

Now the errorsUtils.fromJSON or AbstractError.fromJSON has do some complicated work.

It has to take the type field, and then try to "reconstruct" the object.

In js-id this is doable, cause we are always using IdInternal, the same class.

In this case, AbstractError is supposed to be used like an abstract class, and must construct based on a name. However without reference to the class structure, this is not possible. Furthermore, we cannot allow just arbitrary strings to be evaluated. It must be constructed with respect to a "restricted list" of classes.

This is what we currently do in toError where it:

        return new errors[errorName](errorMessage, JSON.parse(errorData));

This only works because errors is a module being exposed like an object, and we use it as a lookup table for errorName, to actually get a reference for the class to reconstruct.

Let me prototype a static function that gets inherited and uses the internal class name. We may be able to work something out, if we can expect the same class wrapper at the root of the exception structure.

Any types that are not valid exceptions, would then need to be converted to a "base" exception that gets used. We already do something like this in grpcUtils.toError:

        return new grpcErrors.ErrorGRPCClientCall(e.message, {
          code: e.code,
          details: e.details,
          metadata: e.metadata.getMap(),
        });

That acts as the base during recursive traversal of the JSON structure.

CMCDragonkai commented 2 years ago

Because causes can be arbitrary, this creates a form of open-extensibility for the cause chain, which means that encoding and decoding to and from JSON has to deal with arbitrary data types.

To deal with this, I suspect a combination of JSON's replacer and reviver mechanisms have to be used to work against a select set of types.

In this library alone, the only types we are aware of are AbstractError and Error and descendants of Error like SyntaxError... etc.

During encoding, instances of Error does not have toJSON methods. This means JSON.stringify(new Error()) just gives back {}.

We can at least resolve this issue by dealing with this in our toJSON method.

It also seems that we may provide reviver or replacer utilities. But again we don't know the right types too.

Alternatively the cause chain can be restricted rather than allowing any arbitrary type. What would this restriction be?

Seems like it could be limited to just instances of Error and instances of AbstractError. During encoding, we can know that all Error instances can have name, message and stack, and associated properties with AbstractError, but we still won't be able to restore more specific types than AbstractError.

So at the end of the day, the user must supply the appropriate reviver or replacer. This would be case with PK.

CMCDragonkai commented 2 years ago

Additional requirement coming from https://github.com/MatrixAI/js-polykey/pull/321#issuecomment-1028554814

I realised that when converting GRPC errors back into exceptions, there's no indication that this is coming from another node/call, rather than being raised internally. The only data we have is error message and error data.

Consider when we make a function call to a particular domain like VaultManager. The VaultManager may end up calling NodeConnectionManager and NodeConnection and subsequently RPC the other node. When the other node receives it, its handler may end up calling VaultManager as well. At any point here there may be exceptions that are thrown. It makes sense, that we should differentiate exceptions that were thrown from our own node code, or from the other side over a GRPC call.

One way to differentiate this is to augment the exception with additional details in the data property either when it is sent over the GRPC metadata, or when it is received. Additionally a different class can be used to wrap exceptions that are being sent over the GRPC metadata. If details are always expected to be added on the sending side, this can be problematic because if the sender does something unexpected, this can affect the receiver. Instead it should be the receiver's responsibility upon receiving the exception to decorate/augment the exception appropriately that it came from the other side. Note that receiver here just means whichever side receives the exception, not necessarily who started a GRPC call.

This means grpcUtils.toError would need to appropriately wrap the exceptions from ServiceError object to errors.ErrorPolykeyRemote (this was suggested as the special class).

This new exception can then have in its cause chain pointing to the reconstructed exceptional objects that came in from serviceError.metadata.get('error')[0].

It may not actually be necessary to reconstruct all exception objects in the cause chain. In fact there's no guarantee this would even be possible because the sender may have classes that does not exist on the receiver.

This can be addressed by have a base exception class indicating unknown exception that it decodes to, or it can be dealt with by not decoding those JSON exceptions and terminating the recursive decoding there.

CMCDragonkai commented 2 years ago

Additionally we should also differentiate client to agent calls, and agent to agent calls, so a single ErrorPolykeyRemote would only work if the nodeId was a property that we would know where the error came from. The PK node that we just called, or another PK node that our PK node called.

import { AbstractError } from '@matrixai/errors';
import sysexits from './utils/sysexits';

// exitCode is not part of `AbstractError` since it is specific to how PK uses it
class ErrorPolykey extends AbstractError {
  static description = 'Polykey error';
  exitCode: number = sysexits.GENERAL;
}

// here is our new wrappers intended to be used by `grpcUtils.toError` when receiving error over GRPC call

// Client to Agent calls and Agent to Agent Calls
// use the nodeId property to differentiate between own agent and foreign agent
class ErrorPolykeyRemote extends ErrorPolykey { 
  static description = 'Remote error from RPC call';
  exitCode = sysexits.UNAVAILABLE;
  nodeId: NodeId;
  nodeAddress: NodeAddress;
}

// the exit code for `ErrorPolykeyRemote` may actually depend on what it is wrapping though
CMCDragonkai commented 2 years ago

The grpcUtils.fromError must strip debug information when sending exceptions over the network to untrusted source.

This should be done for client to agent and agent to agent.

We now have output format options in AbstractError.toJSON that allows one to control what gets outputted during serialisation to JSON. Note that by not providing the cause, deeper properties won't be provided. But if the cause is also provided, then subsequent cause may also be provided.

That would mean one cannot just use JSON.stringify to auto-call the exception object, but instead manually call toJSON() first then use JSON.stringify afterwards.

Alternatively the replacer callback can be used here to help specialise what gets serialised.

Important details like stack and some limitation of the cause chain may be needed to be stripped.

CMCDragonkai commented 2 years ago

As an example rather than JSON.stringify(new AbstractError).

You do:

const e = new AbstractError();
JSON.stringify(e.toJSON(undefined, { stack: false });

That ensures that the exception being stringified is already being filtered, in this case the stack property is not provided.

CMCDragonkai commented 2 years ago

It turns out that as soon as there's a toJSON in the object, it is called before the replacer is called. But this is also true recursively. Thus our custom replacer here has no need to check if it is an instanceof AbstractError as this condition will never be met. The toJSON would already be called.

CMCDragonkai commented 2 years ago

I just discovered that JS Error now supports cause property: https://v8.dev/features/error-cause

It's also available https://stackoverflow.com/questions/22303633/set-error-cause-in-javascript-node-js Node.js (16.9+). We aren't on that version just yet.

This is great, the design basically matches the exact API I have. And this should also mean that we can now expect a causation chain even for normal Error instances.

CMCDragonkai commented 2 years ago

For now I can't use it mainly due to:

  1. Nodejs is still on older version until nixpkgs fixes the node2nix
  2. https://github.com/adriengibrat/ts-custom-error/issues/38 needs to support the { cause: ... } options parameter for CustomError.
CMCDragonkai commented 2 years ago

Going to put this here for later, since I want to try out error integration first.

``` public toJSON( _key: string = '', options: { description?: boolean; message?: boolean, timestamp?: boolean; data?: boolean; cause?: boolean; stack?: boolean; } = {} ): { type: string; data: { description?: string; message?: string; timestamp?: Date, data?: POJO; cause?: T, stack?: string } } { options.description ??= true; options.message ??= true; options.timestamp ??= true; options.data ??= true; options.cause ??= true; options.stack ??= true; const data: POJO = {}; if (options.description) data.description = this.description; if (options.message) data.message = this.message; if (options.timestamp) data.timestamp = this.timestamp; if (options.data) data.data = this.data; if (options.cause) { // Propagate the options down the exception chain // but only if the cause is another AbstractError if (this.cause instanceof AbstractError) { data.cause = this.cause.toJSON('cause', options); } else { // Use `replacer` to further encode this object data.cause = this.cause; } } if (options.stack) data.stack = this.stack; return { type: this.name, data }; } const errors = { 'Error': Error, 'EvalError': EvalError, 'RangeError': RangeError, 'ReferenceError': ReferenceError, 'SyntaxError': SyntaxError, 'TypeError': TypeError, 'URIError': URIError }; function fromJSON(o: any) { if (typeof o.type !== 'string') { return; } if (typeof o.data !== 'object') { return; } let message; let timestamp; let timestampParsed; let stack; let e; switch (o.type) { case 'AbstractError': message = o.data.message ?? ''; timestampParsed = Date.parse(o.data.timestamp); if (!isNaN(timestampParsed)) { timestamp = new Date(timestampParsed); } e = new AbstractError( message, { timestamp, data: o.data.data, cause: o.data.cause } ); e.stack = o.data.stack; return e; case 'Error': case 'EvalError': case 'RangeError': case 'ReferenceError': case 'SyntaxError': case 'TypeError': case 'URIError': message = o.data.message ?? ''; if (typeof message !== 'string') { return; } stack = o.data.stack; if (stack != null && typeof stack !== 'string') { return; } e = new errors[o.type]( message, { cause: o.data.cause } ); e.stack = o.data.stack; return e; // case 'AggregateError': // return fromJSONAggregateError(o); default: return; } } ```

I'm not sure if js-errors should be providing a generic way of encoding/decoding the base errors. This seems like an application concern. And therefore even the toJSON may not entirely be relevant here.

Until we have figured out a good pattern for this. I suspect the toJSON can only be used at one level. Any further causes will be further specialised downwards. It's complicated by the fact that you may not even want to provide properties of the errors in different serialisation situations. So we shall do it in PK.

The stringify is top-down recursively (zooming-in), any object you return is further iterated over. The parse is bottom-up recursively, any object you return is further iterated in the higher context (zooming-out).

CMCDragonkai commented 2 years ago

Also test details...

``` test('json', () => { try { throw new AbstractError(); } catch (e) { const eObject = JSON.parse(JSON.stringify(e)); expect(eObject.name).toBe(AbstractError.name); expect(eObject.description).toBe(''); expect(eObject.data).toStrictEqual({}); expect(eObject.cause).toBeUndefined(); expect(eObject.stack).toBeDefined(); // Timestamp is stringified to ISO8601 expect(e.timestamp.toISOString()).toBe(eObject.timestamp); expect(new Date(eObject.timestamp).getTime()).toBe(e.timestamp.getTime()); } }); test('causation chain serialize', () => { const eOriginal = new AbstractError('original'); try { try { throw eOriginal; } catch (e) { const e_ = new AbstractError('wrapped', { cause: e as AbstractError }); expect(e_.cause).toBeInstanceOf(AbstractError); expect(e_.cause).toBe(eOriginal); expect(e_.cause.message).toBe('original'); throw e_; } } catch (e) { const eObject = JSON.parse(JSON.stringify(e)); expect(eObject.cause.name).toBe(eOriginal.name); expect(eObject.cause.description).toBe(''); expect(eObject.cause.message).toBe('original'); expect(eObject.message).toBe('wrapped'); expect(eObject.cause.data).toStrictEqual({}); expect(eObject.cause.cause).toBeUndefined(); expect(eObject.cause.stack).toBeDefined(); } }); ```