OpenFn / kit

The bits & pieces that make OpenFn work. (diagrammer, cli, compiler, runtime, runtime manager, logger, etc.)
8 stars 12 forks source link

Runtime: throw a standard, serializable error object #726

Open josephjclark opened 4 days ago

josephjclark commented 4 days ago

We have a bunch of bad error handling cases #720 #612 (and maybe more) where error information is lost/hidden from the user. This affects the CLI and worker and I think affects them both differently.

Part of the problem is that a) err objects don't serialize and b) the error will be passed through different processes in both the CLI and worker.

I think the best way to solve this is to trap any errors coming out of the runtime, serialize them into a stable standard object format, and then throw them on. Then the CLI and worker can do what they want with them

The format will be like:

type ErrorObject = {
   type: string // the error short name, ie TypeError, HTTPError, Not_Found. the error
   message: string; // human readable message for the user
   stack?: string // stacktrace to show where the error came from. Often not useful.
   details: {  // any debug/logging details associated with the error
      // anything goes
   },

  position?: {
     // later we'll add line number and maybe offset, sourcemapped to the original code
  }
}

The runtime should catch and handle different types:

What about stack traces? They can be useful for errors coming out of an adaptor, and later will be useful for errors in job code. But runtime errors are not a useful stack trace. So I guess the runtime itself needs to decide what to do about stack traces.

As a result of this work, I want a nice clear standard for how adaptors handling errors. I think it's literally: throw the error (or let the error be thrown), and let the runtime handle the logging. Don't console.error the error. Maybe add logging if you can add clarity. Maybe consider writing extra properties to the error object. But basically trust the runtime to log the error nicely.

josephjclark commented 4 days ago

Probably closes https://github.com/OpenFn/adaptors/issues/624

josephjclark commented 4 days ago

Alright, well I've got a nice means of serializing Errors into objects.

But this issue is going to go deep :( The way errors are caught, processed, re-thrown, re-caught, and then eventually logged... all a bit icky. With the result that the serializable errors are getting wrapped up and rethrown elsewhere in a non serializable way.

It's too heavy for a Friday afternoon so I'm taking some notes and going to think about it over the weekend, then aggressively simplify this on Monday.

josephjclark commented 1 day ago

Test cases:

josephjclark commented 1 day ago

Support throwing an array as well

josephjclark commented 1 day ago

Consider increasing the depth of logging objects in the CLI. And maybe adding options for this. Because in satusehat the default isn't enough when logging an error object to show everything

I am in a bit of a pickle also about how the logger serializes all objects into json. It can be ugly. How do we know when to stringify and when to log as an object? What does the worker need?

josephjclark commented 23 hours ago

I'm struggling with the design here.

The problem is that any kind of error (including not actual error objects) can be thrown from the runtime itself, job code, adaptor code, or anywhere. They all need to be caught and they all need to be serialized into objects safely by the runtime.

My current strategy of lots of different Error types isn't really working IMHO. It needs ripping out, which is expensive because it's a well-tested area.

What I want to do instead is have a single RuntimeError class with toString() and toJSON() functions. It has a constructor like this:

new Error(message, source, severity, details, stack, position)

And serializes like this:

   source: runtime|adaptor|job|compiler|cli|worker
   position?: { line, offset } // sourcemapped position, when we have it, if relevant
   severity: crash|fail|kill
   message: "" // human readable summary
   details: { whatever }
 }

There isn't a Type or Name for this error - that's always been a point of confusion and doesn't help. And we have stupid names right now like RuntimeCrashError. So we just have a source and message.

Details is where we serialize (safely) all the keys of the original underlying error. It might well have a Type or Name, depending on the error.

When logging the error, the logger will call its toString() function. When emitting it out of the runtime process, we use toJSON(). So in both uses we should be able to guarantee good output.

Catching and creating these errors needs to happen at strategic points in the runtime itself, much like now. Crash and Kill errors need to bubble up and cause a throw, but fails will allow the workflow to finish.