Descriptive error messages

kriskowal commented 3 years ago

Describe

XS error messages are terse. Would Moddable consider PRs that increase the expressivity of XS error messages?

Why do you think this feature would be useful?

If, for example, the error message "no argument" were more descriptive, like "BigInt.prototype.toString requires 'this' to be a BigInt", developers would be able to address exceptions more directly.

Describe alternatives you've considered

Error codes that refer to documentation for XS errors, e.g., XSBIGINTARG1.
Compiler flag to choose between terse and verbose error message texts.

phoddie commented 3 years ago

The error messages are part of the code and consequently use flash space. Projects do run up against flash size limits. Making the messages bigger would make that more likely.

I suppose an optimal solution would allow messages to be even smaller than today while providing the option to be more verbose. That perfect world would benefit constrained systems while improving the developer's debugging experience.

As I recall, a related topic came up some time ago with @erights regarding the use of error messages, which are non-normative, to infer the engine currently executing a script. Mark's thinking at the time, if I recall well, was to replace error strings with error numbers. Those have more potential to be consistent across engines (much hand-waving here) and could be resolved to human readable strings (more hand-waving).

kriskowal commented 3 years ago

In the spirit of hand waving, perhaps we could have a table of error numbers to messages, and for the message representation to be a rope of shared substrings compiled from the table of error numbers to error messages.
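To make the rope idea concrete, here is a minimal sketch. It assumes a shared pool of substrings and a table from error number to a list of pool indices. The names `fragments`, `ropes`, and `messageFor`, and the error numbers themselves, are hypothetical and are not part of XS.

```javascript
// Hypothetical fragment pool: each distinct substring is stored once.
const fragments = [
  "BigInt.prototype.toString", // 0
  "Symbol.prototype.toString", // 1
  " requires 'this' to be a ", // 2
  "BigInt",                    // 3
  "Symbol",                    // 4
];

// Hypothetical table: error number -> rope (a list of fragment indices).
const ropes = {
  1: [0, 2, 3],
  2: [1, 2, 4],
};

// Flatten a rope into a full message only when a message is requested.
function messageFor(errorNumber) {
  const rope = ropes[errorNumber];
  if (rope === undefined) return `unknown error #${errorNumber}`;
  return rope.map((index) => fragments[index]).join("");
}

console.log(messageFor(1));
console.log(messageFor(2));
```

In this sketch the fragment " requires 'this' to be a " is stored once and shared by both messages, which is where any flash savings would come from.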

phoddie commented 3 years ago

Yes, something like that. I would kind of hope that the error messages don't get so big that using ropes is necessary (but I understand where such a mechanism would be valuable for some of your scenarios).

erights commented 3 years ago

@phoddie you remember correctly.

With error messages being prose text, we're never going to get to a deterministic JS spec. Instead, for each place in the ECMAScript spec where it mandates that the platform throws an error, we need to decide on a non-prose error message that we'd be willing to advocate for in a deterministic JS spec. The programming environment --- everything from REPL to console to debugger to IDE --- can then have the tables for rendering these into legible prose. Some of these simple messages will merely be unique codes like '#37'. Some of these should also contain data, so that their human rendering can use that data together with the code to render legible prose. Something like '#42: "constructor"' might render as 'Cannot assign to "constructor" due to the override mistake.' Rather than numeric codes, I suspect it would be better to standardize on short identifiers.

I imagine internationalization tables have some conventions for turning codes+parameters into localized sentences. If there are existing conventions for doing that, we should consider them. Otherwise, I think we can painlessly roll our own.

Ideally, the prose tables and their use for rendering would be sufficiently outside the deterministic JS computation that they would not appear in snapshots, and that the same shared deterministic computation can be localized differently for different observers.
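As a rough illustration of rendering a code plus data into prose outside the engine, the sketch below parses a terse message such as '#42: "constructor"' and fills a template from a table. The '#42' entry follows the example in the comment above. The terse format (code, then ": ", then comma-separated JSON values), the `{0}` placeholder syntax, and the names `proseTable`, `parseTerse`, and `render` are assumptions made for illustration.

```javascript
// Hypothetical prose table, kept outside the engine (console, debugger, IDE).
const proseTable = {
  "#42": "Cannot assign to {0} due to the override mistake.",
};

// Split a terse message into its code and its JSON-encoded parameters.
function parseTerse(terse) {
  const separator = terse.indexOf(": ");
  if (separator === -1) return { code: terse, data: [] };
  const code = terse.slice(0, separator);
  const data = JSON.parse(`[${terse.slice(separator + 2)}]`);
  return { code, data };
}

// Render a terse message using a table; unknown codes are returned unchanged.
function render(terse, table = proseTable) {
  const { code, data } = parseTerse(terse);
  const template = table[code];
  if (template === undefined) return terse;
  return template.replace(/\{(\d+)\}/g, (_, index) =>
    JSON.stringify(data[Number(index)])
  );
}

console.log(render('#42: "constructor"'));
```

Because the table is an argument, the same terse message could be rendered from a differently localized table for a different observer without touching the computation that produced it.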

kriskowal commented 3 years ago

Just to be clear, my understanding from the above discussion is that you would not accept PRs that tripled the length of error messages in order to provide a better clue to developers, unless those PRs came with engineering to minimize the impact on the resulting flash text increase. A PR to change "no" to "not", on the other hand, would be less controversial. Do I read the room right?

phoddie commented 3 years ago

Basically, yes. Because....

If we are going to make non-trivial changes to error messages, we should look at the whole problem. That way we don't implement partial solutions that need to be revisited soon.
Friendly error messages are developer friendly. ;) No argument. But, so is taking care so that their code can fit into their target device.

To that point, I understand Mark's goal of determinism and how error messages are a problem. I'm not confident that error messages can be normalized across engines. If that's true, maybe we should consider a different approach. The error messages are clearly not normative. As such, scripts should not be making decisions based on the messages. If that's strictly true (almost surely not, but...), then an engine configured to run in deterministic mode could suppress error messages entirely. Such an approach would be terrible for debugging, but perhaps there's an out-of-band solution there (for example, an internal slot accessible to debugging tools but not the script).

FWIW - I understand the use of "no" versus "not" is distracting, but an additional letter doesn't change much for developers. If we have a way to map in longer error messages, it would allow significantly more verbose messages, which is ultimately what would make a significant improvement.

kriskowal commented 3 years ago

It is an interesting point that the messages are non-normative, therefore code should not depend on them, therefore a valid program should be equally valid if all error messages are empty strings. With the SES-shim, we hide the stack from the program and reveal it to the console if it makes it that far. We also allow errors to be annotated, only allowing the console to reveal the annotations. We intend to treat errors as opaque objects for the purpose of currently hypothetical distributed debuggers, using out-of-band aggregation of the causal graph, stacks, and annotations. It would be equally valid to hide the message at a minor loss to ad hoc debugging.

This is a compelling long-term vision.

erights commented 3 years ago

I'm not confident that error messages can be normalized across engines.

I don't imagine the mainstream engines (V8, SpiderMonkey, JSC) will ever implement Deterministic JS. Initially I am only hoping that together we can write down a Deterministic JS spec that Moddable is willing to implement in a future XS. The purpose of that spec is to write down what a third party would need to implement so that their execution is a lockstep deterministic replay of execution on any other engine conforming to the Deterministic JS spec. Any virtual machine to be run on a public permissionless blockchain should have an equivalently deterministic spec. EVM and ewasm do.

The relationship between Deterministic JS and standard ECMAScript should be that the former is a refinement of the latter. This means that any conforming implementation of Deterministic JS is also a conforming implementation of standard ECMAScript.

Another example: Deterministic JS must specify the sorting algorithm used by Array.prototype.sort, since it is observable. I expect we'll specify whatever XS currently does, if XS does something reasonable to specify. I don't imagine we'll ever get V8, SpiderMonkey, or JSC to agree to sort with that algorithm.
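A small sketch of why the sorting algorithm is observable: the comparator is script code, so a script can record which pairs it was asked to compare and in what order. The helper name `recordComparisons` is hypothetical.

```javascript
// Sort a copy of the input while logging every pair passed to the comparator.
function recordComparisons(values) {
  const log = [];
  const sorted = [...values].sort((a, b) => {
    log.push([a, b]);
    return a - b;
  });
  return { sorted, log };
}

const { sorted, log } = recordComparisons([3, 1, 2]);

// The sorted result is the same on every conforming engine.
console.log(sorted);

// The number and order of comparisons depend on the engine's algorithm,
// so this log can differ between engines and so can distinguish them.
console.log(log);
```

A deterministic spec would have to pin down the algorithm precisely enough that the comparison log, and not just the sorted result, is identical on every conforming engine.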

An open question is which standards organization, if any, we take the Deterministic JS spec to. It may be TC39, TC53, or the new TC in formation for blockchain interoperability standards. We don't need to figure that out until we have a Deterministic JS spec, which will probably be many years away.

erights commented 3 years ago

If that's strictly true (almost surely not, but...), then an engine configured to run in deterministic mode could suppress error messages entirely. Such an approach would be terrible for debugging, but perhaps there's an out-of-band solution there (for example, an internal slot accessible to debugging tools but not the script).

That's exactly what E did. It was wonderful. The error/assert/console library I added to the SES shim takes a strong step in that direction, but without omitting the in-band error messages entirely. I agree this is something to consider for Deterministic XS. The error/assert/console approach to the out-of-band info seems good.

The same consideration applies to error stacks. The SES shim removes the stacks for in-band access from the error objects themselves, but provides them out-of-band where our console can get them.

phoddie commented 3 years ago

This topic seems to be in a good place, with consensus on long-term direction and goals.

From an XS perspective, I think possible near-term steps are around error description mapping. Specifically, mapping the current short descriptions to long descriptions to be friendly to developers by providing more information about the problem, and mapping the current short descriptions to empty strings to be friendly to developers by providing more space for their code. ;) The SES/Deterministic JavaScript approach likely wants both mappings -- providing empty descriptions to scripts for determinism while providing long (or short) descriptions out-of-band for debugging.
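The two mappings can be sketched as follows. The short description "no argument" and its longer form come from the first comment in this thread. The table shape, the mode names, and the functions `describe` and `report` are assumptions for illustration, not an XS API.

```javascript
// Hypothetical table from current short descriptions to longer ones.
const longDescriptions = new Map([
  ["no argument", "BigInt.prototype.toString requires 'this' to be a BigInt"],
]);

// Three mappings over the same short description.
const mappings = {
  short: (description) => description,
  long: (description) => longDescriptions.get(description) ?? description,
  empty: () => "",
};

function describe(description, mode) {
  const mapping = mappings[mode];
  if (mapping === undefined) throw new RangeError(`unknown mode: ${mode}`);
  return mapping(description);
}

// Deterministic configuration: scripts get the empty mapping in-band,
// debugging tools get the long mapping out-of-band.
function report(description) {
  return {
    inBand: describe(description, "empty"),
    outOfBand: describe(description, "long"),
  };
}

console.log(report("no argument"));
```

In practice one short description such as "no argument" is likely thrown from many call sites, so a real long mapping would probably need more than the short text as its key. This sketch ignores that.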

erights commented 3 years ago

I agree that empty strings are best for Deterministic JS for in-band error messages. Likewise empty stacks, which the SES shim currently builds for itself starting with the API you currently implement (Error.prototype.stack accessor that we delete, after grabbing its setter). For out-of-band info, the more the better. But a unique tag we can look up is fine rather than building more prose into the engine. The more significant issue is the data parameterizing that tag, such as the property name 'constructor' in the previous example. Our console would then use all these out-of-band channels to render errors with descriptive messages and stacks that are useful for debugging.

phoddie commented 3 years ago

For out-of-band info, the more the better. But a unique tag we can look up is fine rather than building more prose into the engine.

Understood. I think the hard part is coming up with the mappings, including substitutions. We do something like that in our Piu UI framework for localization, but that approach probably isn't right for the engine itself. Once that is solved, we can sort out where (engine, debugger, etc) to apply the mapping.

erights commented 3 years ago

How direct a correspondence is there between the places in the XS implementation that decide to throw (and what to throw) and the steps in the ECMA-262 spec that say an error of a particular type must be thrown?

phoddie commented 3 years ago

I think I know where you are heading with this... I may have had a similar thought. I'm confident that the XS implementation is closer to the spec on that than most engines, but it would be some work to accurately characterize that.

phoddie commented 2 years ago

@erights – Part of what is ugly here is maintaining a mapping from the unique tag to the full message. Plus, it has long seemed impractical to get engines to agree on error messages. Maybe we can look at it differently? The primary goal is to eliminate the error messages as a source of entropy and as a way to distinguish engines. What if the error instance thrown has a message of the empty string and an internal slot with the real error message? XS can provide the host with a separate function to extract the actual message from the instance, and the Agoric runtime can hide that function from scripts it executes in Compartments. (Perhaps XS can limit hiding of the real error message to code executing in Compartments?)

This approach works with @kriskowal's excellent goal to provide more descriptive error messages. When the host extracts the real error message, it can apply a transformation through any convenient means, to provide additional detail.
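The shape of this proposal can be modeled in plain JavaScript, with a WeakMap standing in for the internal slot. This is only a model of the idea: `throwEngineError`, `getRealMessage`, and `describeForHost` are hypothetical names, and a real engine would keep the slot in native code instead of a script-visible map.

```javascript
// A WeakMap stands in for the proposed internal slot.
const realMessages = new WeakMap();

// Hypothetical engine-side throw: the in-band message is the empty string.
function throwEngineError(ErrorType, realMessage) {
  const error = new ErrorType("");
  realMessages.set(error, realMessage);
  throw error;
}

// Hypothetical host-only function; the host would withhold it from
// scripts that it runs in Compartments.
function getRealMessage(error) {
  return realMessages.get(error);
}

// Host-side rendering, with an optional transformation to add detail.
function describeForHost(error, expand = (message) => message) {
  const real = getRealMessage(error);
  return real === undefined ? error.message : expand(real);
}

let caught;
try {
  throwEngineError(TypeError, "no argument");
} catch (error) {
  caught = error;
}

console.log(JSON.stringify(caught.message)); // what a script observes
console.log(describeForHost(caught));        // what the host can recover
```

The `expand` parameter marks the point where a host could map the short message to a more descriptive one by any convenient means.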

kriskowal commented 2 years ago

This is also consistent with our notion that an error object under Hardened JavaScript should be opaque to intermediate call frames. The most secure position is somewhat developer hostile in environments that don’t automatically unbox errors for the developer, so our position is just shy of dogmatic in ses. We have gone to lengths to ensure that error details get automatically revealed to console through SES, but we have not done this for message.

erights commented 2 years ago

but we have not done this for message.

What @kriskowal says here is true for errors thrown from the engine. But for user-defined errors thrown using our assert library, we actually put a lot of work into:

Distinguishing which portions of the error message should or should not be redacted.
Redacting those portions in error.message.
Showing the entire unredacted error when showing the error on the console.

See https://github.com/endojs/endo/blob/master/packages/ses/src/error/README.md if you're curious. It is long and not really needed for this thread. But it's a fine read!
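For readers who do not follow the link, the three steps listed above can be approximated with a tagged template. This is an illustrative sketch and not the Endo assert API: the names `reveal`, `redactedError`, and `consoleView` are hypothetical, and the choice to redact every substitution unless it is explicitly marked is an assumption of this sketch.

```javascript
// Wrapper that marks a substitution value as safe to show in-band.
class Revealed {
  constructor(value) {
    this.value = value;
  }
}
const reveal = (value) => new Revealed(value);

// Side table holding the unredacted text for the console.
const unredacted = new WeakMap();

// Tagged template: unmarked values are replaced by their type in
// error.message, while the full text is kept out-of-band.
function redactedError(strings, ...values) {
  let inBand = strings[0];
  let full = strings[0];
  values.forEach((value, index) => {
    const isRevealed = value instanceof Revealed;
    const shown = isRevealed ? value.value : value;
    const text = JSON.stringify(shown);
    inBand += (isRevealed ? text : `(${typeof shown})`) + strings[index + 1];
    full += text + strings[index + 1];
  });
  const error = new Error(inBand);
  unredacted.set(error, full);
  return error;
}

// What a privileged console would print.
function consoleView(error) {
  return unredacted.get(error) ?? error.message;
}

const secret = "hunter2";
const error = redactedError`field ${reveal("password")} rejected value ${secret}`;

console.log(error.message);
console.log(consoleView(error));
```

Here `error.message` keeps the field name but shows only the type of the rejected value, while `consoleView` recovers the complete text.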

phoddie commented 2 years ago

@erights – partial redaction of error.message is quite something! (I did read the link on error logging in Endo. That was so interesting it took me down a (magic) wormhole into the extensive "survey of logging frameworks".)

@kriskowal – that all makes sense. I can't imagine that your SES implementation wraps every built-in that might throw to make the error message opaque. Or do you??

This topic has been pending for some time (longer than this issue). I think we are closing in on a workable solution. Are we close enough that it makes sense to explore an implementation in XS?

kriskowal commented 2 years ago

As you suspect, we do not wrap every built-in that might throw an error. We just provide assert features to make it easy to redact and reveal parts of messages.

erights commented 2 years ago

@erights – partial redaction of error.message is quite something! (I did read the link on error logging in Endo.

Thanks, glad you enjoyed it!

That was so interesting it took me down a (magic) wormhole into the extensive "survey of logging frameworks".)

IIRC mostly by @warner and @fudco. We still need to build a logging framework that addresses the dominant motivation of these --- logging potentially voluminous symbolic data for consumption by other tools, with only digested diagnostic info presented to humans. SwingSet's slogfiles do some of this, but specialized for SwingSet rather than as something available to regular vat code.

(None of which is actually relevant to the point of this thread though)

phoddie commented 1 year ago

I was hoping we might be able to make some progress on this based on the idea from October 20. That approach would deny untrusted code running under XS access to error messages, eliminating a source of non-determinism. If/when that becomes a priority, let's re-open this issue to revisit the details. Until then, I'm going to close this out.

Moddable-OpenSource / moddable

Descriptive error messages #643