Improving error messages with Janet

pyrmont commented 3 years ago

I'm interested in helping improve the error messages when using Conjure with Janet. My understanding is that this requires changes to: (1) Janet, (2) Janet's netrepl and (3) Conjure. A change to Janet has already been made (explained in more detail below).

Problem

The error messages sent by the netrepl server use the line and column numbers reported by a server session's parser object. Since each line that the parser parses increments the parser's line number, this means the line number in an error message is really the total number of lines that have been sent to the server. Needless to say, this is not especially helpful.

Example

Here's a simple example. I have a buffer that consists of the following:

(+ 1 x)

When you evaluate this for the first time, the error is compile error: unknown symbol x on line 1, column 1 while compiling repl. If you evaluate it again, the error is compile error: unknown symbol x on line 2, column 1 while compiling repl. The line number will increment each time the buffer is evaluated.

Solution

I first raised this problem in janet-lang/janet#531 and the solution proposed on the Janet-side was to support the manual setting of a parser's line and column number. That change was made in janet-lang/janet#532. The next step is to make changes to Conjure and netrepl to take advantage of this.

I had a look at netrepl.fnl and my naïve solution is that the netrepl message protocol needs to support the initial line number (and column number) being sent with the message.

@Olical, since I figure you're probably the person most familiar in the entire world with how netrepl's message protocol works, does this sound right? Do you have an opinion on a particular way for this to be done? Following discussion here, I'm happy to actually make the PR necessary to Janet's spork module.

Bonus

I think it would be neat if the name of the buffer was also used in the error message (rather than the current repl). Again, I presume that you could send this with the message data. Would love to hear if you have any ideas on this, too.

cmaughan commented 3 years ago

Just to say I came to this repo to search for this exact problem. As a beginner, not having error messages at the right line is tough. I think the problem appears to be worse than this though. Once I get an error in my output log, it never goes away; even if I evaluate a different/valid expression. It is as if the whole file is being parsed after I have evaluated the valid sub expression.

For example: (print (+ 2 2))

will print 4 correctly, followed by my previous 'parse error' messages.....

cmaughan commented 3 years ago

On further inspection I suspect my evaluated expression is evaluated first, but then the rest of the file is parsed for invalid expressions (i.e. mismatched brackets)

Olical commented 3 years ago

This is a great improvement, thank you for taking the initiative and putting in the research! Since Conjure is a jack of all trades and I'm a Clojure fanatic, the other languages are at risk of missing good things because I don't have the deeper knowledge / day to day use.

Issues like this where you share your insight and research into deeper things to leverage my work on the general tooling are great. We meet in the middle, you bring the idea, I bring the Fennel 😄

So the netrepl protocol (implementation) is really just encoding the length of the message and the message itself, so the server knows how many bytes are coming ahead of time. Not much to it really.

If we want to set the line number, column and buffer name (something I would do inside nREPL messages for Clojure) we'd need to send some code for eval such as (set-line! 5) or whatever. So is there code I can execute that will set this information? If so, I can alter the Janet client to send this information before your evaluations occur.

I have this info packed into every client eval by default already, it's part of the client / eval design, it's just up to the client to send that info across the transport in some way. Right now I drop that info on the floor because I didn't think I had a use for it in Janet.

@cmaughan This is a different but similar issue I've seen discussed on the Conjure gitter. Basically I want an "early fail" mode for Janet, which exists when you perform a require, but only then. I will probably enable this "early fail" for requires but it won't work when evaluating your whole buffer for example, you'll still get this issue where an early fail yields weird results further down.

I wonder if there's a mode we can switch in Janet (or add to Janet) that means "on error, cancel eval and revert the eval state", like a REPL mode rather than a scripting mode?

Olical commented 3 years ago

This setting of line, column, file name and error handling procedure are all dependant on Janet supporting them via some sort of function call though. If they exist, this will be easy.

If we need to create our own eval context or some kind (i.e., not rely on the default REPL eval for some reason) this gets quite a bit trickier.

pyrmont commented 3 years ago

@Olical Sorry for the delayed response.

My understanding of the state of things is as follows:

the current netrepl server does not maintain its own parser object (source code);
the current netrepl message protocol treats the 'body' of all messages as 'code' (@Olical comment); and
a parser's line and column numbers can be set using parser/where (source code).

This implies that to allow a netrepl client to set the line/column number:

the netrepl server must create its own parser object that it can use as an argument to parser/where;
the netrepl message protocol must be extended to allow for a message to contain more than 'code'.

The extension seems like a significant change to the way netrepl works. My cursory review of the nREPL message protocol is that it is at least in part designed to allow different types of messages to be sent to the server. Rather than create another protocol, do you think the best solution is to fork the netrepl server and use it to create an nREPL-compatible server for Janet? This fork could be merged back into the spork module if it was deemed a sufficient improvement over the current implementation, of course.

pyrmont commented 3 years ago

One thing I wonder about in particular is whether using bencode is worthwhile. I have to imagine that the vast majority of interactions between an nREPL client and server are occurring on the same machine and that, as a result, all the bencode is doing is adding overhead (both in terms of computation and, perhaps more importantly, memory). Even in situations where you are talking to a remote computer I'm not sure what the compression ratio is like for bencode-encoded messages but I'd guess not that much better than simply sending the message as plaintext.

And so part of me thinks it would be better to create a bespoke protocol that's similar to nREPL but (a) isn't bencode-encoded; and (b) doesn't promise compatibility with the full range of operations (some of which I'm not sure make sense in Janet). Of course, merely typing that makes me want to paste the XKCD comic regarding standards as a response to myself so I'm not sure.

Any thoughts?

Olical commented 3 years ago

Hmm interesting! I need to have a poke around but I have a feeling (probably wrong but still) that we do have access to some global parser... or I saw some example when starting up a spork repl that allowed you to pass in a parser? There could be something useful there!

And I'm not sure if you need more than code, like we can send some code that contains the instructions to update the parser, that's not a huge deal imo. The alternative is of course to have an nREPL like layer in the middle that let you specify the line, column and file the eval is taking place in, like Clojure's nREPL.

Also bencode isn't used for any efficiency reasons as far as I know, it's just a neutral middle ground that all languages can easily support, whereas EDN is a lot harder to write a parser from scratch for. Bencode is actually reaaaaly simple, I quite like it as a medium for exchanging objects, lists and primitives. I think bencode is totally fine as a format to encode and decode messages with really, I'd rather use that than the custom one netrepl uses but that's also fine. (I think netrepl's implementation is one of a kind?)

I think the best approach with the least impact surface area would be to have some global hooks of some kind exposed in a spork netrepl that let us modify the parser state from code sent over the wire, that way the interface doesn't change, spork gets a small modification (if it can't do this already?) and Conjure requires a very small change, it just prefixes every eval with a little block of code that updates the parser state before the users code is executed.

I do need to have a play about in the spork netrepl though really, to check if this is possible already, I vaguely remember seeing things about "if yo don't provide a parser while starting the netrepl it'll start one itself", I just wonder if that one is accessible somehow.

pyrmont commented 3 years ago

It's been quite a while but I finally got around to doing more on this and I think manipulating the line number will become possible very soon.

First, though, I have two observations about the way netrepl works:

There is no way at present to access the parser object from either the netrepl code or the run-context function. (The run-context function is the function which sets up the REPL environment where in-band messages are evaluated.)
The netrepl protocol provides support for out-of-band messages (prepended with 0xFF) but these messages cannot manipulate the parser at present. The reason for this is that the current netrepl server implementation does not create a parser object and put it in an environment that out-of-band messages can access.

The good news is that I have submitted a PR janet-lang/spork#33 that would put the parser object in a location where it can be manipulated by out-of-band messages. The PR may not be accepted in its current form and so the specific semantics are subject to change. That said, if you use my fork now you should be able to set the line number to 100 by sending a message like so:

“\xFF(parser/where (dyn :parser) 100)”

Hopefully that all makes sense but if anything is unclear, please let me know! Progress is being made!

pyrmont commented 3 years ago

One thing I didn't mention was updating the file location. Unfortunately, that's not something that can be edited at the moment and requires a change to run-context. My suggestion is that we wait until the Spork PR is merged before considering how best to approach this.

Olical commented 3 years ago

That sounds great to me! Thank you so much for doing the deep Janet context dependant work and pushing things forward. I'll definitely be able to do the final plugging in on my end when the tools are there, but you're saving me so much time in trying to work out what even needs to happen and why within the language / protocol itself.

pyrmont commented 3 years ago

@Olical #janet-lang/spork#33 has been merged without any changes. Conjure should be able to manipulate the line (and, if it wants, column) number by sending a message that:

begins with the 0xFF byte;
follows that with a call to the parser/where function (API).

The parser object is accessible using (dyn :parser).

pyrmont commented 3 years ago

As noted in the earlier message, there is not a way to currently change the file location. The file location is not information that is maintained within the parser. Instead, this value is set using the :source argument to the run-context function (API).

I can think of a couple of ways to allow the source to be update, all of which require changes to Janet itself to implement. I'll open an issue in Janet's main repo for further discussion there about the preferred approach.

pyrmont commented 3 years ago

I thought a bit more about the changes that could be made and I think the simplest approach is to support a new out-of-band signal.

At present, spork's netrepl supports messages prefixed with either 0xFF or 0xFE. Message prefixed with 0xFF were discussed above but 0xFE-prefixed messages can also be sent. These messages cause the message's value to be returned to run-context as a keyword. The only keyword that is currently supported is :cancel (you can see how it works here).

My suggestion would be to permit a new signal prefixed with 0xFD. Messages sent with this prefix would consist of two values that would be transformed into a tuple. For example, the message "\xFDsource \"source location\"" would be returned to run-context as [:source "source location"].

This approach requires a change be made to the way the run-context function works. A PR to make the necessary change has been submitted as janet-lang/janet#642. If that is accepted, I'll make the necessary changes to the netrepl code and provide an update again when that's done.

pyrmont commented 3 years ago

@Olical I'm pleased to report that the pieces are now in place to improve the error messages when using Janet's netrepl 🎉 This requires Janet v1.15.3 and the latest master of spork/netrepl.

I'll explain below how the Janet-side of things now works. These techniques rely on messages being prepended with a (technically UTF-8 invalid) byte. These bytes are represented below using the format \x<hexadecimal>.

Setting the line (and column) number

The line number of the parser used by the netrepl server can be set by sending the following message:

\xFF(parser/where (dyn :parser) 100)

The column number can also be sent:

\xFF(parser/where (dyn :parser) 100 50)

Setting the source file location

The source file location can be set by sending the following message:

\xFEsource "/path/to/file.janet"

Note that the source location must be sent using Janet's string literal syntax (i.e. either wrapped with " or `). If you're using quote-delimited string syntax, you can escape characters using \. If you're using tilde-delimited syntax, all characters within the tildes are treated literally. If you need to include one or more tildes in a tilde-delimited string, you'll need the number of delimiting tildes to be one more (i.e. ``This includes the ` character``).

Gotchas

Please note that messages to update the line (and column) are prepended with the byte 0xFF and are S-expressions. However, messages that update the source file location are merely keyword value sequences prepended with the byte 0xFE.

I realise that, ideally, it would have been better if the API for manipulating the line/column and the source location were most consistent. While this theoretically could be made possible by updating run-context to support a message in the form \xFEcolumn 100, that's not supported at the moment.

Hopefully that makes sense but if you have any questions (or find that the implementation is buggy!), please let me know!

Olical commented 3 years ago

Amazing! Well I'll try my best to hook these in some time soon but I'm absolutely negative on free time, hopefully I'll get a little bit soon but I have a mountain of issues to work through. I'll try my best, but if you beat me to it I'm totally happy to review a PR :slightly_smiling_face:

pyrmont commented 2 years ago

Confirming this has been fixed by #265 🎉

(The reason this issue wasn't automatically closed is because the PR was merged into the develop branch and, it seems, automatic linking of issues with PRs only works on the default branch.)

Olical / conjure