SirWumpus / post4

Post4 is an indirect threaded Forth dialect written in C.
BSD 2-Clause "Simplified" License

Should stop interpreting the input buffer after any error that causes `throw`. #8

Closed SirWumpus closed 1 month ago

SirWumpus commented 2 months ago

A Forth-2019 system shall not continue interpreting the input buffer after any error that causes throw. If the exception is not caught by user's catch, the system shall either perform quit (and read the new line from the user input device), or terminate with non-zero exit status (if no user input device available, or due to command line options).

I would interpret the case when the standard input (stdin) is a pipe as the absence of a user input device (a keyboard).

Originally posted by @ruv in https://github.com/SirWumpus/post4/issues/7#issuecomment-2271565252

SirWumpus commented 2 months ago

@ruv Would you be able to provide chapter & verse from Forth 200x Draft 19.1? I've been looking for it just now and it doesn't jump out at me.

ruv commented 2 months ago

Would you be able to provide chapter & verse from Forth 200x Draft 19.1?

Yes, certainly.

3.4 The Forth text interpreter

d) If unsuccessful, throw an -13 (undefined word) exception.

9.6.1.2275 THROW

[...] If the top of the stack is non zero and there is no exception frame on the exception stack, the behavior is as follows [...] Subsequently, the system shall perform the function of 6.1.0670 ABORT

6.1.0670 ABORT

Empty the data stack and perform the function of QUIT

6.1.2050 QUIT

[...] make the user input device the input source, and enter interpretation state. Do not display a message. Repeat the following: – Accept a line from the input source [...]
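The chain quoted above (uncaught THROW, then ABORT, then QUIT) can be sketched in C roughly as follows. This is a minimal illustration, not Post4's actual code: `interpret_line`, `throw_uncaught`, `execute_word` and the toy two-word dictionary are all hypothetical names invented for the example. It assumes the outer loop uses setjmp/longjmp so that an uncaught throw abandons the rest of the current line.

```c
#include <assert.h>
#include <setjmp.h>
#include <string.h>

enum { STACK_MAX = 32, THROW_OVERFLOW = -3, THROW_UNDERFLOW = -4,
       THROW_UNDEFINED = -13 };

static long data_stack[STACK_MAX];
static int depth;            /* cells currently on the data stack */
static jmp_buf outer_frame;  /* landing point for an uncaught THROW */
static int thrown_code;      /* code of the most recent uncaught THROW */

/* THROW when no exception frame exists: unwind to the outer loop. */
static void throw_uncaught(int code)
{
    if (code == 0)
        return;
    thrown_code = code;
    longjmp(outer_frame, 1);
}

/* Toy dictionary: "1" pushes one, "drop" pops, anything else is undefined. */
static void execute_word(const char *word)
{
    if (strcmp(word, "1") == 0) {
        if (depth == STACK_MAX)
            throw_uncaught(THROW_OVERFLOW);
        data_stack[depth++] = 1;
    } else if (strcmp(word, "drop") == 0) {
        if (depth == 0)
            throw_uncaught(THROW_UNDERFLOW);
        depth--;
    } else {
        throw_uncaught(THROW_UNDEFINED);
    }
}

/* Interpret one line; return 0 on success, else the throw code.
 * After an uncaught throw the remaining words are never parsed.
 */
static int interpret_line(const char *line)
{
    char buf[128];
    char *word;

    assert(line != NULL);
    strncpy(buf, line, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    if (setjmp(outer_frame) != 0) {
        depth = 0;           /* ABORT: empty the data stack */
        return thrown_code;  /* caller does the QUIT part: accept a new line */
    }
    for (word = strtok(buf, " "); word != NULL; word = strtok(NULL, " "))
        execute_word(word);
    return 0;
}
```

With this sketch, `interpret_line("1 bogus 1")` returns -13 with an empty data stack, and the trailing `1` is never executed. The caller would then read a fresh line from the user input device or exit with a non-zero status.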

SirWumpus commented 2 months ago

3.4 The Forth text interpreter

d) If unsuccessful, throw an -13 (undefined word) exception.

Yep.

9.6.1.2275 THROW

[...] If the top of the stack is non zero and there is no exception frame on the exception stack, the behavior is as follows [...] Subsequently, the system shall perform the function of 6.1.0670 ABORT

Yep.

6.1.0670 ABORT

Empty the data stack and perform the function of QUIT

Yep.

6.1.2050 QUIT

[...] make the user input device the input source, and enter interpretation state. Do not display a message. Repeat the following: – Accept a line from the input source

Think this is the part I missed.

@ruv As point of clarification, the standard would be clearer if it replaced:

- Accept a line from the input source into the input buffer, set >IN to zero, and interpret.

With

- Flush the current input buffer, set `>IN` to zero, accept a line from the input source into the input buffer, and interpret.

ruv commented 2 months ago

With

  • Flush the current input buffer, set >IN to zero, accept a line from the input source into the input buffer, and interpret.

And what does "to flush" mean in the standard? NB: there is the word flush, so it will probably be confusing.

ruv commented 2 months ago

6.1.0670 ABORT

Empty the data stack and perform the function of QUIT

Yep.

The part "Empty the data stack" is still missing.

NB: this part only takes place if there is no exception frame.

SirWumpus commented 2 months ago

And what does "to flush" mean in the standard? NB: there is the word flush, so it will probably be confusing.

Maybe "flush" is the wrong word. How about "discard"? The point I'm driving at is that there should be a stronger statement that the current input should be dropped/discarded/flushed/snubbed/nuked before "accepting" new input.

SirWumpus commented 2 months ago

6.1.0670 ABORT

Empty the data stack and perform the function of QUIT

Yep.

The part "Empty the data stack" is still missing.

NB: this part only takes place if there is no exception frame.

In p4Repl() I made an explicit divergence from the standard. I was constantly setting up data on the stack by hand and trying to diagnose code during development, and if I hit an undefined word it would reset everything. Drove me nuts. Instead I gave the undefined-word case QUIT behaviour, i.e. it just resets the Return Stack.

    /* An earlier version treated most exceptions like ABORT
     * which would empty the stack, which is annoying when
     * interactive as this could upset work in progress.
     *
     * An undefined word does not need to behave like ABORT,
     * so the stacks can remain untouched. See Forth 200x
     * Draft 19.1 section 3.4 d.
     */
    THROW(P4_THROW_UNDEFINED);

See lines 1573 to 1612. This ordering probably results in non-standard behaviour, but I found the standard confusing with regard to some exception behaviour and was looking for something sane and functional.

I'll review it further, but IMO the standard draft needs to clearly state the behaviour of all exceptions. ABORT and QUIT are clear, but the others less so. If I'm having trouble with the draft, then I'm sure other developers are too.

@ruv Is Forth-2019 a standard now or still draft status?

ruv commented 2 months ago

current input should be dropped/ discarded/ flushed/ snubbed/ nuked before "accepting" new input.

It is still unclear what this means and why a program should be aware of that, regardless of implementation details of a particular Forth system.

In some implementations the input buffer for the user input device is a static memory range, and the system doesn't need to do anything with this buffer before accepting a new line from the input source.

ruv commented 2 months ago

I was constantly manually setting up data on the stack and trying to diagnose code during development and if I hit an undefined word it would reset everything. Drove me nuts.

Got it. I would suggest adding a command line option to enable this non-standard behavior. Or vice versa, add an option that turns on the standard behavior.

Is Forth-2019 a standard now or still draft status?

It's a draft. It's only due to editorial issues, as I understand it.

SirWumpus commented 2 months ago

Fixing -13 undefined word behaviour following above discussion.

ruv commented 2 months ago

IMO the standard draft needs to clearly state the behaviour of all exceptions. ABORT and QUIT are clear, but the others less so. If I'm having trouble with the draft, then I'm sure other developers are too.

I agree. I expect it will be done in some of the next versions/snapshots.

SirWumpus commented 2 months ago

... adding a command line option to enable this non-standard behavior.

I'll probably do this as some custom flag variable in the future, e.g. TRUE DEBUG ! (DEBUG might not be the best name). It avoids the need for a command-line option and is more in the Forth way of thinking (I guess).
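A rough C sketch of how such a flag could switch the recovery step between the standard ABORT-like behaviour and the stack-preserving, QUIT-like behaviour. The names `keep_stack_on_error`, `Stacks` and `recover_uncaught` are hypothetical and are not taken from Post4. The sketch only models stack depths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Only the depths matter for this illustration. */
typedef struct {
    int data_depth;
    int return_depth;
} Stacks;

/* Hypothetical flag, set from Forth by something like: TRUE DEBUG ! */
static bool keep_stack_on_error = false;

/* Recovery performed by the outer loop after an uncaught THROW. */
static void recover_uncaught(Stacks *s)
{
    assert(s != NULL);
    s->return_depth = 0;        /* QUIT always empties the return stack */
    if (!keep_stack_on_error)
        s->data_depth = 0;      /* standard ABORT: empty the data stack too */
}
```

With the flag clear, recovery empties both stacks as the draft requires. With it set, work in progress on the data stack survives an interactive typo.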

I agree. I expect it will be done in some of the next versions/snapshots.

It has been some time since draft 19-1 was published. Even with COVID, I was hoping to see more updates these past five years, even if only editorial changes. It would be nice to see what edits have been made since 19-1.

SirWumpus commented 2 months ago

@ruv BTW, ABORT and QUIT make no mention of how the Float Stack should be handled. I assume that what applies to the Data Stack should apply equally to the Float Stack. Again, some wording to this effect in the draft for ABORT and QUIT to remind developers would be good (unless my assumption is wrong).

ruv commented 2 months ago

BTW, ABORT and QUIT make no mention of how the Float Stack should be handled.

Typically, ABORT empties the floating-point stack and control-flow stack as well as the data stack.

So, I think, it was missed in Forth-94, because when these stacks are united it is difficult to empty one and not the other. It should be fixed in some future snapshot.

SirWumpus commented 2 months ago

It should be fixed in some future snapshot.

That's what they all say, then someone finds a new shiny and forgets.

ruv commented 2 months ago

That's what they all say, then someone finds a new shiny and forgets.

I have already filed a Request for clarification. Open requests are part of the to-do list.

There are a lot of things to do but few hands in the Forth standardization process.

SirWumpus commented 2 months ago

I have already filed a Request for clarification.

Nice. I often thought many of my comments concerning the standard draft disappear. Thanks.

ruv commented 2 months ago

Fixing -13 undefined word behaviour following above discussion.

This behavior should not be specific to the only error -13. If there is no user's catch (which formally means, there is no exception frame), any exception causes the function of abort, which empties the data stack, floating-point stack, and control flow stack. This behavior can be interpreted as the system's default exception handler behavior inside quit.

See also my implementation example for quit.

SirWumpus commented 2 months ago

... but few hands in the Forth standardization process.

Pity I don't know Forth and its history more deeply, otherwise I'd offer to help.

any exception causes the function of ABORT

What about the case of the -56 QUIT exception code? Does it behave like ABORT or QUIT? If you throw -56, it seems counterintuitive that it behaves like ABORT.

SirWumpus commented 2 months ago

Also what about user exception codes, eg. code <= -4096 or 0 < code. Do those also behave like ABORT when no user CATCH?

SirWumpus commented 2 months ago

What about the case of the -56 QUIT exception code? Does it behave like ABORT or QUIT? If you throw -56, it seems counterintuitive that it behaves like ABORT.

I think I've found my answer in a previous Discussion:

On the other hand, the following program should print -56 123 on any standard system: :noname 123 [: 456 -56 throw ;] catch . . ; execute

-56 throw behaves like QUIT, not ABORT, the data stack remains.

Not sure your example implementation accounts for -56 QUIT.

ruv commented 2 months ago

the following program should print -56 123 on any standard system:

:noname 123 [: 456 -56 throw ;] catch . . ;  execute

-56 throw behaves like QUIT, not ABORT, the data stack remains.

My example shows that there is nothing special about the throw code -56. With other throw codes the behavior is the same:

t{ 123 [: 456 -1 throw ;] catch -> 123 -1 }t
t{ 123 [: 456 -2 throw ;] catch -> 123 -2 }t
t{ 123 [: 456 -50100 throw ;] catch -> 123 -50100 }t
t{ 123 [: 456 here throw ;] catch -> 123 here }t

The number 456 is removed from the stack when the stack depth is restored by throw. The number 123 remains untouched on the stack (because it's not consumed by the called code).
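The depth-restoring behaviour described above can be modelled in C with setjmp/longjmp. This is only a sketch under assumptions: `forth_catch`, `forth_throw`, `Frame` and the sample word are hypothetical names, the execution token is passed as a C function pointer instead of on the data stack, and only the caught case is covered.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

enum { STACK_MAX = 32 };

static long stack[STACK_MAX];
static int depth;

/* One exception frame per active CATCH. */
typedef struct Frame {
    jmp_buf env;
    int saved_depth;        /* data stack depth when CATCH was entered */
    struct Frame *prev;
} Frame;

static Frame *frames;       /* innermost exception frame, NULL if none */
static int thrown;          /* code passed to the most recent THROW */

static void push(long x)
{
    assert(depth < STACK_MAX);
    stack[depth++] = x;
}

static void forth_throw(int code)
{
    if (code == 0)
        return;
    assert(frames != NULL); /* this sketch handles only the caught case */
    thrown = code;
    longjmp(frames->env, 1);
}

/* CATCH: push 0 if xt returns normally; otherwise restore the saved
 * depth and push the throw code.
 */
static void forth_catch(void (*xt)(void))
{
    Frame frame;

    frame.saved_depth = depth;
    frame.prev = frames;
    frames = &frame;
    if (setjmp(frame.env) == 0) {
        xt();
        frames = frame.prev;
        push(0);
    } else {
        frames = frame.prev;
        depth = frame.saved_depth;  /* 456 is dropped here */
        push(thrown);
    }
}

/* Equivalent of [: 456 -56 throw ;] */
static void push_456_and_throw(void)
{
    push(456);
    forth_throw(-56);
}
```

Calling `push(123)` followed by `forth_catch(push_456_and_throw)` leaves 123 and -56 on the stack, and the same holds for any other non-zero code. Nothing in the mechanism treats -56 specially.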

ruv commented 2 months ago

What about the case of the -56 QUIT exception code? Does it behave like ABORT or QUIT? If you throw -56, it seems counterintuitive that it behaves like ABORT.

When the operating system sends SIGQUIT to a Forth system process, the Forth system can convert this signal to an exception, and use the code -56 for this case. The Forth program can then analyze the throw code and take different actions depending on it, as @MitchBradley showed; that is correct and standard compliant.

Also, a program itself can throw the code -56 (as -56 throw) in one place and analyze this code in another place.

But it is incorrect to define quit as : quit -56 throw ;
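One way the signal-to-exception conversion mentioned above might look in C is sketched here. It is an illustration under assumptions, not Post4's approach (which, per this thread, uses setjmp/longjmp): the handler only records the signal, and the interpreter polls for it at a safe point. The function names are hypothetical, and mapping SIGINT to -28 (user interrupt) is an assumption of this example; only the SIGQUIT to -56 mapping comes from the comment above.

```c
#include <assert.h>
#include <signal.h>

enum { THROW_USER_INTERRUPT = -28, THROW_QUIT = -56 };

/* Last signal seen, or 0 if none is pending. */
static volatile sig_atomic_t pending_signal;

/* Async-signal-safe handler: just remember which signal arrived. */
static void on_signal(int signo)
{
    pending_signal = signo;
}

static void install_signal_handlers(void)
{
    void (*previous)(int);

    previous = signal(SIGINT, on_signal);
    assert(previous != SIG_ERR);
    previous = signal(SIGQUIT, on_signal);
    assert(previous != SIG_ERR);
}

/* Polled by the inner interpreter between words. Returns a throw code
 * to pass to THROW, or 0 when nothing is pending. A signal arriving
 * between the read and the clear could be lost; acceptable in a sketch.
 */
static int take_pending_throw_code(void)
{
    int signo = pending_signal;

    pending_signal = 0;
    switch (signo) {
    case SIGINT:
        return THROW_USER_INTERRUPT;
    case SIGQUIT:
        return THROW_QUIT;
    default:
        return 0;
    }
}
```

The resulting code then goes through the ordinary THROW path, so a user's CATCH can inspect it like any other exception.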

ruv commented 2 months ago

Also what about user exception codes, eg. code <= -4096 or 0 < code. Do those also behave like ABORT when no user CATCH?

Yes, they all behave the same. If there is no user's CATCH, and THROW was called with a non-zero code, the Forth system shall perform the function of 6.1.0670 ABORT. The difference may only be in the messages that the system displays.

ruv commented 2 months ago

If the Forth system provides the Exception word set, then abort shall be equivalent to -1 throw.

Test cases:

t{ 123 :noname 456 -1 throw ; catch -> 123 -1 }t
t{ 123 :noname 456 abort ; catch -> 123 -1 }t

t{ 123 :noname 456 -2 throw ; catch -> 123 -2 }t
t{ 123 :noname 456 abort" test error" ; catch -> 123 -2 }t

SirWumpus commented 2 months ago

What about the case of the -56 QUIT exception code? Does it behave like ABORT or QUIT? If you throw -56, it seems counterintuitive that it behaves like ABORT.

When the operating system sends SIGQUIT to a Forth system process, the Forth system can convert this signal to an exception, and use the code -56 for this case.

Also, a program itself can throw the code -56 (as -56 throw) in one place and analyze this code in another place.

But it is incorrect to define quit as : quit -56 throw ;

Commentary

@ruv As you know I've struggled separating these concepts, in part because of confusing naming and translating knowledge and parallels from other languages and OSes into Forth. While QUIT is a historical Forth function, I don't think new developers realise that QUIT is the read-eval-print-loop (REPL) found in modern scripting languages (Perl, Ruby, Python) until late; the name is misleading and should be explained better.

Similarly the description of the exception codes and how they are used is vague. -1 ABORT, -2 ABORT", and -56 QUIT simply refer to the words, but not how they are really used throughout, especially WRT THROW and CATCH. The information is scattered and you have to read and retain everything in order to create a mental picture.

Then there is how host signals (assuming POSIX here) are handled WRT ABORT, QUIT, user interrupt, etc. The more common SIGINT (^C) I would associate with QUIT, but it appears to behave like ABORT, and the little-used SIGQUIT (^\) one would expect to be handled like QUIT if used. These things are never mentioned in the standard, so an implementer is left to slog through it and make choices.

I think the standard's Rationale section for these things needs more common-language (less formal) text explaining more of the history, concepts, and distinctions for developers new to Forth who are using the standard as reference documentation; there are so many subtle "gotcha" cases that can trip up developers. And while example code fragments are helpful, they should be supported by additional explanatory text.

I've been interested in Forth on and off for 42 years, but have never had the opportunity to dive into it in more depth until now, in part because of the draft standard, but it is still extremely hard to read in places compared to other standards like IETF RFCs, POSIX, ISO C, JavaScript (eww), Erlang, etc.

I know the number of participants in the standard is small, but they have so much historical knowledge that should be documented, with more examples and more tests. And the sources of information are scattered across GitHub, Forth Standard, Forth draft, and comp.lang.forth (which expires over time), and I bet lots of beer-stained paper napkins.

I thank you for your knowledge and time and hope some of my struggles help improve the standard draft.

MitchBradley commented 2 months ago

I'll try to explain. First of all, when the standard was being developed, Linux barely existed and was certainly not widely used. Most Posix users were on industrial machines like Suns, AIX machines, and a few others that had little intersection with the Forth community. I probably had more Unix expertise than the rest of the standards committee combined. The predominant OSs that Forth people were using were Windows 3 and the non-Posix-based Mac OS. So the mapping from Posix signals to exceptions to throw codes was just not on anyone's radar. I considered it briefly, but had no energy left to work out all the details and to lobby for it. As I have explained before, CATCH and THROW were last-minute additions to the standard and we were fighting to get the standard approved in the face of seemingly-never-ending quibbles from yak-shavers.

The Windows crowd was still worrying about artifacts of the x86 hybrid 16/32-bit architecture like far pointers and other painful warts. The Mac people were similarly bogged down in 32k segment considerations and other ancient Mac trickery. Clean operating systems existed - various Unixes, OS/2, VMS, ... - but would not become widespread on personal machines for several years.

The Forth community, at the time, was mired in the past. Arguably, that has not entirely changed. You might be amazed how hard I had to work to convince the committee that dynamic memory allocation is important.

So, if the rationale is inadequate, keep in mind that much of it was written by a single person (me) who was, at the time, also developing and implementing another standard (IEEE 1275-1994 Open Firmware).

SirWumpus commented 2 months ago

@MitchBradley Thank you for that. Most interesting. Sorry you carried the burden of writing most of the Rationale. Could not have been easy. Technical writing at this level of detail must make your head throb.

Sorry if it appears I'm fussing over minutiae. Forth has so many nuances I keep getting caught on, typically because I don't know the history and the draft misses small important stuff or doesn't link them easily. I just want to see the best for the standard before it finalises, so I raise questions from a new implementer's viewpoint.

ruv commented 2 months ago

While QUIT is a historical Forth function, I don't think new developers realise that QUIT is the read-eval-print-loop (REPL)

I think, a word REPL can be added. And QUIT can be left as a synonym. As usual, somebody needs to prepare a formal proposal.

-56 QUIT simply refer to the words

Yes, it's a problem in the Table 9.1: THROW code assignments.

One problem was that the Exception word set was optional. So, ABORT and QUIT should have been specified in the way that is correct whether this word set is provided or not. It makes things more complex. In the next iteration we can make this simple since the Exception word set is mandatory now.

some of my struggles help improve the standard draft

Yes, sure! Thank you for your questions and comments! You may want to prepare some formal proposal 🤗

SirWumpus commented 2 months ago

You may want to prepare some formal proposal.

Not sure I'm sufficiently Forth savvy to put forward formal proposals. Though Post4 might be considered a "clean-room" implementation from the standard draft that highlights what a developer experiences when they don't have the benefit of Forth history.

SirWumpus commented 2 months ago

STRUGGLE ( caddr u -- caddr u bool ) Supply a transient cogent string which will be inwardly digested, cogitated, dissected, analysed, tested, reviewed, pub discussed, and not necessarily limited to these activities. Return bool TRUE if the string was accepted; FALSE if rejected. In both cases a new transient string is returned with proposed alternative wording.

ruv commented 2 months ago

6.1.2050 QUIT

[...] make the user input device the input source, and enter interpretation state. Do not display a message. Repeat the following: – Accept a line from the input source [...]

The part "make the user input device the input source" is still missing. If quit is called while a file is being included, the system continues to read the same file.
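The missing step can be pictured with a small input-source stack in C. This is a hypothetical model, not Post4's implementation: the names `include_source`, `current_source` and `quit_reset_input_source` are invented here, sources are represented by names only, and a real system would also close the nested files.

```c
#include <assert.h>
#include <stddef.h>

enum { SOURCE_MAX = 8 };

/* Slot 0 is always the user input device; higher slots model
 * nested included files.
 */
static const char *source_stack[SOURCE_MAX] = { "user-input-device" };
static size_t source_top = 0;   /* index of the current input source */

/* Model of INCLUDE-FILE nesting. Returns 0 on success, -1 if too deep. */
static int include_source(const char *name)
{
    assert(name != NULL);
    if (source_top + 1 >= SOURCE_MAX)
        return -1;
    source_stack[++source_top] = name;
    return 0;
}

static const char *current_source(void)
{
    return source_stack[source_top];
}

/* The QUIT step: abandon every nested source and make the user
 * input device the input source again.
 */
static void quit_reset_input_source(void)
{
    while (source_top > 0)
        source_stack[source_top--] = NULL;
}
```

After `quit_reset_input_source()` the next line is accepted from slot 0, regardless of how many files were being included when QUIT ran.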


There is a similar issue with 9.6.1.2275 THROW. The following action is missing:

See also my comment Input source after THROW (at forth-standard.org).

ruv commented 2 months ago

It seems a correct include-file and quit are far simpler to implement in Forth than in C.

SirWumpus commented 2 months ago

It seems a correct include-file and quit are far simpler to implement in Forth than in C.

You're very comfortable with Forth, so you probably see things I don't.

I suspect my design choice to use setjmp/longjmp to catch signals like SIGINT may be an issue. However, I believe it is workable in the end. The code is now so close.

I have considered replacing p4Repl()/QUIT with a QUIT written in Forth (and would like to), but I have design concerns as to how best to bootstrap from C into Forth and define QUIT and all the words needed to get there. Many of your past examples from assorted sources seem circular with respect to the required words.

I had planned this as some future upgrade much much later.

SirWumpus commented 1 month ago

Closing, as issues #18 and #20 continue with some of the respective concerns about QUIT, CATCH, and THROW.