SirWumpus / post4

Post4 is an indirect threaded Forth dialect written in C.
BSD 2-Clause "Simplified" License
4 stars 1 forks source link

Fix DO...IF LEAVE THEN...+LOOP et al. #23

Closed SirWumpus closed 1 month ago

SirWumpus commented 1 month ago

With the major overhaul from #18 have required some words be adjusted for the _interpret loop change, in particular the entire group of words related to DO..+LOOP.

SirWumpus commented 1 month ago

The above is not entirely correct it seems 😥 Nested loops : tw_do4 DO DO I J * . loop loop ; not working correctly. Probably an issue with using [: loop body ;] and _loop_inc_test (and similarly _loop_step_test) and return stack depth. Oh hum. I hate this family of words.

ruv commented 1 month ago

@SirWumpus, unloop exit is problematic too (see also my attempt to cope with double exit from a quotation).

In the previous implementation of LEAVE: https://github.com/SirWumpus/post4/blob/b4ea6162c2417b364b2461578fd00f2d7dc58deb/src/post4.p4#L1394-L1398

the problem could be fixed, if each item of the control flow stack is tagged (so you know its type).

Taking into account that the actual stack effect for LEAVE is ( C: do-sys1 i*{orig|dest} -- do-sys2 i*{orig|dest} ), you can temporary remove i*{orig|dest}, change/update do-sys, and place i*{orig|dest} back, as:

: LEAVE ( C: do-sys1 i*{orig|dest} -- do-sys2 i*{orig|dest} ) ( runtime: R: loop-sys -- )
  top-is-do-sys IF >R 1+ >R POSTPONE AHEAD R> R> EXIT THEN   size-of-orig N>R  RECURSE NR> DROP
; IMMEDIATE

Another option is to introduce a separate stack to implement compilation semantics for the loop components. NB: unresolved forward references can be used as items of a singly-linked list, and this list plays the role of a separate stack. Then only one additional variable is needed to store the head of this list.

Yet another option is to have the first address after LOOP (or +LOOP) run-time as part of the loop-sys, e.g. loop-sys = ( nest-sys n.limit n.index ). Then LEAVE is implemented as:

: LEAVE ( R: loop-sys nest-sys1 -- nest-sys2 )
  RDROP 2RDROP 
;
\ NB: not immediate
SirWumpus commented 1 month ago

@SirWumpus, unloop exit is problematic too

Yes. You solve for one case like IF ... LEAVE THEN only to have IF ... UNLOOP EXIT THEN cause you grief or visa versa.

(see also my attempt to cope with double exit from a quotation).

Trying to use quotations would allow : LEAVE 999 THROW ; or : LEAVE POSTPONE EXIT; IMMEDIATE to a point between the loop increment and test and UNLOOP cleanup, but breaks EXIT from the word within a loop. This last, eg. UNLOOP EXIT should in many ways be denied given section 6.1.1240 DO Anything already on the return stack becomes unavailable until the loop-control parameters are discarded.. Making exceptions for it makes implementing DO..IF LEAVE THEN..LOOP painful.

BUT that is me complaining about something I think broken in the standard and doesn't solve the immediate. EVERY TIME in the past that I've adjusted how REPL works, this family of words always cause me grief. Having the test suite now helps a lot, just wish the test suite in the standard was more complete.

I have considered some painful stack manipulation to collect LEAVE forward reference together, eg ( n*forw n dest DO_MARK ... other -- n'*forw n' dest DO_MARK ... other) then have LEAVE shift the stack to insert the reference, but that seems like way too much work.

In the previous implementation of LEAVE:

https://github.com/SirWumpus/post4/blob/b4ea6162c2417b364b2461578fd00f2d7dc58deb/src/post4.p4#L1394-L1398

the problem could be fixed, if each item of the control flow stack is tagged (so you know its type).

I've considered something like : LEAVE POSTPONE _branch ['] _leave_mark COMPILE, ; then in _loop_control during compilation have a while-loop scan back and replace _leave_mark with the correct relative offset. Slightly less complicated, but uncomfortable (like having an endoscope without a sedative (been there, done that)).

Taking into account that the actual stack effect for LEAVE is ( C: do-sys1 i*{orig|dest} -- do-sys2 i*{orig|dest} ), you can temporary remove i*{orig|dest}, change/update do-sys, and place i*{orig|dest} back, as:

: LEAVE ( C: do-sys1 i*{orig|dest} -- do-sys2 i*{orig|dest} ) ( runtime: R: loop-sys -- )
  top-is-do-sys IF >R 1+ >R POSTPONE AHEAD R> R> EXIT THEN   size-of-orig N>R  RECURSE NR> DROP
; IMMEDIATE

Make my eyes bleed looking at the that EXIT and RECURSE.

Another option is to introduce a separate stack to implement compilation semantics for the loop components. NB: unresolved forward references can be used as items of a singly-linked list, and this list plays the role of a separate stack. Then only one additional variable is needed to store the head of this list.

This might be the best option, though I'd probably just ALLOCATE a simple stack for do-sys forward references.

SIDEBAR Begs the questions which I could not quickly find an answer for in the Forth 2019 draft:

Yet another option is to have the first address after LOOP (or +LOOP) run-time as part of the loop-sys, e.g. loop-sys = ( nest-sys n.limit n.index ). Then LEAVE is implemented as:

: LEAVE ( R: loop-sys nest-sys1 -- nest-sys2 )
  RDROP 2RDROP 
;
\ NB: not immediate

I don't follow. I must be blind (or my brain has taken a vacation from the rest of my body and hung the "do not disturb" on my cerebral cortex - I think I can hear the sign flutter in the wind that blows between my ears).

ruv commented 1 month ago

Trying to use quotations would allow : LEAVE 999 THROW ;

This would not be compliant, because THROW on any non-zero stack parameter shall transfer control to the execution point just after the nearest user's CATCH (and CATCH that the system uses internally should not be detected):

t{  :noname  1 0 do  999 throw  loop 1 ; catch -> 999  }t
t{  :noname  1 0 do  leave      loop 1 ; catch -> 1 0  }t

What is the minimum number of nested DO-LOOP that must be supported, I assume at least two because of J. Exception code implies some upper limit -7 do-loops nested too deeply during execution.

Yes, but this limit is not directly connected with J, because DO-LOOP can be nested via intermediate words. In some implementations a separate stack is used for loop-sys parameters (i.e., an index and limit). This stack's depth increases on entering into a loop, and decreases on leaving the loop. This stack can overflow when any word in a loop body also contains do-loop (repeat that many times), or when the loop body takes control recursively. The standard does not specify the minimum capacity for this stack.

The section 4.1.2 Ambiguous conditions mentions "insufficient space for loop-control parameters" as an ambiguous condition.


or my brain has taken a vacation from the rest of my body and hung the "do not disturb" on my cerebral cortex - I think I can hear the sign flutter in the wind that blows between my ears

Your vivid metaphors make me laugh like a horse every time 🤣🤣🤣

SirWumpus commented 1 month ago

t{ :noname 1 0 do 999 throw loop 1 ; catch -> 999 }t

Wouldn't that be ... -> 1 0 999 }t?

t{ :noname 1 0 do leave loop 1 ; catch -> 1 0 }t

And ... -> 1 0 0 }t?

or my brain has taken a vacation from the rest of my body and hung the "do not disturb" on my cerebral cortex - I think I can hear the sign flutter in the wind that blows between my ears

Your vivid metaphors make me laugh like a horse every time 🤣🤣🤣

Well guess I'm not boring.

ruv commented 1 month ago

t{ :noname 1 0 do 999 throw loop 1 ; catch -> 999 }t

Wouldn't that be ... -> 1 0 999 }t?

No. Because catch consumes an xt and then stores the stack depth. In this case the stored stack depth is 0. On exception the stack depth is restored to 0, and then an ior code is placed to the stack.

If we factor out the pair ( 1 0 ) out of the definition and makes the definition to consume this pair, then the stored stack depth will be 2, and when the stack depth is restored, two unspecified items will appear on the stack under an ior:

1 0  :noname  do  999 throw  loop 1 ; catch  ( x x 999 )

The particular value of this ( x x ) depends on implementation and run-time conditions.