TG9541 / stm8ef

STM8 eForth - a user friendly Forth for simple µCs with docs
https://github.com/TG9541/stm8ef/wiki

Need C, and , to just compile the stack byte and cell respectively #27

Closed RigTig closed 7 years ago

RigTig commented 7 years ago

Thomas, just picked up a subtle but significant shortcoming in the handling of compilation of literals. When compiling, we need two different functions to store a literal into the new word. Both store the top of data stack at current location in a word being defined (i.e. compile only and immediate).

  1. Store as a number so later interpretation puts the number back on data stack
  2. Store as is, without any interpretation or meaning implied

In Ting's eForth, 1 is done by LITERAL and 2 is done by C, and , . In STM8EF, you have used the CPU trap command ($83) to make an efficient number handler, so that a compiled number is simply $83 followed by a cell containing the number. Very short and simple. However, you have implemented C, and , to also store a number rather than just store the raw byte or cell at the current code point.

As an example, let's say I wanted to have a word that just did an unconditional relative jump, using assembler code. So, what I want compiled is the hex equivalent of the assembler

JRA tos-byte

In Forth, this is:

: JUMPREL ( u -- ) $20 C, C, ;

The first C, stores the opcode for JRA and the second one stores the top of stack byte. [As an aside, JUMPREL appears to be rather useless, but it may be a legitimate piece of code. It is not very efficient, but it may just be what someone needs, say, as part of a Forth-based assembler!]

Now, if C, and , were as described above, the word to compile a number in STM8EF is just

: LITERAL ( n -- ) $83 C, , ; IMMEDIATE

with the trap handler picking up the stored number and putting it on the data stack. Of course, an assembler version is far more efficient, but I love being able to do anything I can think of in Forth just to try it. When it works, then I consider making it more efficient. Over to you.
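To make the distinction concrete, here is a minimal Python model of the code space (not STM8EF code). It assumes that C, appends one raw byte, that , appends one 16-bit cell high byte first, and that LITERAL is `$83 C, ,` as proposed above. The names `code`, `c_comma`, `comma` and `literal` are made up for this sketch.

```python
TRAP_OPCODE = 0x83  # STM8 TRAP, used by STM8EF as the literal prefix

code = bytearray()  # stands in for the dictionary at the code pointer


def c_comma(value):
    """C, : append the low byte of the value, with no meaning attached."""
    code.append(value & 0xFF)


def comma(value):
    """, : append a 16-bit cell, high byte first."""
    code.append((value >> 8) & 0xFF)
    code.append(value & 0xFF)


def literal(n):
    """LITERAL modelled as `$83 C, ,`: TRAP opcode, then the cell."""
    c_comma(TRAP_OPCODE)
    comma(n)


literal(0x55AA)
print(code.hex(" "))  # 83 55 aa
```

The three bytes are the same `83 55 AA` sequence that a working LITERAL leaves in the dictionary.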

TG9541 commented 7 years ago

This would be a significant bug, but I'm not sure if I can reproduce it:

cold
stm8eForth v2.2
: LITERAL ( n -- ) $83 C, , ; IMMEDIATE reDef LITERAL ok
: test [ $55AA ] LITERAL ; ok
test hex . 55AA ok
' LITERAL $20 dump
  BC  83  0 83 CD 90 8B CD 90 7E 81  0 B4  4 74 65 73  ___M__M_~__4_tes
  CC  74 83 55 AA 81 49  1  4 64 75 6D 70 52 41 4C  0  t_U*_I__dumpRAL_
  DC   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  ________________ ok

Edit: I get the same result in NVM mode.

Maybe (not unlikely!) I missed the main point. Could you please provide an example that shows the problem?

RigTig commented 7 years ago

Simple example: CREATE GN-BUF 0 C, should create the header followed by $00, but I get:

CREATE GN-BUF 0 C,
ok
' gn-buf 7 - 10 dump
A2   6 47 4E 2D 42 55 46 CD 84 40  0  0  2  4 64 75  _GN-BUFM_@____du ok

We should not be in compilation mode; we are still in normal mode.

The following does work as expected:

here 21 c,  c@ .
21 ok

I am just confused now.

TG9541 commented 7 years ago

I think that I understand now: in your example C, gets compiled after the doVar from CREATE GN-BUF. My example (LITERAL) uses a : definition and it works because it's IMMEDIATE.

On their own, C, and , aren't immediate but you can of course create a proxy, e.g. : [C,] C, ; IMMEDIATE.

In STC Forth both CREATE and VARIABLE must produce "something" in the "code field":

word      mode NVM   mode RAM
VARIABLE  doVar @    doVar
CREATE    doVar      doVar

If a constant without a code field is required it can be produced like this:

FILE
: [C,] C, ; IMMEDIATE
: [,]  ,  ; IMMEDIATE

( by default `OVERT` isn't linked )
123 : test [C,] [ OVERT
HAND
' test C@ . 123 ok

By the way: when I started learning Forth about a year ago, I found the concept of "state" (interpret/compile) rather confusing. This issue improved my understanding of what's "core", and what isn't. I'm curious how much of the higher level words can be compiled to RAM :-)

RigTig commented 7 years ago

Aha! What I originally wrote is totally irrelevant to what is the issue. Basically, I tricked myself, and blamed the behaviour of c, and , . I have now moved forward to where I understand where I came unstuck. I really laughed when the real cause dawned on me.

What I was playing with was writing a Forth definition of DO. I must have been tired because what I initially wrote was a definition on the premise that the word would do the work directly. BUT, what is really needed is a word that creates code in words that does the work. Basically I could see what needed to be done, but I got the timing of when to do that work wrong. To demonstrate my meaning here, consider the difference between the two following code fragments.

: xxx1  [ 0 ]  LITERAL ; IMMEDIATE
: xxx2 0  [compile] LITERAL ; IMMEDIATE

Both compile the literal $0: xxx1 when it occurs in a definition and the second one when the word being defined with xxx2 in it is run. Originally, I was using c, and , to store the values and I got myself totally confused, especially when I tried doing the simple test using CREATE and I just tested the first byte. Note that DO has the xxx2 fragment as the first part of its Forth definition.

Please accept my apologies for distracting you from other tasks. And that is the first time I have seen a colon definition without a semicolon. Very nice solution (though quite non-standard)!

Re removing selected high level words from flash code, I have got a BAREBONES flash down to about 3900 bytes. After a bit of testing (just enough to identify really stupid coding errors), I'll update my BAREBONES branch again. I've still got more words to look at though, so you never know how low we can go.

TG9541 commented 7 years ago

Thanks for the explanation! Mistakes happen, and this one has been educational :-)

I'm curious about the direction in which this will lead: a very basic Forth that needs to be bootstrapped with source from the serial interface (instead of a disk with a bunch of "screens"). Maybe the EEPROM can be used to speed the process up?

RigTig commented 7 years ago

Just revisiting your method of creating a constant, I decided that moving the [ to just after the header would avoid the necessity to create immediate versions of any words needed in the body. So a constant could be created thus:

123 : test [ C, OVERT

So now I have done the next step and moved the OVERT too.

123 : test [ OVERT C,

And the following might make even more sense:

: test [ OVERT 123 C,

So, just to cap it all off, consider what is going on here:

: *
  [ OVERT
  $CD C, ' UM* ,        \ CALL   UMSTA
  $CC C, ' DROP ,       \ JP      DROP

Ooooohhhh!!! What fun can I have now?

TG9541 commented 7 years ago

What about defining some "immediate" words that make using your approach more convenient?

: :: [COMPILE] [ [COMPILE] OVERT ; IMMEDIATE ok
: C' [ $CD ] LITERAL [COMPILE] C, [COMPILE] ' [COMPILE] , ; IMMEDIATE ok
: J' [ $CC ] LITERAL [COMPILE] C, [COMPILE] ' [COMPILE] , ; IMMEDIATE ok
: * :: C' UM* J' DROP reDef * ok
8 3 * . 24 ok
' * 10 DUMP
  D4  CD 86 63 CC 83 86  4 44  4 44 55 4D 50  0  0  0  M_cL___D_DUMP___

Note that for using ::, C', and J' in NVM mode the "fool proof" dictionary search in compile state (never search RAM) must be relaxed.
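As an illustration of what C' and J' put into the dictionary, here is a small Python sketch (a model, not STM8EF code). It assumes the six bytes at the start of the dump are the body of `*`, so that UM* sits at $8663 and DROP at $8386 in that particular build. The helper `emit` is a hypothetical name.

```python
CALL_OPCODE = 0xCD  # STM8 CALL with a 16-bit absolute address
JP_OPCODE = 0xCC    # STM8 JP with a 16-bit absolute address


def emit(opcode, address):
    """Model of `opcode C, address ,`: one opcode byte, then a 16-bit cell."""
    return bytes([opcode, (address >> 8) & 0xFF, address & 0xFF])


# Addresses read from the dump above; they are specific to that build.
UM_STAR = 0x8663
DROP = 0x8386

body = emit(CALL_OPCODE, UM_STAR) + emit(JP_OPCODE, DROP)
print(body.hex(" ").upper())  # CD 86 63 CC 83 86
```

Ending the word with JP DROP instead of CALL DROP followed by RET saves one byte, since DROP's own RET returns to the caller of `*`.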

Now we're almost in the mindset of metaprogramming. It's amazing how fast one gets there!

RigTig commented 7 years ago

One of the reasons I love using Forth is that you have full control over the total environment. The really funny part is that you can do it on an 8-bit CPU using just 5K bytes (or even less!!). I like your code above too. But I'm going to get cheeky and make just a minor modification: add the colon definition word into the double-colon word, thus:

: :: : [COMPILE] [ [COMPILE] OVERT ;

So, the usage just looks like:

:: * C' UM* J' DROP

And who said writing compilers is hard? This new compiler is ready and waiting for action!! The compiling words are not needed in a running application, so I expect them to be in ram anyway. Mmm...my technical writing skills are telling me I really ought to document all these words, but I have so many other projects to work on too. Not a high enough priority to do the doco now.

As far as accessing the ram words, I want to change the compile-time action to ensure all compiled references in words coded in NVM refer to NVM location rather than RAM. I am not sure I am up to it (since I've only just started looking at STM8 assembler), but I don't think it's too hard. My current thinking assumes that, when in NVM mode, compiling a word defined in ram should use the address in the 2nd byte of the cfa rather than the cfa itself. In other words, the compiler assumes that the ram word is just a header. I cannot think of an easy way to ensure the coder doesn't violate that assumption and compile a normal action word defined in RAM (because there is nothing special about headers in ram, except their addresses all have high-byte as $00 as does any other word in ram). Thomas, there is no project depending on this, so it is low priority. Please could you code this sometime?

TG9541 commented 7 years ago

:: is now a nice defining word for assembly like compilation methods. However, we'd miss the opportunity to use CALLR, which the STM8EF presently does. MECRISP implements "tail call optimization" instead: ; turns the last CALL into a JP. However, we'd also have to turn the last CALLR into a JRA, which means that the addressing mode of the last CALL must be known.
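The tail call rewrite mentioned here can be sketched on raw bytes. The Python below is a model only, not MECRISP or STM8EF code. It assumes the compiler has recorded the length of the last call (3 for CALL, 2 for CALLR), which is the "addressing mode must be known" requirement. The function name and parameter are hypothetical.

```python
CALL, CALLR, JP, JRA, RET = 0xCD, 0xAD, 0xCC, 0x20, 0x81  # STM8 opcodes


def tail_call(code, last_call_len):
    """Turn a trailing CALL/CALLR + RET into JP/JRA and drop the RET.

    last_call_len is 3 for CALL and 2 for CALLR; the compiler is assumed
    to have recorded it, since the bytes alone can be ambiguous.
    """
    code = bytearray(code)
    start = len(code) - 1 - last_call_len
    if start < 0 or code[-1] != RET:
        return bytes(code)
    if last_call_len == 3 and code[start] == CALL:
        code[start] = JP
    elif last_call_len == 2 and code[start] == CALLR:
        code[start] = JRA  # same length as CALLR, so the offset byte is unchanged
    else:
        return bytes(code)
    del code[-1]  # the RET is no longer needed
    return bytes(code)


print(tail_call(bytes([0xCD, 0x86, 0x63, 0x81]), 3).hex(" "))  # cc 86 63
print(tail_call(bytes([0xAD, 0xF0, 0x81]), 2).hex(" "))        # 20 f0
```

CALLR and JRA are both two bytes long with an offset relative to the next instruction, which is why the offset can be kept as it is.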

I tried to extract a set of requirements for the separation of headers and code:

This should also work for headerless core words with a known address. However, I've got a feeling that there are edge cases (e.g. involving constants or variables).

@RigTig could you please check if I missed an obvious point?

RigTig commented 7 years ago

Mmm...I disagree that

we'd miss the opportunity to use CALLR

I assert that we are quite able to use CALL, as part of code after :: should we so choose. And it would be easy to create something similar for jumps too. For example, here is some code that is so new that I have not tested it. I have just found a circular dependency on D+ and UM+ in my Forth definitions; very funny!! Anyway, I decided to attempt a hand code for assembler for UM+. Here is my attempt (untested):

RAM
: 0 [ OVERT $CC C, $847A ,
: 1 [ OVERT $CC C, $847D ,
NVM
: UM+     ( u u -- udsum )
  \ Add two unsigned single
  \  and return a double sum.
  [ OVERT
  ' PLUS CALL,          \ CALLR   PLUS
  $2503 ,           \ JRC     JPONE
  $CC C, ' 0 1+ @ ,     \ JP      ZERO
  $CC C, ' 1 1+ @ , \ JPONE   JP      ONE

where the address derivations for 0 and 1 in UM+ are rather obvious and would be better if they were hidden. It might be nice to have a JP, for the two calls at the end, but then we would need to adjust the distance of the JRC JPONE relative jump depending on which way the JP ZERO actually assembled. Optimisation can be a slippery slope! Put it aside for now. Let's crawl before we walk!

Re first bullet above:

: : RAM : $CC c, $6e @ , NVM ; \ effectively redefining : in terms of itself

[Yep, it works]

Re 2nd bullet above: If NAME> is the only way to get a code address given a name address, then I see you are correct and the solution becomes:

: NAME> COUNT $1F AND + \ as is now (length mask, decimal 31)
  DUP C@ $CC = \ does it point to JP
  OVER $100 < \ is it in ram
  AND IF 1+ @ THEN ; \ get the NVM address

Now, testing this is beyond me. I think it needs to be coded into the forth.asm file.
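The lookup logic can be tried out away from the target with a small Python model of memory. This is only a sketch under assumptions: a header is a count byte whose low 5 bits hold the name length, followed by the name and then the code field; RAM is everything below $100, as in the Forth version above. The example headers and their addresses are invented for the test, apart from the $847A target borrowed from the `: 0` alias.

```python
JP_OPCODE = 0xCC
LENGTH_MASK = 0x1F  # decimal 31, the low five bits of the count byte


def name_to_code(mem, na, ram_limit=0x100):
    """Model of the extended NAME>: follow a JP at the code field of a RAM header."""
    ca = na + 1 + (mem[na] & LENGTH_MASK)       # COUNT 31 AND +
    if mem[ca] == JP_OPCODE and ca < ram_limit:  # alias header living in RAM
        ca = (mem[ca + 1] << 8) | mem[ca + 2]    # 1+ @
    return ca


mem = bytearray(0x10000)
# Hypothetical RAM alias header at $0040: name "0", body JP $847A
mem[0x0040:0x0045] = bytes([1, ord("0"), 0xCC, 0x84, 0x7A])
# Hypothetical flash word at $9000 whose body happens to start with a JP
mem[0x9000:0x9007] = bytes([3]) + b"DUP" + bytes([0xCC, 0x80, 0x00])

print(hex(name_to_code(mem, 0x0040)))  # 0x847a
print(hex(name_to_code(mem, 0x9000)))  # 0x9004
```

The second lookup shows what the RAM test buys: a word outside RAM that starts with JP is still returned as itself.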

The definition of a variable should be independent of the definition of colon words. However, I am still unhappy with how CREATE works and how there might be an interaction, but I think we're ok to go.

TG9541 commented 7 years ago

Hi RigTig,

on a 2nd thought: what's wrong with having the same functionality outside of the RAM area? Of course, there are edge cases like NVM :: test $CC00 ,

I gave it a try:

;       NAME>   ( na -- ca )
;       Return a code address given
;       a name address.

        .ifeq   BAREBONES
        .dw     LINK

        LINK =  .
        .db     5
        .ascii  "NAME>"
        .endif
NAMET:
        CALL    COUNT
        DoLitC  31
        CALL    ANDD
        CALL    PLUS
        LD      A,(Y)           ; DUP C@
        CP      A,#0xCC         ; $CC =
        JRNE    1$              ; IF
        INCW    Y               ; 1+ 
        LDW     Y,(Y)           ; @
        LDW     (X),Y
1$:     RET                     ; THEN

It seems to work as expected:

stm8eForth v2.2
: .. [ OVERT $CC C, ' . , ok
4 .. 4 ok
' .. 10 dump
8AF2  B6 65 A8  A 27  2 20 E4 CD 89 29 CD 8A 46 20 C1  6e(_'_ dM_)M_F A ok
' . 10 dump
8AF2  B6 65 A8  A 27  2 20 E4 CD 89 29 CD 8A 46 20 C1  6e(_'_ dM_)M_F A ok

It doesn't work in combination with NVM yet (right now, in NVM mode, RAM words can't be used in COMPILE state).

RigTig commented 7 years ago

In fact, a side-effect of :: is to leave the system in NVM mode! By rights, :: should really restore the mode to its state before the word. Words are supposed to not have side-effects, which means they should leave everything the way they found it, except for their published function. It is expected to be only used in interpreter mode, so it'll work in both NVM and RAM modes anyway.

I cannot envision a circumstance where a Forth word with $CC at cfa can be anything else than an alias. The only use for an alias in NVM is to make a whole lot of long jumps into a relative jump to the alias (trading execution time for space). With the overhead of inline header, it is never going to be worthwhile. Now, if you make the NVM code headerless, there is no negative effect if the first byte in the body of the word is a $CC, since we are only looking at the header structure for the $CC. So, I think you are correct about simplifying the test for a header.

Gotta love the compactness of the assembler code. I couldn't do it, but you have nailed it nicely. I can follow it OK, but putting the STM8 assembler together from scratch is still beyond me. Thanks for the comments and I'll learn eventually. Nice to see it work.

I know you mentioned it before, but I had forgotten about compiling in NVM quietly ignoring anything in ram. Nice feature, that. Mmm...I cannot think of any easy way to just make an exception for the headers. The only spare bits in the header are the top bit of the ascii characters. What gets broken if we use the top bit of the first character of the name to designate the word as a header? If the broken bits are easy to fix, then we might have an easier to use filter for headers than the extra code we added to NAME>. Well, maybe not, since just testing for $CC is quick and easy, since we already have the address. How hard is it to allow search of ram in NVM mode but only allow words with $CC at cfa?

We're still crawling, but I feel a walk coming on!

TG9541 commented 7 years ago

Some thoughts and observations:

We just extended the Forth language with a new class of words, the class ALIAS, akin to IMMEDIATE. However, the word class is not processed by the interpreter (using the compile only flag 0x40), or by the compiler (using the immediate flag 0x80), but by NAME>, the NFA to CFA translation word. On the other hand, the distinctive quality of the class ALIAS is the way it does the NFA to CFA translation, and dealing with that in NAME> is maybe the best way to do it.

ALIAS words currently are encoded by making the first byte in the Code Field equal to 0xCC, incidentally the STM8 opcode for JP. As a matter of fact, the assembly opcode JP is never executed, and we might as well add an alias flag 0x20 (limiting the length of the name of a Forth word to 31 characters), or treat upper case words as ALIAS words.
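For reference, the count byte at the start of a header can be decoded as in the Python sketch below. It uses the flag values named in this comment (0x80 immediate, 0x40 compile only) and treats 0x20 as the alias flag, which is only a proposal at this point. The function name is hypothetical.

```python
IMMEDIATE = 0x80     # processed by the compiler
COMPILE_ONLY = 0x40  # processed by the interpreter
ALIAS = 0x20         # proposed here, not implemented in STM8EF
LENGTH_MASK = 0x1F   # name length, at most 31 characters


def decode_header_byte(b):
    """Split the first header byte into its flags and the name length."""
    return {
        "immediate": bool(b & IMMEDIATE),
        "compile_only": bool(b & COMPILE_ONLY),
        "alias": bool(b & ALIAS),
        "length": b & LENGTH_MASK,
    }


print(decode_header_byte(0x80 + 3))  # header byte of FOR: IMEDD+3
print(decode_header_byte(0x40 + 7))  # header byte of COMPILE: COMPO+7
```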

The existing STM8EF codebase has a much more complicated construct: the way CONTEXT and HERE depend on CP and 'EVAL. It gets even more complicated when you look at the code of CREATE! This complexity is the result of a specific requirement: make the interpreter use RAM, even if words are to be compiled to Flash memory.

In our context, it would make sense to generalize the requirement even more: it should be possible to split the dictionary from code. The ALIAS class offers a practical solution by introducing an aspect of DTC to STC. In fact we can now have a hybrid STC/DTC system!

I would like to push this idea a little further: if we now take a list of CFAs (e.g. from headerless Forth code), we could compile the numeric index to a Forth word to its CFA using the NAME> translation. From there, the step to an ITC Forth would just require adding an inner interpreter (which might be a good way for making the upper layers of eForth more dense)!

Another way this can be taken is working with malleable dictionaries. Why not split the core code from its dictionary part using the ALIAS/DTC method, e.g. at the floor and top parts of the Flash ROM, and introduce a DISCARD word for marking words for removal, and a COLLECT word for removing unneeded word references? The fun part is that the same COLLECT word could also be used for moving temporary ALIAS/DTC in RAM to a persistent dictionary at the top of the Flash memory.

PS: using the ALIAS/DTC in RAM together with vanilla STC in NVM requires minor changes to CONTEXT and NAME?:

Near CNTXT::

1$:
        .endif
CNTXT_FIND:
        LD      A,#(RAMBASE+USRCONTEXT)
        JRA     ASTOR

At NAMEQ::

NAMEQ:
        CALL    CNTXT_FIND
        JRA     FIND

This doesn't solve the complicated problem of limiting compilation of CFAs in RAM into NVM code, but at least you can get into a walking mode :-)

TG9541 commented 7 years ago

Using the alias approach above @RigTig brought the core binary size for a W1209 down to less than 3.5KiB. Quite impressive!

TG9541 commented 7 years ago

Issues #26 and #28 contain important information in the context of the discussion in this issue:

RigTig commented 7 years ago

Still working on eForth code. Just got to resolve some tricky issues, not the least being when words depend upon each other. In assembler, references can be to anywhere as long as they resolve. In Forth, all words used must already exist (well, until you define forward references and resolvers). I am getting there but I am slow right now.

TG9541 commented 7 years ago

Forward references are tricky - that's the domain of code optimization which modern C compilers handle well. In the process of squeezing STM8EF I identified some of the reuse opportunities by searching for repeated sequences of code with an AWK script, and then arranged for many of them to be usable as a code fragment. Sorry for the twisted execution flow ;-)

RigTig commented 7 years ago

Hey, when you are chasing every optimisation you can find, then some 'programming purities' need to go by the wayside. Gotta love AWK. One of your tricks I liked was to define one CALL xxxx in the middle of several places needing it, and then use CALLR from each of those places to the CALL xxxx. Now, try adding that to an automatic optimiser! Some of your optimisations did limit what I could remove to get to a minimal BAREBONES core, but I was amazed at what I could remove. I have swapped over to using an STM8 development board, and MINDEV has just 2888 bytes in BAREBONES core with headers removed too. Mmm...I wonder if you can enlighten me about a comment in COMPILE code: it looks like a JP xxxx at end was changed back to CALL xxxx RET to fix a nasty bug.

TG9541 commented 7 years ago

The meanest bug west of Vladivostok, as far as I can tell. I don't really understand why the code worked at all (the program counter went backwards in some situations) which made implementing CREATE..DOES> a real challenge ;-)

MINDEV with 2888 bytes is very impressive! Maybe a future option is compiling eForth from the source as 8bit ITC. That would have the advantage that the high level code would be portable between different microcontroller architectures.

RigTig commented 7 years ago

COMPILE plays with the return stack; it needs to step over the following CALL (3 bytes) - since that CALL is really not to be executed but is a parameter for COMPILE. Note that COMPILE as written only works for 3-byte CALLs and not for 2-byte CALLRs. That is the bug I walked into, and then the program counter can go almost anywhere!! I've rewritten COMPILE to handle both CALLs and CALLRs, but it took me a while to get it just right - and I am not sure it is perfect even yet. Code is in my work-in-progress still. When I know it works, I reckon it is an excellent candidate for some assembler rather than the contortions needed in eForth to match the behaviour of the short relative addressing mode.

Oh, and the reason it works in STM8EF is that all the usages in the assembler code explicitly use 3-byte CALLs following each COMPILE. But CALL, is smart!!

I agree about looking at just a very basic core, and then compile eForth to get full system. I'll just put it on the backburner for now (which is to say, not now, but later).

TG9541 commented 7 years ago

I'm just wondering how COMPILE can know if it was called by CALL or CALLR ... I'll have a peek into your code later this day.

RigTig commented 7 years ago

Mmm...not how COMPILE was called, but how the word following COMPILE is actually compiled. Let's see if I can explain it better. Consider the following eForth code, which is the definition of FOR:

: FOR     ( -- a ) 
  \ Start a FOR-NEXT loop structure in a colon definition.
  COMPILE >R HERE ;
  IMMEDIATE COMPILE_ONLY

and the equivalent assembler code from forth.asm

;       FOR     ( -- a )
;       Start a FOR-NEXT loop
;       structure in a colon definition.
        .dw     LINK

        LINK =  .
        .db     (IMEDD+3)
        .ascii  "FOR"
FOR:
        CALLR   COMPI
        CALL    TOR
        JP      HERE

(Well, not 100% the same, because I've added COMPILE_ONLY to eForth source, but irrelevant to this explanation.) You can see the three words as they might be compiled by an optimising compiler: COMPILE is CALLR COMPI, >R is CALL TOR, and HERE is JP HERE. The usage of FOR is within a colon definition, such as : zz FOR R@ . NEXT ;, but note that FOR is not compiled because it is marked as an IMMEDIATE word and it executes. At the eForth level, the description of what happens is that the first word, COMPILE, takes the (compiled version of the) next word and compiles that word. If we look at the definition of COMPILE, it uses CALL, on the 2nd and 3rd bytes after COMPILE, and adjusts the program counter (being top of return stack) to be moved from just after the CALLR COMPI to just after the CALL TOR. Note that when FOR executes, the >R is not executed at all; it is just compiled into zz and will be executed when zz runs. So far, so good.

But what happens if what we compile for FOR is the equivalent of:

FOR:
        CALLR   COMPI
        CALLR    TOR  ; 2-bytes only
        JP     HERE

COMPILE assumes that the next word is always compiled as a CALL (so is 3 bytes long). It still grabs bytes 2 and 3 and makes that the compiled address, which is obviously incorrect. The program counter is also set 1 byte past the end of the CALLR TOR, and execution from that point could do anything.
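The failure can be reproduced with a few bytes in a Python model. This is a sketch under assumptions: `naive_compile` mimics only the "read bytes 2 and 3, skip 3 bytes" behaviour described above, and all addresses ($9000 for FOR, $8500 for TOR, $8A00 for HERE) are invented for the example.

```python
def naive_compile(mem, ra):
    """Old COMPILE behaviour: treat whatever follows as a 3-byte CALL.

    ra is the return address on the return stack, i.e. the address of the
    instruction after the call to COMPILE. Returns (target, resume address).
    """
    target = (mem[ra + 1] << 8) | mem[ra + 2]
    return target, ra + 3


mem = bytearray(0x10000)

# FOR with a 3-byte CALL TOR: AD xx | CD 85 00 | CC 8A 00
mem[0x9000:0x9008] = bytes([0xAD, 0x10, 0xCD, 0x85, 0x00, 0xCC, 0x8A, 0x00])
target, resume = naive_compile(mem, 0x9002)
print(hex(target), hex(resume))  # 0x8500 0x9005

# FOR with a 2-byte CALLR TOR: AD xx | AD F0 | CC 8A 00
mem[0x9000:0x9007] = bytes([0xAD, 0x10, 0xAD, 0xF0, 0xCC, 0x8A, 0x00])
target, resume = naive_compile(mem, 0x9002)
print(hex(target), hex(resume))  # 0xf0cc 0x9005
```

In the second case the "address" $F0CC is the CALLR offset glued to the JP opcode, and execution resumes at $9005, in the middle of JP HERE.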

Just as a matter of record, the best that the current eForth can achieve is to get both the first two optimisations of the CALLs to CALLRs (all done by CALL,), but cannot do the collapse of the CALL HERE RET to the JP HERE.

Footnote: I've said it many times, and long before I got to use GitHub, that the key to understanding Forth is to recognise when a particular word actually gets executed. Oh yes, I still get caught out (see the first paragraph of this issue!!!!!).

TG9541 commented 7 years ago

@RigTig, I wish I had had this explanation when I started learning Forth last year :-) Even later, I obviously missed the case where the word after compile gets compiled with a relative address. That's clearly a bug! I'll be looking into how to fix it.

RigTig commented 7 years ago

If it helps, here is my attempt at an eForth update to the COMPILE definition. I think it works, but testing is still underway.

: COMPILE ( -- ) .( expect ReDef )
  \ Compile next CALL (incl CALLR) in
  \  colon list to code dictionary.
  R> DUP C@ $CD - 0= IF \ CALL: ( ra -- ra+1 target )
    1+ DUP @ ELSE \ assume CALLR = $AD: ( ra -- ra target )
    DUP DUP 1+ C@ DUP $80 - 0< NOT IF $FF00 + THEN + 2+ THEN \ extra DUP keeps ra for 2+ >R
  CALL, 2+ >R ;

I am sure that, in assembler, calculating the relative offset is much simpler (add, without using carry), so should be short and quick.
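The address arithmetic for both cases can be checked with a short Python model. It is a sketch, not the STM8EF implementation: `decode_call` is a hypothetical helper that returns the call target and the address at which execution should resume, and the test addresses are invented.

```python
CALL_OPCODE = 0xCD   # 3 bytes: opcode + 16-bit absolute address
CALLR_OPCODE = 0xAD  # 2 bytes: opcode + signed 8-bit offset


def decode_call(mem, ra):
    """Return (target, resume address) for the CALL or CALLR stored at ra."""
    opcode = mem[ra]
    if opcode == CALL_OPCODE:
        return (mem[ra + 1] << 8) | mem[ra + 2], ra + 3
    if opcode == CALLR_OPCODE:
        offset = mem[ra + 1]
        if offset >= 0x80:   # sign-extend the 8-bit offset
            offset -= 0x100
        # the offset is relative to the instruction after the CALLR
        return (ra + 2 + offset) & 0xFFFF, ra + 2
    raise ValueError("no CALL or CALLR at this address")


mem = bytearray(0x10000)
mem[0x9002:0x9004] = bytes([0xAD, 0xF0])        # CALLR with offset -16
print([hex(v) for v in decode_call(mem, 0x9002)])  # ['0x8ff4', '0x9004']
mem[0x9002:0x9004] = bytes([0xAD, 0x10])        # CALLR with offset +16
print([hex(v) for v in decode_call(mem, 0x9002)])  # ['0x9014', '0x9004']
```

Handling the positive offset costs one comparison in this model, which is the case that matters once forward references exist.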

TG9541 commented 7 years ago

Looks good! It's indeed easier calculating the address in machine code, but due to certain STM8 instruction set restrictions it's tricky to achieve better code in other parts. Yesterday night my optimizer neurons didn't perform well. Working on it. By the way, I think it's safe to assume that CALLR offsets are always negative.

RigTig commented 7 years ago

Slowly, slowly catchee monkey. We don't need to rush on this count. As STM8EF is currently, you are correct that the CALLR offsets are always going to be negative. But, it is a relatively simple matter to add forward referencing to STM8EF ( just a few well-crafted words will do it ) and I'd be really annoyed at having to go back and fix COMPILE again! So, maybe better to do it right the first time, although I'd accept clear comments if the shortcut was taken.

You might notice that the eForth code sometimes has words that do not seem to show up in the assembler equivalent. For example, just after the R> in COMPILE, there is a DUP that has no equivalent in the assembler. However, the DUP in 1+ DUP @ is explicit. Now, I haven't figured it out for sure, but I think it is due to the fact that registers do not change when read but Forth words assume that parameters are destroyed and outputs are created. If the developer has a very clear understanding of what registers are still in place at end of a Forth word, and what registers are needed by the next Forth word, then optimisations can be (and have been) made. It is all too tricky for my non-juggling brain, but it has led to some very interesting failures of the eForth code I put together since I mainly based eForth upon the forth.asm source. COMPILE took me a week to get it working at all.

TG9541 commented 7 years ago

@RigTig I assumed that you use the documented eForth source, not the aggressively size optimized STM8EF assembly code I made. You'll find all kinds of tricks for squeezing out a couple of bytes. The register allocation is by design (I had to change the implementation of many words to achieve it).

;       DOXCODE   ( n -- n )   ( TOS STM8: -- Y,Z,N )
;       DOXCODE precedes assembly code for a primitive word
;       In the assembly code: X=(TOS), YTEMP=TOS. (TOS)=X after RET
;       Caution: no other Forth word may be called
DOXCODE:
        POPW    Y
        LDW     YTEMP,X
        LDW     X,(X)
        CALL    (Y)
        EXGW    X,Y
        LDW     X,YTEMP
        LDW     (X),Y
        RET

With DOXCODE "2+" takes 5 bytes:

CELLP:
        CALLR   DOXCODE
        INCW    X
        INCW    X
        RET

The original code requires 9 bytes:

CELLP:
        LDW Y,X
        LDW Y,(Y)
        ADDW Y,#2
        LDW (X),Y
        RET

That's not much but the 13 times DOXCODE is used saves about 70 bytes, and juggling was fun ;-)

TG9541 commented 7 years ago

COMPILE in STC with CALL/CALLR optimization is a mess.

I am sure that, in assembler, calculating the relative offset is much simpler (add, without using carry), so should be short and quick.

Expansion from u8_t to i16_t in Forth is implicit - it takes more steps in assembly. Storing intermediate results without the data stack is also more difficult.

Here is a first version (CALL, and CALLR with negative/positive offset tested). The fix consumes 24 bytes.

;       COMPILE ( -- )
;       Compile next jsr in
;       colon list to code dictionary.

        .ifeq   BAREBONES
        .dw     LINK

        LINK =  .
        .db     (COMPO+7)
        .ascii  "COMPILE"
        .endif
COMPI:
        CALL    RFROM
        LD      A,(Y)
        CALL    ONEP
        CP      A,#CALL_OPC
        JRNE    COMPIO1
        CALL    DUPP            ; COMPILE CALL address
        CALL    CELLP
        CALL    TOR             ; this was a JP - a serious bug that took a while to find
        CALL    AT
        JRA     JSRC            ; compile subroutine
COMPIO1:
        EXGW    X,Y             ; COMPILE CALLR offset
        LD      A,(X)
        INCW    X
        PUSHW   X               ; return address
        CLRW    X               ; offset i8_t to i16_t 
        TNZ     A
        JRPL    1$
        DECW    X               
1$:     LD      XL,A
        ADDW    X,(1,SP)        ; add offset in X to address of next instruction
        EXGW    X,Y
        CALL    YSTOR
        JRA     JSRC            ; compile subroutine

Coding the CALL path in assembly saves 9 bytes:

COMPI:
        EXGW    X,Y 
        POPW    X
        LD      A,(X)
        INCW    X
        CP      A,#CALL_OPC
        JRNE    COMPIO1
        LDW     YTEMP,X         ; COMPILE CALL address
        INCW    X
        INCW    X
        PUSHW   X
        LDW     X,[YTEMP]
        JRA     COMPIO2
COMPIO1:
        LD      A,(X)           ; COMPILE CALLR offset 
        INCW    X
        PUSHW   X               ; return address
        CLRW    X               ; offset i8_t to i16_t 
        TNZ     A
        JRPL    1$
        DECW    X               
1$:     LD      XL,A
        ADDW    X,(1,SP)        ; add offset in X to address of next instruction
COMPIO2:
        EXGW    X,Y
        CALL    YSTOR
        JRA     JSRC            ; compile subroutine

TG9541 commented 7 years ago

The commit 065744e closed this; see issue #28. It took @RigTig some time to explain what's wrong (I never expected that "near" words in RAM could get in conflict with COMPILE).

The binary size increase was fully compensated by 0bc3b3c