SEE misses out showing every byte when the name cannot be found

TG9541 / stm8ef

STM8 eForth - a user friendly Forth for simple µCs with docs

https://github.com/TG9541/stm8ef/wiki

SEE misses out showing every byte when the name cannot be found #26

Closed RigTig closed 7 years ago

RigTig commented 7 years ago

SEE does not show every byte when a name cannot be found, and it then misses some names as well.

For example:

see see
' CR 1- 2+ DUP @ DUP 83 94 CD 51 83 94 CD 8F .ID 1+ 5 93 U. NUF? 83 94 CC 72 7D 57 ...

A better SEE is:

see3 see
' CR 1- 2+ DUP @ DUP CD 83 4E 94 9E >NAME CD 83 49 94 AE SPACE .ID 1+ 20 5 AD 93 U. NUF? CD 83 4E 94 8A DROP 94 7D ...

What you get to see depends upon which word headers exist. The option flags in globconf.inc control the inclusion (or exclusion) of either complete words or just their headers. In this example, 'CD 83 4E' would have been shown as '?branch' if the WORDS_LINKRUNTI option had been set to 1 when the STM8 was flashed.

The Forth code for SEE3 is:

: SEE3 ( -- ; )
  \ A simple decompiler.
  \ Updated for byte machines.
  ' CR 1-
  BEGIN
    2+ DUP @ DUP
    IF >NAME THEN
    ?DUP
    IF SPACE .ID 1+
    ELSE 1- DUP C@ U.
    THEN
    NUF?
  UNTIL DROP ;

To fix: in forth.asm, replace

SEE3: CALLR DUPPCAT

with

SEE3: CALL ONEM
      CALLR DUPPCAT
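To make the address stepping easier to follow, here is a rough Python model of the SEE3 loop. It is only a sketch: the function name `see3_model`, the `names` dictionary (standing in for >NAME) and the `steps` counter (standing in for NUF? ending the loop) are assumptions for illustration, and the byte image is a toy example rather than real flash contents.

```python
def see3_model(image, start, names, steps):
    """Rough model of the SEE3 loop over a byte image.

    image: bytes of compiled code (index == address in this toy model)
    start: index of the first code byte of the word being decompiled
    names: dict mapping a 16-bit code address to a word name (stand-in for >NAME)
    steps: number of loop passes (stand-in for NUF? ending the loop)
    """
    out = []
    a = start - 1                                  # ' CR 1-
    for _ in range(steps):
        a += 2                                     # 2+
        cell = (image[a] << 8) | image[a + 1]      # DUP @  (STM8 is big-endian)
        name = names.get(cell) if cell else None   # DUP IF >NAME THEN
        if name is not None:                       # ?DUP IF
            out.append(name)                       #   SPACE .ID
            a += 1                                 #   1+
        else:                                      # ELSE
            a -= 1                                 #   1-  (step back one byte)
            out.append(f"{image[a]:02X}")          #   DUP C@ U.
    return out


# Toy image: a call to an unnamed address, a call to a named one, then a return.
image = bytes([0xCD, 0x83, 0x4E, 0xCD, 0x90, 0x00, 0x81, 0x00, 0x00])
print(" ".join(see3_model(image, 0, {0x9000: "DROP"}, 5)))
```

With this toy image the model prints `CD 83 4E DROP 81`: every byte of the unresolved call is shown, and the following call is still resolved to its name.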

Note: I am working backwards through forth.asm and adding more to the list of words removed with the BAREBONES option in globconf.inc. I am choosing words that can be added back in NVM mode, or just put in RAM for temporary use, because they can be written in Forth code. I may find other issues as I pick my way through the word list.

TG9541 commented 7 years ago

Hi RigTig, when I started working on Dr. C.H. Ting's STM8EF, SEE was a bit more broken for STC than it is now. At that point, some of the execution threads were already tangled (e.g. a word ended with a jump to DROP). In my pursuit of denser code, I did more things that broke SEE than I can remember:

- use more opportunities to merge threads with tail jumps
- use relative jumps instead of CALL BRAN
- use assembly code when it's more efficient than STC Forth
- factor out assembly segments that are not Forth threads
- use more tricks (e.g. optionally use registers for return values, swap X/Y)
- ...

I can see several things here:

- SEE is an important tool for reverse engineering
- there are many opportunities for making code more dense that I may have missed while optimizing ROM code (e.g. structure words like IF, THEN, or WHILE ...)
- you have better ideas than I have, and your Forth skills are much better than mine :-)

I'd like to propose the following:

- please fork the repository (this makes it easier to integrate and track changes with pull requests)
- the SEE issue gets handled in this GitHub issue (#26)
- we look for more opportunities to save ROM, e.g.:
  - include options for moving interpreted core words to RAM (i.e. define them before compiling the rest)
  - make the EEPROM usable for compiled code (it would also be an option to move a part of the interpreter there)

Sounds like fun :-)

RigTig commented 7 years ago

I am learning about GitHub here, but the fork is done. We'll continue our quest to further shrink STM8EF and make it even more useful in that fork. On SEE: it is just a utility for learning and development, especially when addresses move as compilation and flashing options change. A disassembler is also a handy tool. I might just have a crack at it, and learn lots about STM8 in the process. As SEE stands with this proposed change, it does not show whether the instruction for a Forth word is an unconditional jump or a subroutine call, but it still helps to see what a word actually contains and whether the header exists. Definitely fun and a worthwhile challenge to boot :-)

TG9541 commented 7 years ago

Great, the first step worked! It looks to me like your STM8EF master branch was a bit out of date, and a lot of changes were missing. I put some comments here. The first and the last comment are the most important. I wrote the comments in the middle before I understood what had happened:

https://github.com/RigTig/stm8ef/commit/556c9c7a67a60c508c05af90b4b1491d0e9e580f#diff-4f1b15a588a349787acbd2e56d41d6d1

Normally, merging changes between two different "baselines" in git works fine, but one has to merge changes from "upstream" first (i.e. first update your forth.asm in master, then "pull" your changes to the main repository). It's also good practice to do all development in a development branch or in a feature branch (e.g. develop or barebones, not in master). This makes merging changes much easier!

I'm also in learning mode with the GitHub workflow. Here is a generic intro: https://guides.github.com/introduction/flow/

TG9541 commented 7 years ago

Good writeup by @RigTig in this commit comment. Many good ideas there (with some more ideas from my side):

1. define a real core vocabulary that must be linked for bootstrapping an interactive Forth
2. keep headers in a separate memory area, so that they are only in memory while they are needed
   a. RAM: fully volatile
   b. EEPROM: extend Flash space, non-volatile
   c. Flash: remove the scaffolding
3. move to ITC (Indirect Threaded Code)
   a. mostly re-write ("soft" inner interpreter imposes limits on an "interrupts in Forth" feature)
   b. use a TRAP to mix-and-merge with STC
4. take advantage of the SWIM interface in Forth programming (ICP instead of the serial interface)
   a. generate a list of entry points for headerless core words, and combine with 2) or 3)

A (non-functional) demonstration of 2a is:

: RAM : $CC c, $6e @ , NVM ;

(what's needed is an additional level of redirection in NAME>).

One bug to be fixed:

make sure ABORT" and abort" can be told apart in case-insensitive mode

RigTig commented 7 years ago

Now you are getting ahead of me!! YooHoo!! Keep it going. I'll catch up soon. But I can contribute something useful, relevant to 4a. I just needed to use some headerless code again, and had to go through the process of figuring out the address again. Mmm... there is always a better way. I couldn't figure out a strategy based upon the compiled image, but I can get all I need from the listing of the relocated code (forth.rst). So I wrote a utility in Python to scan forth.rst and create a list of all the headerless code available in that flash. Of course, the most useful list is one ready to be loaded into Forth, though just selecting the ones you need is far less wasteful of our valuable RAM. So the list looks like:

\ Scanning out/W1209/forth.rst
: ?RXP [ OVERT $CC C, $822B ,
: TXP! [ OVERT $CC C, $8232 ,
: branch [ OVERT $CC C, $8336 ,
: EXIT [ OVERT $CC C, $834C ,
: doVar [ OVERT $CC C, $83C6 ,
...
: $COMPILE [ OVERT $CC C, $8DB9 ,
: OVERT [ OVERT $CC C, $8DDD ,
: ULOCKF [ OVERT $CC C, $8F73 ,
: LOCKF [ OVERT $CC C, $8F7E ,

So you just pick out the ones you want and copy them into your favourite serial terminal. Well, not quite. In this set, you'll notice that OVERT is one of the hidden pieces of code, and you cannot use a word until it is defined. So you need to waste a byte to define OVERT as follows (obviously before you use anything else):

: OVERT [ $CC C, $8DDD , ] ;
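The ordering rule described here (OVERT first, in its self-contained form, then everything else) can be sketched in Python. This is not the actual getHeaders.py: the function name `alias_lines` is made up for illustration, and it assumes the (name, address) pairs have already been extracted from forth.rst, since the listing format is not shown in this thread.

```python
def alias_lines(entries):
    """Build Forth alias definitions from (name, address) pairs.

    entries: list of (forth_name, address) tuples for headerless code.
    OVERT is emitted first in its self-contained form, because the
    other definitions use OVERT before they compile the jump.
    """
    addr_of = dict(entries)
    if "OVERT" not in addr_of:
        raise ValueError("the address of OVERT is needed to build aliases")
    # $CC is the opcode compiled in front of the address, as in the list above.
    lines = [f": OVERT [ $CC C, ${addr_of['OVERT']:04X} , ] ;"]
    for name, addr in entries:
        if name != "OVERT":
            lines.append(f": {name} [ OVERT $CC C, ${addr:04X} ,")
    return lines


# Addresses taken from the W1209 list above.
for line in alias_lines([("?RXP", 0x822B), ("TXP!", 0x8232), ("OVERT", 0x8DDD)]):
    print(line)
```

Emitting OVERT first means the result can be pasted into a terminal from the top without further reordering.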

I'll put the Python code into my barebones branch so you can see it. I actually have it installed in the folder above all of my branches. Usage is simple enough. Assuming you are working in the barebones folder:

../getHeaders.py out/W1209/forth.rst >headers.f

I suppose headers.f really belongs in the out/W1209/ folder, since it will be different for each build.

TG9541 commented 7 years ago

Hehe the discussion is now unfolding in two "issue" threads. I wrote a similar script in AWK (not the most popular scripting language these days but still incredibly useful :-)

The ultimate goal is splitting the headers from the code. But how do you feel about writing an address list to the upper 512 bytes of the 128 bytes EEPROM ;-) ? Done right the index would only depend on the order of the words in forth.rst. Pointers to words excluded in a configuration could refer to an abort word. Does this sound practical?

RigTig commented 7 years ago

I haven't used awk since I first started with SGML (and then XML). It is very useful, but I've forgotten most of it. Can you please put a copy of your awk script in my BAREBONES branch (or somewhere I can see it)? EEPROM is too small for a header list: 128 bytes holds only 64 eForth word addresses (2 bytes each). Maybe we could use it as an experiment, but I reckon on using the top of flash memory. Maybe top-down? I'm not sure there needs to be any dependency upon order in the source at all: after all, it is just a lookup table. If code is moved, just write in the new address. If code is removed, then put in the address for abort, or re-use the slot (provided no other word refers to it). We need another branch of STM8EF to explore this. Do you want to do it or will I? I reckon we'll need to re-capture lots of our thoughts from these issues into that branch too.
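The lookup-table idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `build_table` and `table_bytes` are hypothetical, the abort address used below is a made-up placeholder, and the two real addresses are borrowed from the alias lists in this thread.

```python
EEPROM_BYTES = 128                 # EEPROM size discussed above
CELL_BYTES = 2                     # one 16-bit address per table slot
print(EEPROM_BYTES // CELL_BYTES)  # number of addresses that fit


def build_table(word_order, addresses, abort_addr):
    """One slot per word in word_order; words missing from this build
    (i.e. absent from addresses) point at the abort word instead."""
    return [addresses.get(word, abort_addr) for word in word_order]


def table_bytes(table):
    """Serialize the table as big-endian 16-bit cells."""
    return b"".join(addr.to_bytes(CELL_BYTES, "big") for addr in table)


# HOLD is treated as excluded from this hypothetical configuration.
table = build_table(["AND", "HOLD", "HERE"],
                    {"AND": 0x83DA, "HERE": 0x85AE},
                    abort_addr=0x8000)  # placeholder, not a real address
print([f"{addr:04X}" for addr in table], len(table_bytes(table)))
```

If a word moves, only its slot changes; the slot index itself stays put, which is what makes the table usable from code compiled against an earlier build.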

TG9541 commented 7 years ago

@RigTig I worked a bit on the subject above.

The attached (g)AWK file produces output like this:

: OVERT [ $CC C, $8B8C , ] ;
: \ [ OVERT $CC C, $884F ,
: abort" [ OVERT $CC C, $89CB ,
: HERE [ OVERT $CC C, $85AE ,
: HAND [ OVERT $CC C, $8457 ,
: $,n [ OVERT $CC C, $8B33 ,
: AND [ OVERT $CC C, $83DA ,
: SAVEC [ OVERT $CC C, $8DD9 ,
: IRET [ OVERT $CC C, $8DE0 ,
: NEGATE [ OVERT $CC C, $8565 ,
: HOLD [ OVERT $CC C, $8655 ,
: ."| [ OVERT $CC C, $8782 ,
: ULOCKF [ OVERT $CC C, $8D64 ,
...

genalias.zip

BTW: encoding the ITC index table in about 250 bytes is possible if it's expanded to RAM (based on the assumption that no core routine is longer than 255 bytes).
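One possible reading of this (an assumption on my part, not something spelled out in the thread) is to store a single byte per routine, holding the distance to the next entry point, and to rebuild the full 16-bit addresses with a running sum when the table is expanded to RAM. A minimal Python sketch, with hypothetical function names and toy addresses:

```python
def pack_offsets(addresses):
    """Encode ascending entry-point addresses as a base address plus
    one-byte gaps. Fails if any routine is longer than 255 bytes."""
    base = addresses[0]
    gaps = []
    for prev, cur in zip(addresses, addresses[1:]):
        gap = cur - prev
        if not 0 <= gap <= 255:
            raise ValueError(f"gap {gap} does not fit in one byte")
        gaps.append(gap)
    return base, bytes(gaps)


def expand_offsets(base, gaps):
    """Rebuild the full address table (the 'expanded to RAM' step)."""
    table = [base]
    for gap in gaps:
        table.append(table[-1] + gap)
    return table


base, gaps = pack_offsets([0x822B, 0x8232, 0x8300])
print(len(gaps), [f"{addr:04X}" for addr in expand_offsets(base, gaps)])
```

With one byte per entry, roughly 250 entries cost roughly 250 bytes plus the base address, at the price of needing twice that much RAM for the expanded table.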

TG9541 commented 7 years ago

The alias feature has just been added. I'd propose discussion to be continued in #27. ITC is a story in its own right.