SamCoVT / TaliForth2

A Subroutine Threaded Code (STC) ANSI-like Forth for the 65c02

Strange RTS behaviour #5

Closed SamCoVT closed 1 year ago

SamCoVT commented 1 year ago

Copied from scotws/TaliForth2 - original issue by @svhum:

First, I want to say that this is a really great project!

I am encountering a strange issue using the assembler on actual 65C02 hardware. I'm using a simple SBC I built that has the memory map recommended in the TaliForth documentation, with the ROM at $8000 and kernel at $E000. I use a modified wozmon as the monitor (that uses a UART for I/O instead of Apple hardware) in the remaining kernel memory space, and a UART at $F000. So far, TaliForth seems to run as expected except when using the assembler.

The issue I'm having is using the RTS word in the assembler. In py65mon, the following works as expected:

here .s <1> 2048  ok
rts .s <1> 2048  ok
execute .s <0>  ok

However, for some reason on the SBC, RTS consumes what is on the stack:

here .s <1> 2048  ok
rts .s <0>  ok

This will obviously prevent a subsequent execute word from running.

I am wondering whether this has something to do with how TaliForth is started from the monitor (i.e. E000 R), but I can't pinpoint why that would be.

SamCoVT commented 1 year ago

Hi @svhum,
That is indeed an interesting problem. Assuming E000 is the address of your kernel_init function, starting Tali from the monitor in the method shown should work fine and should not be related to this issue.

Here are a few questions to get started with: Which CPU is in your SBC (e.g. is it a Rockwell or a WDC 65C02)? Which assembler are you using (Ophis or 64tass)? Can you attach your platform file (and any non-Tali files it includes) so I can assemble the same version you are running on your actual hardware?

I've tried your example code in both py65mon and the kowalski simulator (they have different simulation cores) and both work perfectly fine in simulation. I'll try on real hardware (probably this evening) and see if I can recreate your issue in hardware.

For reference, some assembly instructions do take an argument from the stack, but RTS is not one of them. There is a table (found in disassembler.asm - it's used by both the assembler and disassembler) called oc_table that has the number of bytes for the instruction in the upper two bits of the first byte. RTS correctly shows 1 byte as the length. I've reviewed the asm_common code that looks in that table to determine if another byte should be compiled or not, and I don't see anything super-obvious that could cause your bug. With that said, I also don't normally use RTS in my assembly because I only use the assembler within Forth words and Forth puts an RTS on the very end automatically.
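To make the header-byte layout concrete, here is a small Python model (Tali itself does this in 65C02 assembly) of how the first byte of an oc_table entry splits into its two fields. The function name is hypothetical, and the six-bit width of the name-length field is an assumption, since the description above only says "the lower bits". The example value $43 is the RTS entry that turns up later in this thread.

```python
def decode_oc_header(header: int) -> tuple[int, int]:
    """Split an oc_table header byte into (instruction_length, name_length).

    Bits 7:6 hold the number of bytes to compile. The low six bits are
    ASSUMED to hold the length of the mnemonic string after the header.
    """
    if not 0 <= header <= 0xFF:
        raise ValueError("header must fit in one byte")
    instruction_length = (header >> 6) & 0b11  # bits 7:6
    name_length = header & 0x3F                # bits 5:0 (assumed width)
    return instruction_length, name_length


print(decode_oc_header(0x43))  # (1, 3): RTS is 1 byte, name "rts" is 3 chars
```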

Here is a test for your hardware. To determine whether the missing stack item was actually compiled after the RTS instruction, can you run:

assembler-wordlist >order
here .s
dup rts   \ Give rts an extra copy of here to consume
here .s   \ Should show old and new location of here
drop      \ Remove new here location, leaving old value of here
@ hex u.  \ Expecting 60 (LSB) and whatever was in the next byte (MSB)
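The test works because a correctly sized RTS should advance here by exactly one byte and leave the stack alone. The Python sketch below is a hypothetical model of that behaviour, not Tali's asm_common code: it compiles the opcode and, only when the instruction length is greater than one, pops an operand from the stack and compiles it low byte first.

```python
def assemble(memory: dict[int, int], here: int, stack: list[int],
             opcode: int, length: int) -> int:
    """Hypothetical model: compile `opcode` at `here`, return the new here.

    Operand bytes are taken from the top of `stack` only when length > 1,
    and are stored little-endian, as the 65C02 expects.
    """
    memory[here] = opcode
    here += 1
    if length > 1:
        operand = stack.pop()
        for i in range(length - 1):
            memory[here] = (operand >> (8 * i)) & 0xFF
            here += 1
    return here


# Correct table entry: RTS is 1 byte, so both copies of 2048 survive.
mem, stack = {}, [2048, 2048]
print(assemble(mem, 2048, stack, 0x60, 1), stack)  # 2049 [2048, 2048]

# RTS misread as 3 bytes: one copy of 2048 ($0800) is consumed and
# compiled as 00 08, and here ends up at 2051.
mem, stack = {}, [2048, 2048]
print(assemble(mem, 2048, stack, 0x60, 3), stack)  # 2051 [2048]
print([hex(mem[a]) for a in range(2048, 2051)])    # ['0x60', '0x0', '0x8']
```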
svhum commented 1 year ago

Hi @SamCoVT,

Thanks for your reply. Here is the output of your test:

assembler-wordlist >order  ok
here .s <1> 2048  ok
dup rts  ok
here .s <2> 2048 2051  ok
drop  ok
@ hex . 60  ok

I did try pulling the source from the master-ophis branch of the SamCoVT repository to make sure the files are the latest version, and recompiling. As expected, there was no change, since the files seem identical to those in the scotws repository. I did notice some differences in some of the source files in the master-64tass branch, but did not investigate further.

The SBC is actually an FPGA using the R65Cx2 core, not actual 65C02 silicon. I am not aware of compatibility issues with this core, but I can't rule out that it might be a problem. In any case, I will send you the platform file. Thank you!

svhum commented 1 year ago

Thanks for your help. Attached is the platform file.

From: SamCoVT | Sent: November 22, 2022 11:04 AM | To: SamCoVT/TaliForth2 | Cc: Sean Hum | Subject: Re: [SamCoVT/TaliForth2] Strange RTS behaviour (Issue #5)

SamCoVT commented 1 year ago

The attachment did not come through. If you attempted to send it via email, you may want to try submitting it directly on GitHub at https://github.com/SamCoVT/TaliForth2/issues/5 by dragging and dropping the files into the comment box. I'd also like to see docs/platformname-labelmap.txt, where platformname is the name of your platform. That file has all of the addresses used in Tali2, so we can see exactly where the assembler tables are and make sure the correct values are there.

Looking at the results you posted, it looks like Tali thinks rts is a 3-byte instruction. That means it saw "11" (binary) in bits 7 and 6 of its assembler table entry instead of the "01" that should have been there. "00" in those bits could also cause this behavior: it's an invalid value for the assembler because there are no 0-byte instructions, but it would also cause 3 bytes to be assembled.
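Why would both "11" and "00" produce three bytes? One way that can happen is a dispatch that checks explicitly for the 1-byte and 2-byte patterns and treats every other value as a 3-byte instruction. The Python sketch below is a hypothetical model of that idea only; it is not taken from Tali's source.

```python
def compiled_length(header: int) -> int:
    """Hypothetical dispatch on bits 7:6 of an oc_table header byte.

    01 -> 1 byte, 10 -> 2 bytes; anything else (11, or the invalid 00)
    falls through to the 3-byte case.
    """
    size_bits = (header >> 6) & 0b11
    if size_bits == 0b01:
        return 1
    if size_bits == 0b10:
        return 2
    return 3


for header in (0x43, 0x87, 0xC3, 0x03):
    print(f"${header:02X} -> {compiled_length(header)} byte(s)")
# $43 -> 1 byte(s)
# $87 -> 2 byte(s)
# $C3 -> 3 byte(s)
# $03 -> 3 byte(s)
```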

The fact that you got just 60 (hex) is not surprising: 2048 ($800) was on the stack, so you likely got "60 00 08" compiled. @ only fetches two bytes, so we got 60 in the LSB and 00 in the MSB, giving $0060, and the leading zeroes are not printed. The 08 is likely there, but we didn't fetch it from memory to display it. If you want to verify that, you could do:

decimal 
2048 3 dump  \ dump 3 bytes starting at address 2048
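The arithmetic behind "60 00 08" reading back as just 60 can be checked with a short Python sketch. It models @ as a 16-bit little-endian fetch, where the byte at the lower address is the least significant, which is how cells are stored on the 65C02.

```python
# Bytes compiled starting at address 2048 ($0800) on the misbehaving board.
compiled = bytes([0x60, 0x00, 0x08])

# @ fetches one 16-bit cell: two bytes, low byte first.
cell = int.from_bytes(compiled[0:2], "little")
print(f"{cell:X}")   # 60  (i.e. $0060; leading zeroes are not printed)

# The last two bytes are the 2048 that was left on the stack, low byte first.
operand = int.from_bytes(compiled[1:3], "little")
print(operand)       # 2048
```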

It's good to know you are using an FPGA core. I'll test on a real WDC 65C02 just to make sure. The assembler does use double indirection (the main assembler table is a table of addresses pointing to the info about each instruction, such as its length), so it's possible there's something funny there that most code wouldn't exercise. We should be able to get to the bottom of it one way or another.

Do you have any ability to set a breakpoint on your hardware, where we could stop and inspect memory or registers?

svhum commented 1 year ago

The result of the dump is:

2048 3 dump 
0800  60 00 08  `.. ok

I currently can't set breakpoints but I could hack something in to do that if need be. In the meantime, here are the files. Thanks! platform.zip

SamCoVT commented 1 year ago

I've tested this on real hardware, and the real hardware responds correctly. Also, Tali has a test suite that tests assembler opcodes for the correct opcode compiled AND the correct length compiled. This test suite works on real hardware and py65mon. It may be worth running the test suite on your hardware in the future, but first we'll try to narrow down the issue.

That attachment came through fine. The dump you did shows that the two bytes compiled after the $60 (I'm using $ to denote hex values) were indeed the $0800 (little end first) that was on the stack (2048 decimal). That's what I expected, but it is not correct behavior, so we will try to find out why. Let's move on to the opcode tables used by the assembler.

Looking in your labelmap file, I find oc_index_table and oc_table:

$ACF7 | oc_index_table                  | disassembler.asm:205
$AEF7 | oc00                            | disassembler.asm:293
$AEF7 | oc_table                        | disassembler.asm:276

oc is short for opcode. The oc_index_table is an array (indexed with the opcode byte) of pointers into the oc_table. Each entry in the oc_table has a byte with the number of bytes to compile in bits 7:6 and the length of the name of the opcode in the lower bits. Then there is the string holding the name of the opcode immediately after this byte.

$ACF7 holds the address of the data for opcode $00, $ACF9 holds the address of the data for opcode $01, etc. Because these are 2-byte addresses, we can take the opcode, multiply by 2, add to the starting address, and get the address of the data for that opcode. The address we get SHOULD be somewhere a bit higher in memory than $AEF7, which is where the opcode data starts.
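The lookup described above is a two-step pointer chase: index into oc_index_table, fetch a 2-byte pointer, then read the header byte it points to. As a Python sketch, with a dict standing in for the 65C02 address space and only the bytes needed for RTS filled in (the pointer and header values are the ones from the py65mon session in this thread; the helper names are hypothetical):

```python
OC_INDEX_TABLE = 0xACF7  # from the labelmap: base of the 2-byte pointer array


def index_entry_address(opcode: int) -> int:
    """Address of the little-endian pointer for `opcode` in oc_index_table."""
    return OC_INDEX_TABLE + 2 * opcode


def read_word(memory: dict[int, int], addr: int) -> int:
    """16-bit little-endian fetch, like Forth's @ on the 65C02."""
    return memory[addr] | (memory[addr + 1] << 8)


entry = index_entry_address(0x60)        # RTS
print(f"${entry:04X}")                   # $ADB7
memory = {entry: 0xA9, entry + 1: 0xB0,  # pointer $B0A9, low byte first
          0xB0A9: 0x43}                  # RTS header byte at the target
data = read_word(memory, entry)
print(f"${data:04X} -> ${memory[data]:02X}")  # $B0A9 -> $43
```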

On the py65mon version from master-ophis (we'll stick with the ophis assembler for now because you started with it), which you can run with "make sim" if that's convenient, it looks like the same addresses are used for the opcode tables so you should get these same values:

hex
ACF7      \ base address of opcode pointer table
60 2* +   \ indexed using opcode $60 which is RTS
@ dup u.   \ Get the address and print it
c@ u.      \ Get the first byte of data and print it.

I'm using u. (unsigned print) because values above $7FFF print as negative numbers with . (signed print). Here is what I get:
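The sign issue comes from . treating a 16-bit cell as two's complement, while u. treats the same bits as unsigned. A small Python sketch of the two readings (the function name is hypothetical):

```python
def as_signed_cell(value: int) -> int:
    """Interpret a 16-bit cell as two's complement, the way . does."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


address = 0xB0A9
print(f"{address:X}")                                 # B0A9  (u. in hex)
signed = as_signed_cell(address)
print(f"{'-' if signed < 0 else ''}{abs(signed):X}")  # -4F57 (. in hex)
```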

hex  ok
ACF7  ok
60 2* +  ok
@ dup u. B0A9  ok
c@ u. 43  ok

$B0A9 is the address of the data for RTS. $43 has the bit pattern "01" in bits 7 and 6, so the instruction is 1 byte long, and it has a 3 in the lower bits, so the name is 3 bytes long. Dumping a few bytes starting at $B0A9 shows the $43 followed by "rts":

B0A9 8 dump 
B0A9  43 72 74 73 87 61 64 63   Crts.adc  ok

The $87 is the header byte of the next entry, for the adc.zxi instruction. Its upper nibble $8 (binary 1000) has the bit pattern "10" in bits 7 and 6, so this instruction should compile 2 bytes, and the $7 is the length of the name that comes next (truncated in this dump).
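Because each entry is a header byte followed by a name of the length the header gives, the table can be walked one entry at a time. Here is a Python sketch over the eight dumped bytes; the helper name is hypothetical, and it again assumes the name length lives in bits 5:0.

```python
def walk_oc_entries(data: bytes) -> list[tuple[int, int, str]]:
    """Hypothetical helper: split raw oc_table bytes into entries.

    Returns (instruction_length, name_length, name) for each entry; a
    name cut off by the end of `data` is returned truncated.
    """
    entries = []
    offset = 0
    while offset < len(data):
        header = data[offset]
        length = (header >> 6) & 0b11   # bits 7:6: bytes to compile
        name_len = header & 0x3F        # assumed: name length in bits 5:0
        name = data[offset + 1:offset + 1 + name_len].decode("ascii")
        entries.append((length, name_len, name))
        offset += 1 + name_len          # the next header follows the name
    return entries


dump = bytes([0x43, 0x72, 0x74, 0x73, 0x87, 0x61, 0x64, 0x63])
for length, name_len, name in walk_oc_entries(dump):
    print(length, name_len, name)
# 1 3 rts
# 2 7 adc   <- only 3 of the 7 name characters are in this dump
```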

You should calculate the same $B0A9 address and you should have the same data there. Let me know what you find. Because the tables are in the same locations, you should be able to try on both py65mon and on your hardware and get the same results.

svhum commented 1 year ago

Thank you very much for your detailed answer. You will probably be annoyed with me, but at the same time amused (I hope) by my answer... your excellent diagnostic suggestion enabled me to find the error. Indeed, $43 was not at address $B0A9 as expected. I was getting $0D, which I happen to recognize as the normal value of the status register in my UART implementation. I checked, and there was an error in my chip select logic that accidentally mapped the range B000-B0FF to the UART (in addition to the correct F000-F0FF range), a range that happens to coincide with a number of the opcodes in the assembler table, including rts. What a subtle error. If any other 256-byte address range had been blanked out by my erroneous address decoding logic, I probably would have had worse problems!
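The thread doesn't say what the actual chip-select mistake was, so the following is purely illustrative. It is a hypothetical Python model of a UART select that leaves address line A14 out of its comparison and therefore responds at both $F000-$F0FF and $B000-$B0FF, next to a decoder that checks the full upper address byte. Scanning every 256-byte page shows the extra range, and the RTS entry at $B0A9 falls inside it.

```python
from collections.abc import Callable


def uart_selected_correct(addr: int) -> bool:
    """UART decoded on the full upper address byte: $F000-$F0FF only."""
    return (addr & 0xFF00) == 0xF000


def uart_selected_buggy(addr: int) -> bool:
    """HYPOTHETICAL bug: A14 is ignored (mask $BF00), so $B0xx aliases $F0xx."""
    return (addr & 0xBF00) == 0xB000


def pages_selected(decoder: Callable[[int], bool]) -> list[int]:
    """Start address of every 256-byte page the decoder selects."""
    return [page << 8 for page in range(0x100) if decoder(page << 8)]


print([f"${p:04X}" for p in pages_selected(uart_selected_correct)])  # ['$F000']
print([f"${p:04X}" for p in pages_selected(uart_selected_buggy)])    # ['$B000', '$F000']
print(uart_selected_buggy(0xB0A9))  # True: reads of the RTS entry hit the UART
```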

I will definitely check out the test suite to make sure everything else is set up straight in my 65C02 system implementation.

Anyway, everything works fine now with rts. Sorry for leading you on this wild goose chase. You were an immense help and it made me appreciate how the assembler is implemented in this great interpreter. You are doing fantastic work, keep it up!

SamCoVT commented 1 year ago

Hi @svhum,
Not really annoyed - I was reasonably sure it was a hardware issue. Fortunately, Forth is an excellent environment to poke and prod the hardware and test things out (assuming your environment is stable enough for Forth to run). The $0D value explains the 3 bytes being compiled ("00" in bits 7:6 was one of the possibilities), and I'm glad you were able to trace it to your decoding logic because that means there is an easy solution.

If you are interested in running the tests, they are in the tests folder. The tests are broken into categories so, for example, the assembler tests are in asm.fs. In order to run the tests, you will need to load tester.fs first. This creates the words T{, ->, and }T. The first word doesn't do anything. The middle word saves and removes any items on the stack, and the final word compares what is currently on the stack to the items saved by ->. If there is a mismatch, it prints an error (either INCORRECT RESULT or WRONG NUMBER OF RESULTS so you can search for those errors later). The expected use is:

T{ words to test -> expected result }T
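The protocol above can be modelled in a few lines of Python. This is a simplified sketch for illustration only (the class and method names are hypothetical; the real tester.fs is Forth), following the three steps as described: T{ does nothing, -> saves and clears the stack, and }T compares what is left against the saved results.

```python
class Tester:
    """Simplified model of the T{ -> }T protocol (hypothetical names)."""

    def __init__(self) -> None:
        self.stack: list[int] = []
        self.saved: list[int] = []
        self.errors: list[str] = []

    def t_open(self) -> None:        # T{  : does nothing
        pass

    def arrow(self) -> None:         # ->  : save and remove the actual results
        self.saved = self.stack
        self.stack = []

    def t_close(self, test: str) -> None:  # }T : expected (stack) vs saved
        if len(self.stack) != len(self.saved):
            self.errors.append(f"WRONG NUMBER OF RESULTS: {test}")
        elif self.stack != self.saved:
            self.errors.append(f"INCORRECT RESULT: {test}")
        self.stack = []


# Modelling  T{ 1 2 + -> 3 }T
t = Tester()
t.t_open()
t.stack.append(1 + 2)   # "words to test" leave their result on the stack
t.arrow()
t.stack.append(3)       # "expected result"
t.t_close("T{ 1 2 + -> 3 }T")
print(t.errors)         # []
```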

All of the other test files expect the tester to be loaded first. Then you can load the other test files you want to run in any order you want. Most use markers to restore any dictionary memory used, so when the test file is complete, only the tester words will be left in the dictionary.

On my hardware, I have hardware handshaking enabled on the serial port so that I can copy/paste directly into my terminal and TaliForth can take the input as it gets to it without losing anything. If you don't have this on your hardware, you will likely need to add a delay between sending lines. I send the tester.fs file over via copy/paste. Then I turn on logging in my terminal to save everything to disk. Then I paste all the tests I want to run. Finally, I stop the logging and grep through the results looking for "RESULT", which appears only in the two possible error messages.
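If grep isn't handy, the same search can be done in a few lines of Python. The sample log lines below are made up for illustration and are not real Tali output; for a saved terminal log you would pass an open file object instead of the list.

```python
def failing_lines(lines):
    """Yield (line_number, text) for every log line that reports a failure.

    Both tester error messages (INCORRECT RESULT and WRONG NUMBER OF
    RESULTS) contain "RESULT", so one substring check catches either.
    """
    for number, line in enumerate(lines, start=1):
        if "RESULT" in line:
            yield number, line.rstrip("\n")


# Hypothetical log lines, for illustration only.
sample = ["T{ 1 2 + -> 3 }T  ok\n",
          "INCORRECT RESULT: T{ 1 2 + -> 4 }T\n",
          "WRONG NUMBER OF RESULTS: T{ 1 -> }T\n"]
for number, text in failing_lines(sample):
    print(f"{number}: {text}")
```

With a real log, something like `with open("tests.log") as log:` (filename hypothetical) followed by the same loop over `failing_lines(log)` works, since a file object yields its lines one at a time.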

You should be able to run all of the tests on real hardware except cycles.fs as that one uses simulator support to cycle count how many CPU cycles a word takes to run (using a virtual 32-bit timer). Technically, if you really wanted to, you could create that virtual timer in hardware and run that test too. It's probably not that important and I use it just to check for words that suddenly take way longer or shorter to run than they used to.

If you can run the full test suite, that should give you good confidence in your hardware and software setup. It will test at least a quarter of your RAM space and almost every bit of Tali Forth that is in RAM or ROM. If you need any help getting the test suite working, feel free to open another ticket.