davidgiven / cpm65

CP/M for the 6502
BSD 2-Clause "Simplified" License

Intel 8080 Emulator for the Atari 130XE and the BBC Master #134

Open ivop opened 4 months ago

ivop commented 4 months ago

Hi,

This is the Atari 130XE 8080 emulator overlay loader. I did not include the whole source tree of https://github.com/ivop/atari8080 but just the assembled overlay and a few 8080 sample files. If you want Zork dropped because of licensing issues, I can create a new pull request. Do you know any other 8080 only software that could be included? Most interesting stuff I found is Z80 only.

Here's a small video demonstrating how it works:

https://youtu.be/oHrtUj0fEk0

I briefly looked into porting it to the BBC. I think it could be fairly straightforward with the 16kB banking window at $8000-$bfff. Incrementing the 8080 program counter will be slightly slower though, because the bpl trick cannot be used. The Atari's window is at $4000-$7fff, so when the high byte overflows past $7f, the N flag is set. On the BBC you'll need to load it into A, compare with $c0 and use bcc instead of bpl. That'll especially affect short instructions, like MOV, but ALU stuff not so much. Also, the banking register and its values have to be changed, as does setting/resetting the background color. The rest is platform agnostic. Oh, and overflowing the bank during a sector read or write needs to be handled differently, as there is no RAM after the bank like on the Atari. And the overlay loader needs to be changed slightly, e.g. to check for BBC extended memory instead of the 130XE scheme.
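The two window checks described above can be modelled in a few lines of Python. This is only a sketch of the logic, not code from atari8080: the helper names are hypothetical, and it assumes the adjusted high byte wraps back to the start of the window when it leaves it. Only the window addresses come from the discussion.

```python
# Model of the "has the adjusted PC high byte left the banking window?" test.
# Helper names are hypothetical; only the window addresses come from the thread.

ATARI_WINDOW = (0x40, 0x80)   # $4000-$7fff: high bytes $40..$7f
BBC_WINDOW = (0x80, 0xC0)     # $8000-$bfff: high bytes $80..$bf


def left_window_atari(high):
    """Atari: INC sets the N flag (bit 7) when $7f becomes $80, so a BPL is enough."""
    return bool(high & 0x80)


def left_window_bbc(high):
    """BBC: bit 7 is always set inside the window, so compare with $c0 (LDA/CMP/BCC)."""
    return high >= 0xC0


def inc_high(high, window, left_window):
    """Increment the adjusted high byte; return (new_high, crossed_bank)."""
    high = (high + 1) & 0xFF
    if left_window(high):
        # Assumed behaviour: wrap to the start of the window; the caller
        # would then select the next bank.
        return window[0], True
    return high, False
```

On the Atari the flag test comes for free with the increment, while the BBC variant needs an explicit comparison, which is where the extra cycles for short instructions would come from.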

I'm not sure I know enough about the BBC to do this myself, so I think you are better equipped to do this.

Regards, Ivo

ivop commented 4 months ago

I looked a bit more on porting it to the BBC and noticed the following:

I also briefly looked into the Apple ][ and I don't think a port would be viable. It banks the full lower 48K, banking out the 8080 registers and the emulator, so memory access would be really slow by having a piece of code somewhere in the 4kB banks above $c000. There are Z80 cards for the Apple. Better use one of them.

I also got a request to port it to the Neo6502. I asked how banked memory was done or if it's even available, but got no response. I suppose the RP2040 firmware could be updated to support 64kB of banked memory, preferably at $4000-$7fff so the N flag trick keeps working.

Lastly, I thought of a way to improve emulation speed by eliminating the instruction size table and moving the operand loading and program counter increment into each instruction that needs them. This would make the main emulation loop much faster by eliminating at least three instructions for the single byte opcodes, and double or triple that for the two and three byte opcodes. For two byte opcodes, I can probably even remove the intermediate storage in byte2 and use (PCL),y directly, saving two more instructions. Three byte opcodes will have to keep the ZP storage because of dereferencing it as an (adjusted) pointer. All this will increase the emulator's code size by several kilobytes, but I think it'll fit. Adding 800+ Z80 instructions wouldn't have fit anyway with the current fast unrolled and duplicated code for similar instructions.
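To make the trade-off concrete, here is a toy Python comparison of the two dispatch styles for three 8080 opcodes. It is a sketch of the idea only and does not reflect how the actual emulator is structured; all function and table names are hypothetical.

```python
# Two toy dispatch loops for a three-opcode subset of the 8080:
#   0x00 NOP (1 byte), 0x3E MVI A,d8 (2 bytes), 0xC3 JMP a16 (3 bytes).

SIZE = {0x00: 1, 0x3E: 2, 0xC3: 3}   # instruction size table


def step_with_table(mem, pc, regs):
    """Table scheme: the generic loop fetches operands and advances PC."""
    op = mem[pc]
    size = SIZE[op]
    byte2 = mem[pc + 1] if size > 1 else 0
    byte3 = mem[pc + 2] if size > 2 else 0
    pc += size
    if op == 0x3E:
        regs["A"] = byte2
    elif op == 0xC3:
        pc = byte2 | (byte3 << 8)
    return pc


# Inlined scheme: each handler fetches only what it needs and sets PC itself.
def nop(mem, pc, regs):
    return pc + 1


def mvi_a(mem, pc, regs):
    regs["A"] = mem[pc + 1]          # operand read directly, no byte2 copy
    return pc + 2


def jmp(mem, pc, regs):
    return mem[pc + 1] | (mem[pc + 2] << 8)


HANDLERS = {0x00: nop, 0x3E: mvi_a, 0xC3: jmp}


def step_inlined(mem, pc, regs):
    """The main loop shrinks to a single dispatch."""
    return HANDLERS[mem[pc]](mem, pc, regs)
```

The inlined version repeats the fetch-and-advance logic in every handler, which mirrors the code size increase mentioned above in exchange for a shorter common path.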

davidgiven commented 4 months ago

That's really impressive. The performance is better than I was expecting --- have you tried something like WordStar on it? I suspect it won't update fast enough to be useful, but maybe...

Re the BBC: yes, the high resolution modes are basically a non-starter on a stock B with CP/M-65. You really need a Master or, preferably, a Tube system. The latter uses a completely different CPU to run user programs, with its own 64kB of memory, and system calls are sent over a fast link as RPCs. However, the Tube doesn't have banking so it wouldn't work for you.

Regarding the banking addresses... the BIT instruction loads bit 6 of the target into V. You know bit 7 will always be set, so a BIT/BNE will detect a change from $bfff to $c000. It'll corrupt A, but ought to be cheaper than an LDA/CMP depending on circumstances.

Re the RP2040: the Morpheus firmware package (the one with NeoBasic in it) only has space for the 64kB of main RAM and 20kB of graphics RAM. Last time I looked there was about 16kB free. To get more you'd have to change the firmware to remove stuff like the graphics subsystem, but that's a step down the slippery slope that ends up with emulating the 8080 on the RP2040... you could do banking via external RAM, like SPI-attached PSRAM, but again that would need a firmware change.

ivop commented 4 months ago

Thanks! I must say I was surprised by its performance, too. I currently have a 'faster' branch in the atari8080 repo which is even 10-15% faster.

Re the BBC, yeah I guessed the Tube wasn't able to use the sideways RAM banks. So only the Master 512 would be a viable option with screen memory in shadow RAM, and the 8080's 64kB in 4 banks of sideways RAM. The BIT trick is indeed useful. AFAIR BIT does not clobber A, so this should work:

(in INCPC macro):

        inc PCH
        inc PCHa            ; adjusted high byte so (PCL) points inside extended memory bank
        bit PCHa
        bvc no_adjust
        ; recalculate PCHa from PCH and switch to next bank
    no_adjust:

That's only one instruction longer than the Atari version. And even if A was clobbered, that doesn't matter for the code following.
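As a sanity check on the flag logic, the BIT test can be modelled in Python. This is a sketch under the assumption that PCHa only ever steps by one from inside the $80-$bf window; the function names are hypothetical.

```python
def bit_flags(operand, a=0):
    """Flags after a 6502 BIT: N = bit 7 and V = bit 6 of the operand,
    Z is set when (A AND operand) is zero. A itself is not modified."""
    return {
        "N": bool(operand & 0x80),
        "V": bool(operand & 0x40),
        "Z": (a & operand) == 0,
    }


def needs_adjust(pcha):
    """True when BVC would fall through, i.e. PCHa has stepped out of the window."""
    return bit_flags(pcha)["V"]
```

Every high byte from $80 to $bf has bit 6 clear, and $c0 is the first value with it set, so V distinguishes "still in the window" from "just left it" without touching A.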

Re Neo6502, I'll tell the person that asked that it's not possible :smile:

I found a website with WordStar 1.0, 2.26, and 3.0 and onwards. It seems 3.0 introduced the Z80 dependency, so perhaps 2.26 will run. Hopefully it does vt52 and not ADM 3a. I'll look into it.

I have to rerun the testsuite after my ~13% speedup to make sure nothing broke.

ivop commented 4 months ago

WordStar 2.26 runs! But... the install.com utility to change the terminal type does not support VT52. You can enter your own values, but you need to know the offsets into the binary. It literally asks for a hex offset and values. Those are in the WS manual, but I cannot find a WordStar 2 manual :disappointed:

LOCATION TO BE CHANGED (0=END): 1234
    ADDRESS : 1234H   OLD VALUE: 41H   NEW VALUE: 

I did a binary diff on two different WS.COM files. One for Zenith H89/H19 and one for Televideo 912 and it appears we can detect where the terminal codes are stored. Then match with the terminal manuals and see what they do and replace with VT52 equivalents. Hopefully that'll be enough?
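A binary diff like this can be done with a short Python helper. The sketch below is an assumption about how one might do it, not the tool that was actually used; the function name and the file names in the usage comment are hypothetical.

```python
def diff_ranges(a, b):
    """Return (offset, bytes_from_a, bytes_from_b) for each run of differing bytes.

    Only the common prefix length of the two inputs is compared.
    """
    n = min(len(a), len(b))
    runs = []
    start = None
    for i in range(n):
        if a[i] != b[i]:
            if start is None:
                start = i            # a new run of differences begins
        elif start is not None:
            runs.append((start, a[start:i], b[start:i]))
            start = None
    if start is not None:            # run extends to the end of the data
        runs.append((start, a[start:n], b[start:n]))
    return runs


# Usage sketch (file names are placeholders):
#   h19 = open("ws_h19.com", "rb").read()
#   tv912 = open("ws_tv912.com", "rb").read()
#   for offset, old, new in diff_ranges(h19, tv912):
#       print(f"{offset + 0x100:04X}H  {old.hex()}  {new.hex()}")
```

The usage comment adds 0x100 on the assumption that the installer expects addresses as loaded in memory, since CP/M loads .COM files at 0100H. If it expects file offsets instead, drop the adjustment.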

ivop commented 4 months ago

I just ignored the Z80 label on https://winworldpc.com/product/wordstar/330 and installed WordStar 3.3. WINSTALL.COM supports entering the escape sequences for what it needs (cursor positioning and clear-to-eol) and with tty80drv and vt52drv loaded it runs! No Z80 code at all.

But it's too slow to do any real work with it. Loading a file works, cursor movement works, but it's not responsive enough. With a CPU accelerator like Rapidus with a 65c816 at 20MHz it runs flawlessly, though! I don't have the real hardware. Perhaps I'll get one for my birthday :wink:

ivop commented 4 months ago

I looked into a BBC port again, and I think I can make it work for every 128kB machine (B+128, Master 128, Master 512). At first I didn't get enough free TPA with the B+ and Master 128, but I found out I can force it to use shadow RAM by switching to mode 131 :smile: After that, it should work. Tested bank switching with $f4 and $fe30. Found out $f4 is important as the DFS ROM is needed, which I suppose is exactly the reason why I can't read directly into sideways RAM. So switching banks will be one instruction longer, but in the end I think it will run faster than the Atari. The Atari is effectively running at around 1.35MHz when RAM refresh and screen DMA are subtracted. The BBC will run at a full 2MHz, so that's 45-50% faster. Curious to see how that works out.

davidgiven commented 4 months ago

You might be able to get away with only updating 0xf4 before making system calls. It's used to remember which language is currently banked in, so that after making a system call it can return to the right place. I have faint memories that interrupts store the current bank using a different mechanism.

ivop commented 4 months ago

The BBC Master 128/512 port now works.

Video: https://youtu.be/5uONG8YH2L0

As expected, it runs quite a bit faster than the 130XE version. Still too slow for WordStar 3.3. Its terminal handling is terrible and I suspect it wasn't particularly fast when run on original hardware to begin with.

There's no B+128K support, as that turned out to be not so easy to do in one binary. It would either need a re-assembled binary specifically for the B+128K (banks 0, 1, 12, 13 instead of 4, 5, 6, 7) or live patching of itself, which means quite a few locations to patch. For now it's Master 128/512 only, in mode 131 with video memory in shadow RAM.
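For illustration, here is how an 8080 address could map onto those sideways RAM banks, sketched in Python. The bank numbers and the $8000 window come from this thread. The ascending order in which the four 16kB quarters are assigned to banks is an assumption, and all names are hypothetical.

```python
WINDOW_BASE = 0x8000            # sideways RAM window at $8000-$bfff
BANK_SIZE = 0x4000              # 16kB per bank, four banks cover 64kB

MASTER_BANKS = (4, 5, 6, 7)     # Master 128/512
BPLUS_BANKS = (0, 1, 12, 13)    # B+128K


def map_8080_address(addr, banks=MASTER_BANKS):
    """Return (bank_number, 6502_address) for a 16-bit 8080 address.

    Assumes the lowest 16kB of 8080 memory lives in banks[0], the next
    in banks[1], and so on.
    """
    index, offset = divmod(addr & 0xFFFF, BANK_SIZE)
    return banks[index], WINDOW_BASE + offset
```

Because the B+128K bank numbers are not contiguous, a single table like `BPLUS_BANKS` is easy in a model like this, whereas the real binary has the bank values baked into many locations, which is what makes live patching awkward.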

Because of lack of disk space, I have it create a second bbcmicro.ssd image with just the screen apps (for vt52drv). Similar to the Atari 130XE I put the 8080 loader, overlay and its 8080 binaries in user area 2. Even though this was never discussed, I sort of treat user area 0 for all cross-platform stuff, user area 1 for platform specific files (e.g. setfnt for the Atari) and user area 2 for 8080 CP/M. This minimizes the risk of accidentally running a .COM file for the wrong CPU and avoids filename clashes (e.g. DUMP.COM).