Possible optimisation for SB/LB/LAB/SAB/DAB

ZornsLemma commented 7 years ago

It occurred to me that there's potential for a small optimisation on the byte read/write opcode implementations. If we take SB as an example, it's currently:

SB      LDA     ESTKL,X
        STA     TMPL
        LDA     ESTKH,X
        STA     TMPH
        LDA     ESTKL+1,X
        STY     IPY
        LDY     #$00
        STA     (TMP),Y
        LDY     IPY
        INX
;       INX
;       JMP     NEXTOP
        JMP     DROP

If we're willing to use self-modifying code, we can write that as:

SB      LDA     ESTKL,X
        STA     SBSTA+1
        LDA     ESTKH,X
        STA     SBSTA+2
        LDA     ESTKL+1,X
SBSTA   STA     $0000
        INX
;       INX
;       JMP     NEXTOP
        JMP     DROP

which by my count saves 3 bytes and 8 cycles.
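
As a cross-check on that count, here is a minimal Python sketch that tallies bytes and cycles for both listings from the standard 6502 instruction timings. It assumes SBSTA sits outside zero page, so the two operand patches and the patched STA $0000 all assemble as 3-byte absolute stores. The mode labels and the COST table are illustrative names, not anything from PLASMA.

```python
# (bytes, cycles) per addressing mode, standard NMOS 6502 timings.
# Stores never incur a page-crossing penalty, so STA (zp),Y is always 6 cycles.
COST = {
    "zp":    (2, 3),  # STA TMPL, STY IPY, LDY IPY
    "zp_x":  (2, 4),  # LDA ESTKL,X
    "imm":   (2, 2),  # LDY #$00
    "abs":   (3, 4),  # STA SBSTA+1, STA $0000 (assumed absolute, not zero page)
    "ind_y": (2, 6),  # STA (TMP),Y
    "impl":  (1, 2),  # INX
    "jmp":   (3, 3),  # JMP DROP
}

# One entry per instruction, in listing order.
ORIGINAL = ["zp_x", "zp", "zp_x", "zp", "zp_x",
            "zp", "imm", "ind_y", "zp", "impl", "jmp"]
SELF_MOD = ["zp_x", "abs", "zp_x", "abs", "zp_x", "abs", "impl", "jmp"]


def tally(modes):
    """Return (total_bytes, total_cycles) for a list of addressing modes."""
    return (sum(COST[m][0] for m in modes), sum(COST[m][1] for m in modes))


orig_cost = tally(ORIGINAL)
smc_cost = tally(SELF_MOD)
saving = (orig_cost[0] - smc_cost[0], orig_cost[1] - smc_cost[1])
print(f"saves {saving[0]} bytes and {saving[1]} cycles")  # saves 3 bytes and 8 cycles
```

The tally agrees with the figures quoted above: 22 bytes and 37 cycles for the current code against 19 bytes and 29 cycles for the self-modifying version.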

If we're not willing to use self-modifying code, but we are willing to set aside two consecutive zp locations ZEROL and ZEROH and initialise ZEROL permanently to 0 on VM startup, we can write that as:

SB      STY     IPY
        LDY     ESTKL,X
        LDA     ESTKH,X
        STA     ZEROH
        LDA     ESTKL+1,X
        STA     (ZEROL),Y
        LDY     IPY
        INX
;       INX
;       JMP     NEXTOP
        JMP     DROP

which by my count saves 4 bytes and 5 cycles compared to the original code. We'd lose 4 bytes to the one-off initialisation of ZEROL in VMINIT, but we'd still come out ahead by applying this optimisation across multiple byte read/write opcodes.
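
The same kind of tally can be sketched in Python for the ZEROL/ZEROH variant. It assumes the one-off initialisation is an LDA #$00 / STA ZEROL pair (4 bytes). The `net_bytes` helper is hypothetical: it assumes every converted opcode saves the same 4 bytes as SB, which has only been counted for SB here.

```python
# (bytes, cycles) per addressing mode, standard NMOS 6502 timings.
COST = {
    "zp":    (2, 3),  # STY IPY, STA ZEROH, LDY IPY, STA TMPL
    "zp_x":  (2, 4),  # LDA ESTKL,X and LDY ESTKL,X
    "imm":   (2, 2),  # LDY #$00, LDA #$00
    "ind_y": (2, 6),  # STA (TMP),Y and STA (ZEROL),Y
    "impl":  (1, 2),  # INX
    "jmp":   (3, 3),  # JMP DROP
}

# One entry per instruction, in listing order.
ORIGINAL = ["zp_x", "zp", "zp_x", "zp", "zp_x",
            "zp", "imm", "ind_y", "zp", "impl", "jmp"]
ZERO_PTR = ["zp", "zp_x", "zp_x", "zp", "zp_x", "ind_y", "zp", "impl", "jmp"]
VMINIT_EXTRA = ["imm", "zp"]  # assumed: LDA #$00 / STA ZEROL


def tally(modes):
    """Return (total_bytes, total_cycles) for a list of addressing modes."""
    return (sum(COST[m][0] for m in modes), sum(COST[m][1] for m in modes))


orig_cost = tally(ORIGINAL)
zero_cost = tally(ZERO_PTR)
saving = (orig_cost[0] - zero_cost[0], orig_cost[1] - zero_cost[1])
init_bytes = tally(VMINIT_EXTRA)[0]


def net_bytes(n_opcodes, per_opcode=saving[0], init=init_bytes):
    """Net bytes saved after paying the one-off init (hypothetical uniform saving)."""
    return n_opcodes * per_opcode - init


print(saving, init_bytes, net_bytes(1), net_bytes(5))
```

Under those assumptions a single converted opcode only breaks even on size, and the byte saving turns positive from the second opcode onwards. The 5-cycle saving applies on every execution regardless.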

(Disclaimer - I've given the self-modifying version a quick test and it seems to be fine - ROGUE runs :-) - but I haven't tested the other one at all.)

PS The non-self-modifying approach using ZEROL/H might also allow optimisation of the word-oriented versions of these opcodes.

dschmenk commented 7 years ago

Very interesting. I’ll check the Apple II for write-protected memory banks.

peterferrie commented 7 years ago

> Very interesting. I’ll check the Apple II for write-protected memory banks.

We can't guarantee to be running from writable memory, which is why I haven't suggested self-modification so far. It is certainly legal to write-protect the LC, for example. If we can spare two zero-page locations, then that's a good saving, but it might have to be an option in that case. There will be environments which don't have them available.

dschmenk commented 7 years ago

We only need one ZP location if we overload TMPL and stick a zero before it.
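
To illustrate why a single extra location is enough, here is a small Python model of the 6502 `(zp),Y` effective-address calculation. The zero-page addresses are made up for the example; the only assumption carried over from the comment above is that the zero byte sits immediately before TMPL, so TMPL doubles as the pointer's high byte.

```python
def indirect_indexed(zp, ptr, y):
    """Effective address of a 6502 (ptr),Y access.

    The 16-bit base is read little-endian from zp[ptr] and zp[ptr+1],
    then Y is added.
    """
    base = zp[ptr] | (zp[(ptr + 1) & 0xFF] << 8)
    return (base + y) & 0xFFFF


zp = bytearray(256)   # model of zero page
ZERO = 0xE6           # hypothetical address of the permanent zero byte
TMPL = ZERO + 1       # TMPL directly follows it (address is illustrative)

zp[ZERO] = 0x00       # written once at VM startup, never changed

target = 0x2345       # address the opcode wants to read or write
zp[TMPL] = target >> 8   # high byte goes into TMPL
y = target & 0xFF        # low byte travels in the Y register

assert indirect_indexed(zp, ZERO, y) == target

# The base's low byte is always zero, so adding Y never carries into the
# high byte: every low-byte value lands in the intended page.
assert all(indirect_indexed(zp, ZERO, lo) == (0x2300 | lo) for lo in range(256))
```

Because the pointer's low byte is fixed at zero, the opcode only has to store the high byte and load Y, which is where the saving over filling both TMPL and TMPH comes from.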

dschmenk commented 7 years ago

So I took a stab at implementing the self-modifying code. Keeping the Apple II's language card bank write-enabled required enough extra code that it may be a wash, so it is off by default for the Apple II. For the Apple I and Apple III, there is no such requirement, so it is enabled by default. Steve's other idea for using a zero in the ZP didn't offer up as much opportunity as I thought it might, so I didn't implement it. But there is a spot in the ZP variables to squeeze a zero in front of TMPL if we want to revisit this.

dschmenk / PLASMA

Possible optimisation for SB/LB/LAB/SAB/DAB #21