CPU: Implement more Q opcodes

lydon42 commented 1 year ago

Implement some more Q opcodes:

[x] NEGQ Q - negates Q register - 42 42 42 NEG NEG NEG

Wishlist:

[ ] ZERQ Q - sets Q register to 0 - 42 42 4B NEG NEG TAZ
- or ZERQ with addressing instead, to be able to zero memory locations? (dan)
[ ] PHQ / PLQ (#536 BitShifter) ???
- Paul commented on this that using PHx/PLx four times is only 4 bytes and takes only 8 cycles. Implementing it as a Q opcode would therefore only save 1 byte and possibly 2 cycles, but would add a whole lot of complexity, as the stack handling code is complex.
[ ] DEQ / INQ (dan)
[ ] LDQ 8bit immediate with sign extension (#536 LGB)
- with this LDQ #0 would be the same as ZERQ immediate
[ ] SBCQ / ADCQ 8bit immediate (LGB)
[ ] Mapping related opcodes
- [ ] TMQ / TMMQ - transfer map register and map megabyte to Q (mentioned by paul on discord, writing is done by MAP/EOM)
- [ ] PHM / PLM - push/pop map state to/from stack (dan)
- see #819 (JMPM / JSRM / RTSM)
[ ] let all invalid extended opcodes ~~BRK~~ throw a Reserved Instruction HyperTRAP (paul) instead? (dan)

lydon42 commented 1 year ago

Thoughts: 42 42 42 for NEGQ could be problematic, perhaps we need a different opcode here. ZERQ Q should just zero all four registers, so there would be no option to load an immediate value into Q. But that would also mean that this should be some 8-bit opcode that takes no parameters.

lydon42 commented 1 year ago

The T.. opcodes are pretty useless for Q ops, so they could be a perfect candidate for these special needs functions.

lydon42 commented 1 year ago

Paul did something that implements NEG NEG NEG as NEGQ 6725b354e70a7ee519e1ebc06d788480443ff2f2 2f353056c7b74a3e356369d46affce9b917dc96a

dansanderson commented 2 weeks ago

I'd love a DEQ. Symmetrical with INQ, including the non-implied addressing modes. Currently only possible with SBCQ (or a series of SBC). I use 32-bit countdowns sometimes.

I wonder if reserved Q opcodes should BRK, i.e. NEG NEG followed by anything that isn't already a documented instruction. This would discourage accidental dependencies on undocumented behavior.

If NEGQ is complete and in a stable release, we should document it!

lgblgblgb commented 2 weeks ago

Also, it's interesting to think about having SBCQ #$nn and the like, with only a single-byte immediate value (since IIRC we already discussed that we don't much like having a 32-bit immediate value in opcodes), which is zero (or sign?) extended to 32 bits to do the math with the Q register. Not sure how useful it is ... ZERQ is very handy and can be a short form for setting A/X/Y/Z all to zero, even if no further Q opcodes are used at all.

dansanderson commented 2 weeks ago

Some thoughts:

TMQ / TQM

Doing these as single instructions would require another pair for the megabyte part of the MAP offset: TMMQ / TQMM
Personally I'd be satisfied with just the TMQ and TMMQ (reading) instructions, and keep MAP/EOM how they are. TQM / TQMM are hazardous when the offset-megabyte is changing, because there is no interrupt management between TQM and TQMM as there is with ... MAP ... MAP ... EOM. But I like symmetry in instruction sets, so it might be worth having TQM / TQMM for completeness, with warnings about their use.
Another option suggested for this use case is PHM / PLM, which operate on all eight bytes at a time. These would be atomic, with no inconsistent state for interrupts, and so might be easier to use than MAP/EOM. It's also likely that any reader of MAP might want to push them to the stack anyway, as part of a recursive system context switch.

RTSM

This makes me wonder about an atomic MAP-with-JMP operation that uses the stack. Consider: RTSM is an instruction that pulls a two-byte PC value, pulls an eight-byte MAP value, sets the MAP, then sets the PC.

Currently, it is impractical to change the map for the region containing the code that is changing the map (where the PC is). Programs that use more than one memory map need a "dispatch" area of the code in a reserved region for this purpose. With RTSM, any code can push a new MAP, push a new PC, then RTSM to change MAP and PC at the same time, without pulling the rug out from under itself.

map_a: !32 $xxxxxxxx, $xxxxxxxx
map_b: !32 $xxxxxxxx, $xxxxxxxx

* = $2000
; With MAP = map_a...
  ldq map_b
  phq
  phw #routine
  rtsm

* = $2000
; With MAP = map_b...
routine:
  ...

Doing it via an RTS-like instruction also allows for doing this with subroutines. This could be accomplished with just PHM and RTSM:

* = $2000
; With MAP = map_a...
  phm
  phw #returnlabel
  ldq map_b
  phq
  phw #subroutine
  rtsm
returnlabel:
  ...

* = $2000
; With MAP = map_b...
subroutine:
  ...
  rtsm

One could imagine a "JSRM" that does this more simply, but it's not clear how it should accept the MAP argument. 10 bytes of immediate? Pulls the new MAP from the stack, and takes the PC immediate? Combinations of taking either argument as a pointer (...)? Note that the subroutine being called would have to know it was called this way and must return with RTSM, not just RTS, but maybe that's acceptable.

* = $2000
; With MAP = map_a...
  ldq map_b
  phq
  jsrm subroutine_via_jsrm
  ...

; Or maybe...
  jsrm (map_b), subroutine_via_jsrm

* = $2000
; With MAP = map_b...
subroutine_via_jsrm:
  jsr local_subroutine
  rtsm

local_subroutine:
  ...
  rts

PHQ / PLQ

I concur that it's not much of a benefit. It's a usability advantage in assembly language code: I do this all the time, and doing it as a single line in my assembler saves me from accidentally doing the pha/x/y/z plz/y/x/a out of order. But CPUs don't tend to have instructions that are merely shortcuts for other instructions of similar cost. That's what assembler macros are for. (I have pull32 and push32 macros in my current project.)
But if NEG NEG PHA is otherwise unused and we have FPGA space for convenience, go for it. I'll use them. :)
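For anyone following along, a minimal sketch of what such macros might look like, assuming ACME syntax with the m65 CPU selected. The names push32 and pull32 come from the comment above; the bodies are an assumption, not the actual macros from that project.

```asm
!cpu m65

; Hypothetical helper: push Q (A, X, Y, Z) onto the stack.
!macro push32 {
  pha
  phx
  phy
  phz
}

; Hypothetical helper: pull Q back in the reverse order of push32.
!macro pull32 {
  plz
  ply
  plx
  pla
}

; Usage: one line each, so the pull order cannot get out of sync with the push order.
  +push32
  ; ... code that clobbers A/X/Y/Z ...
  +pull32
```

Keeping the register order inside the macro pair is the whole point: the ordering mistake described above can only be made once, in the macro definition.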

ZEROQ

If ZEROQ-Implied is the only addressing mode, then I don't think we need both ZEROQ-Implied and LDQ #imm8. Just LDQ #0.
That said, ZEROQ ZP, ZEROQ Addr, ZEROQ ZP,X, and ZEROQ Addr,X would be amazing, especially paired with INQ and DEQ. If we can do these, then also include ZEROQ-Implied for completeness.

lydon42 commented 2 weeks ago

ZERQ (five is too long) with memory access is not in the spirit of the CPU, I think. You would rather ZERQ/LDQ #0 and then do the STQ to memory, which is already there. I can't think of another 6502-based opcode that would store a predefined value into memory. That would always come from registers.

dansanderson commented 2 weeks ago

I think ZERQ to set CPU registers to a predefined value is similarly weird. I suggested ZERQ could write directly to memory because INQ already does this: it does not affect Q in those addressing modes. So if ZERQ makes sense for Q, I think it would make sense in those other modes. If it doesn't make sense for memory, then I don't think it makes sense for Q.

In general, it would be useful to have just a few more ways to manipulate 32-bit values in memory without nuking all of the CPU registers. If I want to update a 32-bit value in memory but I have a CPU register in use, I either have to do it the long way and funnel each byte through a single register, or I have to push my active CPU registers and use Q. The existing INQ instruction is one way to modify a 32-bit value in memory without disturbing CPU registers. DEQ and ZERQ ZP/Addr would be two new ways that I think pair well with INQ.
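To make the two workarounds concrete, here is a rough sketch of clearing a 32-bit value in memory both ways, assuming ACME syntax with the m65 CPU. The label counter and its address are hypothetical, chosen only for illustration.

```asm
!cpu m65

counter = $1600        ; hypothetical address of a 32-bit value

; The long way: funnel each byte through A. Only A is disturbed.
  lda #0
  sta counter
  sta counter+1
  sta counter+2
  sta counter+3

; Via Q: A, X, Y and Z are all overwritten, so save and restore them.
  pha
  phx
  phy
  phz
  lda #0
  tax
  tay
  taz                  ; Q = $00000000
  stq counter
  plz
  ply
  plx
  pla
```

A memory-addressed ZERQ, as proposed, would collapse either sequence into one instruction that leaves every register alone.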

lydon42 commented 2 weeks ago

Hmmm... Isn't that the same?

LDQ #0
STQ (zp)/$aaaa/$aa
DEQ (zp)/$aaaa/$aa

That way we don't need to add a set zero that we currently don't have.

dansanderson commented 2 weeks ago

Yes, I said this. ZEROQ wasn't my idea. :) ZEROQ Q = LDQ #0, or LDA #0 : TAX : TAY : TAZ. ZEROQ Addr = LDQ #0 : STQ Addr.

I also said: it would be useful to have just a few more ways to manipulate 32-bit values in memory without nuking all of the CPU registers. Right now, any use of Q requires:

- Pushing and pulling the accumulator and index registers on the stack
- Stashing the accumulator and index registers on the ZP
- Only using Q at moments when none of the CPU registers are assigned a purpose
- Avoiding Q entirely

That makes Q difficult to use in tight loops and call patterns, such as a 32-bit loop index. If ZEROQ is useful at all, it would be useful to give it addressing modes symmetric with INQ and DEQ. If there's an objection to an instruction encapsulating the number zero, then ZEROQ Q is similarly objectionable.

For what it's worth, LDQ #Imm8 also feels weird for the same reason. I understand the operand width constraint, and maybe it's a bit more useful than ZEROQ Q : LDA #Imm8, but it feels like it encapsulates zeroes as much as ZEROQ does. Sign extension is an interesting idea but with no way to disable it I think the semantics are weird, as if LDQ #Imm8 is primarily intended for setting a 32-bit register to a value in -128 to 127, and as a side effect can fill Q with zeroes or ones.
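As an illustration of the two possible semantics, the following sketch emulates the proposed LDQ #Imm8 with existing instructions for the immediate $FE, assuming ACME syntax with the m65 CPU. LDQ #Imm8 itself is only a proposal in this thread, so these sequences describe what it might do, not an implemented opcode.

```asm
!cpu m65

; Sign-extended reading of "LDQ #$FE": bit 7 of the immediate is set,
; so the upper three bytes are filled with ones.
  lda #$fe
  ldx #$ff
  ldy #$ff
  ldz #$ff             ; Q = $FFFFFFFE (-2 as a signed 32-bit value)

; Zero-extended reading of "LDQ #$FE": the upper three bytes are cleared.
  lda #$fe
  ldx #0
  ldy #0
  ldz #0               ; Q = $000000FE (254)
```

With sign extension always on, any immediate from $80 to $FF lands in the first case, which is the "fill Q with ones" side effect mentioned above.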

dansanderson commented 2 weeks ago

I have moved my JMPM/JSRM/RTSM idea to a separate feature request. https://github.com/MEGA65/mega65-core/issues/819 I like it and don't want it to get lost in discussion of other Q operations.

gardners commented 2 weeks ago

let all invalid extended opcodes BRK instead? (dan)

I'd instead suggest that they should all trigger a "reserved instruction hypertrap", so that people don't go using them as BRK in the meantime.

Rhialto commented 2 weeks ago

And how does the user know that the trap has triggered? I vote for showing an animation of nasal demons[1]...

[1] http://catb.org/jargon/html/N/nasal-demons.html

dansanderson commented 2 weeks ago

🤦‍♂️ There's already a DEQ. Please remove this from the list. 😅
