Incorrect multiplication behavior on last cycle prior to next multiplication/division

KungFuFurby commented 2 years ago

Forwarded from the NESDev forums. Tested on my end using the latest master commit, 6fc6bf14a39d32dab69c4f9687a81df26d412758 .

According to regiscaelus...

On write to WRMPYB ($4203), always clear RDMPY ($4216-$4217), but only start a new multiplication if the current multiplication is complete (> 8 cycles).
RDDIV ($4214-$4215) is updated with WRMPYA ($4202) & WRMPYB ($4203) if multiplication is complete or on the last cycle (>7 cycles).
On cycle 8, multiplication should not be started and RDDIV ($4214-$4215) should only get updated once with WRMPYA ($4202) & WRMPYB ($4203.
On write to WRDIVB ($4206),

It looks like the cycle 16 case is not emulated correctly on division, and the cycle 8 case on multiplication is also not emulated correctly.

Two test ROMs were provided: one by Myself086, and one by Undisbeliever.

Max833 commented 2 years ago

Unfortunatly, this project is pretty much dead. But thank you for the report. Maybe ares can do something with that info.

carmiker commented 2 years ago

I will take a look tonight and see if I can come up with a working solution.

KungFuFurby commented 2 years ago

Sure! Then this fix can be ported to the other two forks (particularly ares, since that's still actively maintained to my knowledge).

carmiker commented 2 years ago

Here is a draft solution against the master branch of this repo which could probably be cleaned up a bit, but it appears to have correct behaviour. If anyone would like to test this to make sure it does not introduce regressions, that would be wonderful. I also have a solution for bsnes-jg which is posted on the nesdev forum (pending approval).

diff --git a/bsnes/sfc/cpu/cpu.hpp b/bsnes/sfc/cpu/cpu.hpp
index 19a486be..bf68bea3 100644
--- a/bsnes/sfc/cpu/cpu.hpp
+++ b/bsnes/sfc/cpu/cpu.hpp
@@ -163,7 +163,9 @@ private:

   struct ALU {
     uint mpyctr = 0;
+    uint mpylast = 0;
     uint divctr = 0;
+    uint divlast = 0;
     uint shift = 0;
   } alu;

diff --git a/bsnes/sfc/cpu/io.cpp b/bsnes/sfc/cpu/io.cpp
index 3338051c..99d81791 100644
--- a/bsnes/sfc/cpu/io.cpp
+++ b/bsnes/sfc/cpu/io.cpp
@@ -157,8 +157,10 @@ auto CPU::writeCPU(uint addr, uint8 data) -> void {
     io.rddiv = io.wrmpyb << 8 | io.wrmpya;

     if(!configuration.hacks.cpu.fastMath) {
-      alu.mpyctr = 8;  //perform multiplication over the next eight cycles
-      alu.shift = io.wrmpyb;
+      if (!alu.mpylast) {
+        alu.mpyctr = 8;  //perform multiplication over the next eight cycles
+        alu.shift = io.wrmpyb;
+      }
     } else {
       io.rdmpy = io.wrmpya * io.wrmpyb;
     }
@@ -176,12 +178,14 @@ auto CPU::writeCPU(uint addr, uint8 data) -> void {
     io.rdmpy = io.wrdiva;
     if(alu.mpyctr || alu.divctr) return;

-    io.wrdivb = data;
-
     if(!configuration.hacks.cpu.fastMath) {
-      alu.divctr = 16;  //perform division over the next sixteen cycles
-      alu.shift = io.wrdivb << 16;
+      if (!alu.divlast) {
+        io.wrdivb = data;
+        alu.divctr = 16;  //perform division over the next sixteen cycles
+        alu.shift = io.wrdivb << 16;
+      }
     } else {
+      io.wrdivb = data;
       if(io.wrdivb) {
         io.rddiv = io.wrdiva / io.wrdivb;
         io.rdmpy = io.wrdiva % io.wrdivb;
diff --git a/bsnes/sfc/cpu/timing.cpp b/bsnes/sfc/cpu/timing.cpp
index e443e5a5..3df7e35b 100644
--- a/bsnes/sfc/cpu/timing.cpp
+++ b/bsnes/sfc/cpu/timing.cpp
@@ -142,13 +142,19 @@ auto CPU::scanline() -> void {
 auto CPU::aluEdge() -> void {
   if(alu.mpyctr) {
     alu.mpyctr--;
+    if (!alu.mpyctr)
+      alu.mpylast = 1;
     if(io.rddiv & 1) io.rdmpy += alu.shift;
     io.rddiv >>= 1;
     alu.shift <<= 1;
   }
+  else
+    alu.mpylast = 0;

   if(alu.divctr) {
     alu.divctr--;
+    if (!alu.divctr)
+      alu.divlast = 1;
     io.rddiv <<= 1;
     alu.shift >>= 1;
     if(io.rdmpy >= alu.shift) {
@@ -156,6 +162,8 @@ auto CPU::aluEdge() -> void {
       io.rddiv |= 1;
     }
   }
+  else
+    alu.divlast = 0;
 }

 auto CPU::dmaEdge() -> void {

orbea commented 2 years ago

For reference here is the fix for the jgemu bsnes fork.

https://gitlab.com/jgemu/bsnes/-/commit/ff43a0d611ae99e306af6050a9de062fcf7bc043

bsnes-emu / bsnes

Incorrect multiplication behavior on last cycle prior to next multiplication/division #256