MEGA65 / mega65-rom-public

MEGA65 ROM public issue reporting
4 stars 0 forks source link

5211/193 = 27, INT(5211/193) = 26 — but not in Xemu #101

Closed dansanderson closed 1 month ago

dansanderson commented 10 months ago

Test Environment (required) You can use MEGA65INFO to retrieve this.

Describe the bug BASIC65 has very slightly different division behavior on real hardware vs. Xemu, because it uses MEGA65 hardware math registers for division. This is barely noticeable and not a problem on its own. However, INT() will notice these slight differences in rare cases and produce what looks like an incorrect result.

To Reproduce

?5211/193

prints 27 for both the MEGA65 core and Xemu.

?INT(5211/193)

prints 26 (incorrect) with the MEGA65 core, and 27 (correct) for Xemu.

Expected behavior INT(5211/193) should be 27 for both the core and Xemu.

Additional context Exact decimal division is difficult for small computers, and I wouldn't expect the hardware division register to know that 5211/193 has no remainder. Furthermore, it would take significant effort to get Xemu to emulate the core's division register with matching imprecision, and I don't know if it's worth getting it perfect.

In this case, BASIC's attempt to use INT() as a floor results in a different result with each implementation, likely based on a small difference between the core's and Xemu's hardware division implementations. I'm filing this as a ROM bug because ideally we would never see ?X print an integer-like rounded result and ?INT(X) print something entirely different. Maybe the conceptual equivalent of INT(6.99999999999) should be 7 if and only if BASIC would print 6.99999999999 as 7.

Or maybe it shouldn't, and we should tolerate this edge case.

dansanderson commented 10 months ago

For what it's worth, the C64 prints 27, possibly out of coincidence based on its own division routine but still an argument that BASIC65 should match.

lgblgblgb commented 10 months ago

As far as I can feel, if it's the hw div unit what BASIC65 uses (so the C64 is correct, since it does not use hw div for that but sw floating point), so isn't this more a core issue than a ROM issue then?

johnwayner commented 10 months ago

Just to confirm that ?(5211/193) - 26 = 0.99999999 so I think @dansanderson is spot on. In Vice c128 it results in 1.

dansanderson commented 10 months ago

Yes, improving the hardware division implementation might be another option. I don't know in what way it needs to be improved: BASIC has its own interests in rounding behaviors, and those may not be the same interests as others using the hardware register directly. Xemu might be using a higher performance method, but that method may be difficult to implement in the core. Math is hard. :)

dansanderson commented 10 months ago

... More to the point of this bug, I would argue that INT is what needs improving because BASIC already treats 5211/193 as 27 in other contexts, so INT should treat it that way also. Getting Xemu to match the core, or the core to match Xemu, is an orthogonal concern.

johnwayner commented 10 months ago

I used the monitor to put 0x145B (5211) and 0xC1 (193) into MULTINA and MULTINB. This resulted in:

johnwayner commented 10 months ago

Just a little more info here. Although the CPU can provide a nice, almost round, result of 27 when the values are put directly into MULTINA and MULTINB, BASIC does not do this due to how it handles floating point numbers. Instead the mantissas of the normalized versions these numbers are used. Mathematically, you can divide the mantissas regardless of the two numbers having different exponents and still get a correct mantissa (with the exponents being handled separately). However, the result of this ends up being a number smaller than one which is normalized up and, thus, some precision is lost.

It seems like it might be possible to tune the two numbers to take better advantage of the hardware since, I think, the result will never use most of the integer part of DIVOUT when doing division. But it also sounds very risky without a comprehensive suite of floating point tests.

Also, 5211/193 only looks like it equals 27 because BASIC is rounding 26.99999999 (ish) up to 27 to print it.

?26.999999  rem 6 decimal digits
26.999999

?26.9999999  rem 7 decimal digits
27

Btw, on C128:

?26.99999999  rem 8 decimal digits
26.99999999

?26.999999999  rem 9 decimal digits
27

So maybe there are two things going on here:

  1. Precision lower that might be possible
  2. Print not printing as many digits as previous BASICs
Rhialto commented 7 months ago

Also, 5211/193 only looks like it equals 27 because BASIC is rounding 26.99999999 (ish) up to 27 to print it.

Huh? 193 27 = 5211 so the division should give exactly* 27, no excuse possible.

dansanderson commented 7 months ago

@Rhialto Wikipedia: https://en.wikipedia.org/wiki/Floating-point_arithmetic A famous but pretty dense small book on the subject: https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

Rhialto commented 7 months ago

Interesting reference (the second one) but I know how floating point works. My argument is that all the involved numbers are representable exactly in FP, and all of them are pretty close to the middle of the range of representable numbers. There is (or should be) no overflow, underflow, loss of precision or rounding anywhere.

Going by the remark of https://github.com/MEGA65/mega65-rom-public/issues/101#issuecomment-1841857513 I did the division by hand: I determined the floating point format (mantissa and exponent) of all numbers, and divided the mantissas. This shows also an exact outcome with no rounding or loss of precision anywhere.

5211 = 0001 0100 0101 1011 = 1,010001011011 e=12 193 = 0000 0000 1100 0001 = 1,1000001 e=7 expected outcome: 27 = 0000 0000 0001 1011 = 1,1011 e=4

5211 / 193: via 1,010001011010 / 1,1000001 e = 12-7=5

    1,1 0 0 0 0 0 1 / 1,0 1 0 0 0 1 0 1 1 0 1 1 \ 0,1 1 0 1 1  e=5
                      1,1 0 0 0 0 0 1            x 0
              ---------------  
                      1,0 1 0 0 0 1 0 1         
                        1,1 0 0 0 0 0 1          x 1
              ----------------- -
              0 1 0 0 0 0 1 0 0 1        
                          1,1 0 0 0 0 0 1        x 1
              ------------------- -
                0 0 1 0 0 1 0 0 0 0    
                            1,1 0 0 0 0 0 1      x 0
              ---------------------
                  0 1 0 0 1 0 0 0 0 1  
                              1,1 0 0 0 0 0 1    x 1
              ----------------------- -
                    0 0 1 1 0 0 0 0 0 1  
                                1,1 0 0 0 0 0 1  x 1
              -------------------------
                        0 0 0 0 0 0 0 0

The result of the division is 0,11011 which is exactly the expected bit pattern, and it is reached well within the 32 bits precision that the mantissa has (32 - 1 sign bit + 1 implicit bit). I'll handwave the position of the comma in there; normalizing the number and moving the comma 1 position to the right decreases the exponent from 5 to the expected 4.

So my conclusion would be that there is a bug in the ROMs division algorithm. If it indeed puts the mantissas into the hardware divider, my first suspicion would be that it does something wrong with the implicit 1, bit, which isn't stored in the mantissa. It could maybe also do something wrong with the alignment of the mantissas, but then you would expect an off by a factor of 2 or something like that.

Edit: I just re-read https://github.com/MEGA65/mega65-rom-public/issues/101#issuecomment-1837620891 . If that comment is indeed saying that the hardware divider gives a non-integer result from this division (of the original integers), then there is something wrong with the hardware divider!

dansanderson commented 7 months ago

@Rhialto Very nice! Thanks for doing the work. (And sorry, I didn't mean to come off as patronizing with those links. ;) )

This uses the hardware registers to divide 5211 by 193:

!cpu m65
!to "tdiv.prg", cbm

* = $2001

!8 $12,$20,$0a,$00,$fe,$02,$20,$30,$3a,$9e,$20
!pet "$2014"
!8 $00,$00,$00

multina = $d770
multinb = $d774
mathbusy = $d70f
divquotient = $d76c
divremain = $d768

  ldq dividend
  stq multina
  ldq divisor
  stq multinb
loop:
  bit mathbusy
  bmi loop
  ldq divquotient
  stq $1804
  ldq divremain
  stq $1800
  brk           ; type M1800 in the monitor to see the result

dividend:
!16 5211,0
divisor:
!16 193,0

The result is DIVOUT whole (D76C-F) of 1B 00 00 00 = +27 and DIVOUT frac (D768-B) of 03 00 00 00 = +3 != 0. So if the hardware registers ought to be capable of dividing 5211/193 exactly, then there's a core bug. Notably the result is slightly more than 27, not slightly less.

The ROM is doing more work to translate between its internal floating point and the hardware register fixed point. Whether this is exacerbating the hardware math error or introducing its own error independently I'm not sure, would have to step through it. PRINT and INT appear to be handling floating point precision in different ways, which is why this is still an open ROM issue until we figure out what ought to be happening.

lydon42 commented 1 month ago

Fixed with latest development core.