SDL-Hercules-390 / SoftFloat

Berkeley IEEE Binary Floating-Point Library for use by the SDL-Hercules-390 emulator
7 stars 5 forks source link

SoftFloat needs updated to Release 3e #1

Closed RallyTronics closed 6 years ago

RallyTronics commented 6 years ago

The official Berkeley SoftFloat web page announces as/of 2018 June 2 the availability of Release 3e of SoftFloat, containing important fixes to the square root function:

Releases 3 through 3c of Berkeley SoftFloat contain bugs in the square root functions that may be of concern for some uses. Those bugs are believed to be repaired in Release 3d and later ...

These changes need to be incorporated into this version of Hercules's SoftFloat external package.

Fish-Git commented 6 years ago

@srorso

Steve,

I've co-assigned this issue to you as well, as I would like you to review my work(*) once I'm finished.

I hope that's okay.

(*) I'm currently already working on it and am almost done.

Fish-Git commented 6 years ago

@srorso

Steve,

I've committed my changes to a personal test repository so that you and I can work together to make sure everything looks good to go before making the live commit to the real repository.

The test repository (with all of the SoftFloat-3e changes) is called "sf3e-test" under my personal "Fish-Git" account:

Please clone it (or download it or whatever) and look it over for me (especially the 'hercsource' files) to make sure I didn't screw anything up. (It builds fine but I haven't done any testing on it yet.)

If you need to make any changes let me know and I'll add you as a collaborator so you can have commit and push access.

Thanks!

  (*) I've put  " DO NOT USE! "  in the description to prevent anyone from mistakenly trying to use it for anything until we can together verify everything looks good (at which point I will apply the changes to the real SDL-Hercules-390/SoftFloat repository and delete the test repository).

srorso commented 6 years ago

Hi Fish,

Today's a travel day, so I can take a look tomorrow.

Best Regards, Steve Orso

Fish-Git commented 6 years ago

Today's a travel day, so I can take a look tomorrow.

Thanks.

I very much need some help on this!

I ran my first test today, and it failed spectacularly. :(

Did 22 tests.  17 failed; 5 OK:

bfp-001-divtointtst    1174 OK compares  411 failures  (DIEBR, DIDBR)
bfp-002-loadrtst        201 OK compares  194 failures  (LEDBR, LEDB, LEXBR, LEXB, LDXBR, LDXB)
bfp-004-cvttologtst     160 OK compares   51 failures  (CLFEBR, CLFDBR, CLFXBR)
bfp-005-cvttolog64tst   212 OK compares   59 failures  (CLGEBR, CLGDBR, CLGXBR)
bfp-006-cvttofixtst     236 OK compares   59 failures  (CFEBR, CFEBRA, CFDBR, CFDBRA, CFXBR, CFXBRA)
bfp-007-cvttofix64tst   332 OK compares   55 failures  (CGEBR, CGBRA, CGDBR, CGDBRA, CGXBR, CGXBRA)
bfp-008-cvtfrlogtst      32 OK compares   17 failures  (CELFBR, CDLFBR, CXLFBR)
bfp-009-cvtfrlog64tst    48 OK compares   25 failures  (CELGBR, CDLGBR, CXLGBR)
bfp-010-cvtfrfixtst      38 OK compares   18 failures  (CEFBR, CEFBRA, CDFBR, CDFBRA, CXFBR, CXFBRA)
bfp-011-cvtfrfix64tst    71 OK compares   48 failures  (CEGBR, CEGBRA, CDGBR, CDGBRA, CXGBR, CXGBRA)
bfp-014-dividetst       592 OK compares   53 failures  (DEBR, DEB, DDBR, DDB, DXBR)
bfp-015-sqrttst         105 OK compares   26 failures  (SQEBR, SQEB, SQDBR, SQDB, SQXBR)
bfp-016-addtst          837 OK compares  180 failures  (AEBR, AEB, ADBR, ADB, AXBR)
bfp-018-subtracttst     834 OK compares  183 failures  (SEBR, SEB, SDBR, SDB, SXBR)
bfp-019-multiplytst     640 OK compares   89 failures  (MEEBR, MEEB, MDBR, MDB, MXBR)
bfp-021-multaddtst     2607 OK compares  101 failures  (MAEBR, MAEB, MADBR, MADB)
bfp-022-multsubtst     2607 OK compares  101 failures  (MSEBR, MSEB, MSDBR, MSDB)

bfp-003-loadfpitst      362 OK compares  All pass      (CFEBR, CFEBRA, CFDBR, CFDBRA, CFXBR, CFXBRA)
bfp-012-loadtesttst      79 OK compares  All pass      (LTEBR, LTDBR, LTXBR, TCEB, TCDB, TCXB)
bfp-013-compstst        385 OK compares  All pass      (CEBR, CEB, CDBR, CDB, CXBR, KEBR, KEB, KDBR, KDB, KXBR)
bfp-017-loadltst        170 OK compares  All pass      (LDEBR, LDEB, LXEBR, LXEB, LXDBR, LXDB)
bfp-020-multlongertst   513 OK compares  All pass      (MDEBR, MDEB, MXDBR, MXDB)

  It's obvious one or more of the 'hercsource' files is not quite right. I'll try to take a closer look at them to see if anything obvious jumps out at me, but since you were the original author and spent a lot of time working on the code, I suspect you'll be able to find the problem much faster than me.

Thanks.

  (p.s. There's absolutely no rush on this! Take however long you need!)

srorso commented 6 years ago

Hi Fish,

I ran my first test today, and it failed spectacularly. :(

Maybe less spectacularly than you suspect. SoftFloat-3 often fails this way. It's architected, as you know, to confine rounding and normalization to common routines, one per precision.

Looking at the scripts with no failures, none of them require rounding, and all of the scripts with failing cases require rounding for at least some of their cases. So I suspect a small problem in one or more rounding routines. And with floating point, the devil is not in the details, it's in the rounding. And in the IEEE exception handling, but I think that's ok; the scripts with 100% pass include IEEE exceptions. (And, for IBM, scaled results, but a scaling issue generates far fewer fails.)

It may be more helpful for me to have a look at the Hercules allTests.out file before poking about in the code. If you have a moment, would you post it to this issue? Zip, tar.gz, txt, whatever works for you.

Best Regards, Steve Orso

RallyTronics commented 6 years ago

Not sure if this will help you guys out but I wrote a natvis file so that you can view SoftFloat and SoftDouble in the Visual Debugger as floating point values.

https://github.com/RallyTronics/SoftFloat-NATVIS

John Lagerquist Chief Engineer RallyTronics LLC 801-866-5981

Fish-Git commented 6 years ago

@srorso

It may be more helpful for me to have a look at the Hercules allTests.out file before poking about in the code. If you have a moment, would you post it to this issue? Zip, tar.gz, txt, whatever works for you.

(Doh!) Sorry about that.

On my first attempt to post my test results I accidentally closed my browser window before clicking the big green GitHub "Comment" button so I lost everything I typed. (Doh!)

Then when I tried again, I posted just the summary but forgot to upload the files! (Doh! again!)

Here are the missing runtest files:

I do remember getting a compiler error in one of the hercsource rounding functions on my first attempt to build the new code (due to some variable related to rounding no longer existing, having been replaced with a different set of variable names in the new 3e code), and I may have "fixed it" incorrectly (choosing the wrong new code replacement variable). I wish I could remember which file it was. :(

I'll try to determine which file it was and let you know. It may be a clue.

I did notice that the rounding and other softfloat functions that the 'hercsource' functions were based on did change quite a bit in 3e, so the special IBM_IEEE herc code in each of them might need to be significantly changed in order to work properly with the new 3e logic. I'm hoping you will be able to figure that out better than me though.

Thanks for your help on this, Steve.

I'm afraid I'm completely lost. :(

Fish-Git commented 6 years ago

I'll try to determine which file it was and let you know. It may be a clue.

Found it. The file was f128_roundToInt.c.

The original code sequence was:

    union ui128_f128 uA;
    uint_fast64_t uiA64, uiA0;
    int_fast32_t exp;
    struct uint128 uiZ;
    uint_fast64_t lastBitMask, roundBitsMask;
    bool roundNearEven;
    union ui128_f128 uZ;

[...]

        /*--------------------------------------------------------------------
        *--------------------------------------------------------------------*/
        uiZ.v64 = uiA64;
        uiZ.v0  = 0;
        lastBitMask = (uint_fast64_t) 1<<(0x402F - exp);
        roundBitsMask = lastBitMask - 1;
        if ( roundingMode == softfloat_round_near_maxMag ) {
            uiZ.v64 += lastBitMask>>1;
        } else if ( roundingMode == softfloat_round_near_even ) {
            uiZ.v64 += lastBitMask>>1;
            if ( ! ((uiZ.v64 & roundBitsMask) | uiA0) ) {
                uiZ.v64 &= ~lastBitMask;
            }
#ifdef IBM_IEEE
        } else if ((roundingMode == softfloat_round_stickybit) && (uiZ.v64 & roundBitsMask) | uiA0 ) {
            uiZ.v64 |= lastBitMask;
#endif

  The new 3e code sequence is now:

    union ui128_f128 uA;
    uint_fast64_t uiA64, uiA0;
    int_fast32_t exp;
    struct uint128 uiZ;
    uint_fast64_t lastBitMask0, roundBitsMask;
    bool roundNearEven;
    uint_fast64_t lastBitMask64;
    union ui128_f128 uZ;

[...]

        /*--------------------------------------------------------------------
        *--------------------------------------------------------------------*/
        uiZ.v64 = uiA64;
        uiZ.v0  = 0;
        lastBitMask64 = (uint_fast64_t) 1<<(0x402F - exp);
        roundBitsMask = lastBitMask64 - 1;
        if ( roundingMode == softfloat_round_near_maxMag ) {
            uiZ.v64 += lastBitMask64>>1;
        } else if ( roundingMode == softfloat_round_near_even ) {
            uiZ.v64 += lastBitMask64>>1;
            if ( !((uiZ.v64 & roundBitsMask) | uiA0) ) {
                uiZ.v64 &= ~lastBitMask64;
            }
#if defined( IBM_IEEE )
        } else if ((roundingMode == softfloat_round_stickybit) && (uiZ.v64 & roundBitsMask) | uiA0 ) {
            uiZ.v64 |= lastBitMask64;
#endif

  I had to change lastBitMask to lastBitMask64 in the IBM_IEEE section.

At first I thought maybe it should be changed to lastBitMask0, but that just didn't seem right, so I changed it to lastBitMask64 instead.

I hope that's right.

Anyway, if you take a look at the old 3a version of this file as compared to the new 3e version of this file, you can see the significant changes that were made in the new 3e release. Virtually all of the 'hercsource' files changed just as significantly, so they all should be closely reviewed by someone who knows what they're doing.

I'd do it myself but I'm afraid I don't know what I'm doing. Floating point is a real mystery to me and every time I try to wrap my head around it I get a headache. :(

I really appreciate any help you can provide, Steve.

Thanks.

srorso commented 6 years ago

Hi John,

Thanks for your contribution; it may come in handy at some point.

Binary floating point rounding and exceptions, though, are defined in terms of the binary representations of the sign, exponent and significand, not the decimal approximation of the stored binary value. It took me some time to realize this, but once I did, things progressed more quickly. This is also the reason the BFP test programs represent their source data as hex constants instead of decimal; the quest was for test values that would create an "interesting" result with respect to rounding, exceptions, and scaling. In some cases, the input test values were chosen to create specific bit patterns in the guard and sticky bits used internally by SoftFloat but not included in a given result.

There are a number of web sites that offer decimal<->binary floating point conversion. My favorite is

https://babbage.cs.qc.cuny.edu/IEEE-754/

It supports enough rounding modes to let you get a feel for rounding.

Play with the BFP representation of 0.1 and you'll see what's happening. 0.1 has a repeating bit pattern in the significand and the rounding mode affects the result of a decimal to binary conversion.

Fish-Git commented 6 years ago

Here are some exported ExamDiff Pro reports of those 'hercsource' files which are Hercules IBM_IEEE customized versions of corresponding softfloat files:

As I recall I did have to make some minor adjustments to some of these files too.

In any case, these are likely the files that need to be closely reviewed to make sure they're still right with respect to the latest 3e version of SoftFloat.

Hopefully they'll help you to more quickly and easily find the bug.

RallyTronics commented 6 years ago

Thanks Stephen, that is a very helpful link!

srorso commented 6 years ago

Hi Fish,

In file s_roundPackToF32.c, please do the following:

1) delete line 170 (#endif)

2) Add the following line just before line 177 (176 after the deletion):

endif / IBM _IEEE /

Also, Hercsource functions softfloat_roundPackToUI32() and softfloat_roundPackToUI64() have been renamed in SoftFloat-3e. This means only functions within hercsource use these functions. The balance of SoftFloat uses similar functions with no Hercules modifications. The Hercules mods in softfloat_roundPackToUI32() and softfloat_roundPackToUI64() should be ported to softfloat_roundToUI32() and softfloat_roundToUI64() respectively and the ports should be placed in hercsource.

If you get a significantly reduced error count, please repost the allTest.txt file; should not need the others. Actually, because your using a temp repo, you could just commit the test results files.

Many thanks!

Best Regards, Steve Orso

Fish-Git commented 6 years ago

In file s_roundPackToF32.c, please do the following:

Oops! I guess I forgot to invite you as a collaborator so you can make commits yourself. Let me fix that.

Invitation sent!

Also, Hercsource functions softfloat_roundPackToUI32() and softfloat_roundPackToUI64() have been renamed in SoftFloat-3e. This means only functions within hercsource use these functions. The balance of SoftFloat uses similar functions with no Hercules modifications.

Well THAT would certainly explain a lot!

The Hercules mods in softfloat_roundPackToUI32() and softfloat_roundPackToUI64() should be ported to softfloat_roundToUI32() and softfloat_roundToUI64() respectively and the ports should be placed in hercsource.

Thanks. I'll give it a whirl and let you know how it goes.

If you get a significantly reduced error count, please repost the allTest.txt file; should not need the others. Actually, because your using a temp repo, you could just commit the test results files.

10-4. I'll do that.

Thanks!

Fish-Git commented 6 years ago

@srorso

Steve,

Should the same change be made to file s_roundPackToF128.c too? I notice its #endif is before the packReturn: label instead of before the uiZ: label like the F32 and F64 files. Should it be changed too?

srorso commented 6 years ago

Hi Fish,

No, s_roundPackToF128 is good as is. My Hercules mods and Dr. Hauser's code for round to odd interacted in a bad way; poor design on my part. The 128-bit stuff in SoftFloat-3 (a-e) is way different from 32- and 64-bit. Both 32- and 64-bit significands fit with guard and sticky bits in a host uint on most systems nowadays; there is no need to split the significand, do arithmetic on the parts, and put the parts back together as is needed for 128.

Best Regards, Steve Orso

Fish-Git commented 6 years ago

The Hercules mods in softfloat_roundPackToUI32() and softfloat_roundPackToUI64() should be ported to softfloat_roundToUI32() and softfloat_roundToUI64() respectively and the ports should be placed in hercsource.

What about hercsource files s_roundPackToI32.c and s_roundPackToI64.c? It looks like I missed updating those too. They still contain the old 3a code for functions softfloat_roundPackToI32 and softfloat_roundPackToI64, which don't seem to be referenced anywhere in the new 3e code. Were they perhaps renamed as well? Which original softfloat source file/function were they originally Hercules modifications of?

It also looks like I completely missed updating hercsource files s_propagateNaNF32UI.c, s_propagateNaNF64UI.c and s_propagateNaNF128UI.c too! (Probably because the original softfloat files they were modifications of aren't in the main 'source' directory, but rather in the specialized 'source/8086-SSE' subdirectory instead). Each of them still contain the old, unmodified 3a code (instead of the new 3e code), which might also explain a lot! :)    (Oops!)

I'll fix them too and then re-test and let you know how it goes.

Fish-Git commented 6 years ago

It also looks like I completely missed updating hercsource files s_propagateNaNF32UI.c, s_propagateNaNF64UI.c and s_propagateNaNF128UI.c too! [...] which might also explain a lot!

Never mind. It's just cosmetic. The actual softfloat code for those 3 files didn't actually change any between 3a and 3e. False alarm.

srorso commented 6 years ago

What about hercsource files s_roundPackToI32.c and s_roundPackToI64.c?

Yup, same deal for those....They have been renamed too...

srorso commented 6 years ago
> line  7538: Storage compare mismatch.
                  Want: R:0000000000002340 00280000 F8002800 00180000 F8001000  "LDXBR FPCR pairs 9-10"
                  Got:  R:0000000000002340 00280000 F8002800 00100000 F8001000
> line  7542: Storage compare mismatch.
                  Want: R:0000000000002350 00180000 F8001000 00280000 F8002000  "LDXBR FPCR pairs 11-12"
                  Got:  R:0000000000002350 00100000 F8001000 00280000 F8002000

I'm looking at s_roundPackToF64. It's not reflecting inexact in the FPCR on non-trappable underflow. (The 8 in 0x00180000 is the IEEE inexact flag. The 1 is underflow, and tests 10 and 11 are underflow tests).

Lots of the 64-bit result rounding tests also fail with a missing inexact indication, so I suspect a common cause...

Fish-Git commented 6 years ago

Also, Hercsource functions softfloat_roundPackToUI32() and softfloat_roundPackToUI64() have been renamed in SoftFloat-3e. This means only functions within hercsource use these functions. The balance of SoftFloat uses similar functions with no Hercules modifications. The Hercules mods in softfloat_roundPackToUI32() and softfloat_roundPackToUI64() should be ported to softfloat_roundToUI32() and softfloat_roundToUI64() respectively and the ports should be placed in hercsource.

What about hercsource files s_roundPackToI32.c and s_roundPackToI64.c?

Yup, same deal for those....They have been renamed too...

Committed.

We're now doing a slightly less miserable version of lousy. :/

The same tests are still failing, but now they're failing slightly less often than before:

Did 22 tests.  17 failed; 5 OK.

Test bfp-001-divtoint    1581 OK compares    4 failures  (DIEBR, DIDBR)
Test bfp-002-loadr        305 OK compares   90 failures  (LEDBR, LEDB, LEXBR, LEXB, LDXBR, LDXB)
Test bfp-004-cvttolog     208 OK compares    3 failures  (CLFEBR, CLFDBR, CLFXBR)
Test bfp-005-cvttolog64   265 OK compares    6 failures  (CLGEBR, CLGDBR, CLGXBR)
Test bfp-006-cvttofix     284 OK compares   11 failures  (CFEBR, CFEBRA, CFDBR, CFDBRA, CFXBR, CFXBRA)
Test bfp-007-cvttofix64   380 OK compares    7 failures  (CGEBR, CGBRA, CGDBR, CGDBRA, CGXBR, CGXBRA)
Test bfp-008-cvtfrlog      43 OK compares    6 failures  (CELFBR, CDLFBR, CXLFBR)
Test bfp-009-cvtfrlog64    59 OK compares   14 failures  (CELGBR, CDLGBR, CXLGBR)
Test bfp-010-cvtfrfix      49 OK compares    7 failures  (CEFBR, CEFBRA, CDFBR, CDFBRA, CXFBR, CXFBRA)
Test bfp-011-cvtfrfix64    92 OK compares   27 failures  (CEGBR, CEGBRA, CDGBR, CDGBRA, CXGBR, CXGBRA)
Test bfp-014-divide       613 OK compares   32 failures  (DEBR, DEB, DDBR, DDB, DXBR)
Test bfp-015-sqrt         115 OK compares   16 failures  (SQEBR, SQEB, SQDBR, SQDB, SQXBR)
Test bfp-016-add          879 OK compares  138 failures  (AEBR, AEB, ADBR, ADB, AXBR)
Test bfp-018-subtract     876 OK compares  141 failures  (SEBR, SEB, SDBR, SDB, SXBR)
Test bfp-019-multiply     673 OK compares   56 failures  (MEEBR, MEEB, MDBR, MDB, MXBR)
Test bfp-021-multadd     2652 OK compares   56 failures  (MAEBR, MAEB, MADBR, MADB)
Test bfp-022-multsub     2652 OK compares   56 failures  (MSEBR, MSEB, MSDBR, MSDB)

Test bfp-003-loadfpi      362 OK compares   All pass     (CFEBR, CFEBRA, CFDBR, CFDBRA, CFXBR, CFXBRA)
Test bfp-012-loadtest      79 OK compares   All pass     (LTEBR, LTDBR, LTXBR, TCEB, TCDB, TCXB)
Test bfp-013-comps        385 OK compares   All pass     (CEBR, CEB, CDBR, CDB, CXBR, KEBR, KEB, KDBR, KDB, KXBR)
Test bfp-017-loadl        170 OK compares   All pass     (LDEBR, LDEB, LXEBR, LXEB, LXDBR, LXDB)
Test bfp-020-multlonger   513 OK compares   All pass     (MDEBR, MDEB, MXDBR, MXDB)

  Here's the complete runtest log:

And, in case it helps, here's the before/after picture:

I have a bad feeling that I royally screwed up porting one or more of the 'hercsource' files. I did have some trouble figuring our where the Hercules IBM_IEEE code should be with respect to the new 3e softfloat code on a few of them. Some of the functions changed quite significantly between 3a and 3e, and I strongly suspect I guessed wrong on more than one of them (the four recently committed files being among them). :(

If there's anything I can do to help, please let me know.

In the mean time I'll continue to stare at the modified 'hercsource' files and pretend I actually understand what they're doing while you stare at them and shake your head at my obvious mistakes. %P

Thanks for your help on this, Steve.

p.s. remember that you have commit access to this repository too, so rather than you tell me what changes need to be made and me screwing things up, it would probably be better for me to simply step aside and let you make the commits instead since you, unlike me, actually know what the frick you're doing.

srorso commented 6 years ago

Hi Fish,

The base code in SoftFloat-3 would be MUCH easier to understand if it had comments. I may have to take the position that I must fully re-comment any module that has been touched to support Hercules.

As for obvious mistakes, well, no, not so much. It took me about a day to find exactly why s_roundPackToF32() was not working. I wrote a c program to call f64_to_f32 directly, and then put a bunch of printfs in the Softfloat 3e code. It still took a while for the lightbulb to come on.

At the end of the day I decided that the 3e implementation of round to odd (aka sticky-bit rounding) is less than elegant, in part for including a go to for primary logic. (I can tolerate go to for emergency aborts, but not the way it's used here.)

The program also accesses the softfloat_raw global variables, which lets me see what SoftFloat is "thinking." The _raw variables are really used to create a scaled result when required; they contain the rounded result in target precision with an unbounded exponent and before the value has been packed into the target format. One gets some warnings about local name redefinitions from the C library, but those don't affect the diagnostic results.

I have attached a copy of the program; use the following command to compile:

cl ledbr.c sf3eb64.lib /link /NODEFAULTLIB:MSVCRT

I copy the SoftFloat headers and lib to the directory containing ledbr.c.

ledbr.c.txt

Thanks for the commit access. Right now I think commit bbb43ec needs to be ported from Hercules-390, and I suspect the "old style" platform.h include in 3e may be contributing some issues. Not sure about that. (See Hercules-390 commit d7d1229)

I shall continue to look at / work on LExBR; that instruction does nearly nothing except test out all of the rounding and packing stuff. Once that's right, lots of stuff gets better.

Don't be discouraged; your recent commit reduced the error rate from 13.47% to 5.40%, and the number of failing cases fell by 1,000. Divtoint went from 411 to 4 errors. The integer conversion instructions saw a similar decline. And although it's tempting to dive into the four remaining error cases, there's lots of weird stuff in that instruction's definition in the PoO. That instruction was tough.

Best Regards, Steve Orso

srorso commented 6 years ago

Hi Fish,

Regarding 4a35c4b in sf3e-test:

Starting with SoftFloat-3c, the lines at 82-83 in the original s_roundPackToF32 (sf3a line numbers) were reversed. This makes it impossible to use the s3fa Hercules modifications as is. Function s_roundPackToF64 is also affected; s_roundPackToF128 is not.

Compare the unmodified sf3a s_roundPackToF32 lines 82-83 with sf3e s_roundPackToF32 lines 98-99

The reversal prevented the return of inexact on any 32-bit or 64-bit operation that resulted in underflow (value too small to be represented in the target precision, usually because an exponent cannot be represented in the target precision's exponent bits. This resulted in numerous failed test cases that generate non-trappable underflows, with the returned Floating Point Control Register value being wrong.

It is not clear why the lines of code were reversed.

This commit should clear quite a number of the remaining errors.

Now I shall turn my attention to the instructions that convert between integer and floating point formats.

Best Regards, Steve Orso

Fish-Git commented 6 years ago

Compare the unmodified sf3a s_roundPackToF32 lines 82-83 with sf3e s_roundPackToF32 lines 98-99

Yes, I see it.

The if ( roundBits ) ... statement came before the sig = ... statement in 3a, but comes after it in 3e. Good catch!

 

This commit should clear quite a number of the remaining errors.

And indeed it does!

Here are the new results:

Test bfp-005-cvttolog64   265 OK compares   6 failures  (CLGEBR, CLGDBR, CLGXBR)
Test bfp-006-cvttofix     284 OK compares  11 failures  (CFEBR, CFEBRA, CFDBR, CFDBRA, CFXBR, CFXBRA)
Test bfp-007-cvttofix64   380 OK compares   7 failures  (CGEBR, CGBRA, CGDBR, CGDBRA, CGXBR, CGXBRA)
Test bfp-016-add          947 OK compares  70 failures  (AEBR, AEB, ADBR, ADB, AXBR)
Test bfp-018-subtract     944 OK compares  73 failures  (SEBR, SEB, SDBR, SDB, SXBR)

Test bfp-001-divtoint    1585 OK compares  All pass     (DIEBR, DIDBR)
Test bfp-002-loadr        395 OK compares  All pass     (LEDBR, LEDB, LEXBR, LEXB, LDXBR, LDXB)
Test bfp-003-loadfpi      362 OK compares  All pass     (CFEBR, CFEBRA, CFDBR, CFDBRA, CFXBR, CFXBRA)
Test bfp-004-cvttolog     211 OK compares  All pass     (CLFEBR, CLFDBR, CLFXBR)
Test bfp-008-cvtfrlog      49 OK compares  All pass     (CELFBR, CDLFBR, CXLFBR)
Test bfp-009-cvtfrlog64    73 OK compares  All pass     (CELGBR, CDLGBR, CXLGBR)
Test bfp-010-cvtfrfix      56 OK compares  All pass     (CEFBR, CEFBRA, CDFBR, CDFBRA, CXFBR, CXFBRA)
Test bfp-011-cvtfrfix64   119 OK compares  All pass     (CEGBR, CEGBRA, CDGBR, CDGBRA, CXGBR, CXGBRA)
Test bfp-012-loadtest      79 OK compares  All pass     (LTEBR, LTDBR, LTXBR, TCEB, TCDB, TCXB)
Test bfp-013-comps        385 OK compares  All pass     (CEBR, CEB, CDBR, CDB, CXBR, KEBR, KEB, KDBR, KDB, KXBR)
Test bfp-014-divide       645 OK compares  All pass     (DEBR, DEB, DDBR, DDB, DXBR)
Test bfp-015-sqrt         131 OK compares  All pass     (SQEBR, SQEB, SQDBR, SQDB, SQXBR)
Test bfp-017-loadl        170 OK compares  All pass     (LDEBR, LDEB, LXEBR, LXEB, LXDBR, LXDB)
Test bfp-019-multiply     729 OK compares  All pass     (MEEBR, MEEB, MDBR, MDB, MXBR)
Test bfp-020-multlonger   513 OK compares  All pass     (MDEBR, MDEB, MXDBR, MXDB)
Test bfp-021-multadd     2708 OK compares  All pass     (MAEBR, MAEB, MADBR, MADB)
Test bfp-022-multsub     2708 OK compares  All pass     (MSEBR, MSEB, MSDBR, MSDB)

(and the corresponding runtest files):

 

Now I shall turn my attention to the instructions that convert between integer and floating point formats.

Thank you for your help on this, Steve!

srorso commented 6 years ago

Hi Fish,

Could I trouble you to run a complete BFP test suite again? I have made three commits and as I look at the remaining issues, many may have been resolved by the commits. About the commits:

b76878a: sf3a used to return zero when any convert to integer function received a NaN or a value that would overflow the target integer. This can now be configured in specialize.h, and there is no need to have Hercules-specific code for that.

185b8d7: In sf3a, the Hercules-specific code was placed at the beginning of a block started by an if. In sf3e, the if controls a single line, not a block, and the Hercules-specific return was executed unconditionally rather than as part of the conditional block. Further, because of the configurability of specialize.h noted above, no Hercules-specific code was needed to deal with over/underflow. Hercules-specific code is still needed for Hercules-specific rounding modes.

51894a6: When rounding to a UI32, the number of guard digits plus sticky bit was increased from 7 to 12. This improves the accuracy of rounding somewhat, but it caused the test to set the SoftFloat incremented flag to always fail.

Many thanks!

Best Regards, Steve Orso

Fish-Git commented 6 years ago

Could I trouble you to run a complete BFP test suite again?

No problem!

I made a couple of my own commits beforehand however. They were very minor and did not (I hope!) change any actual logic from your prior commits.

The first was just cosmetic to remove some trailing blanks (yeah I'm somewhat anal about that), and the other was to delete f64_to_ui64.c from the 'hercsource' directory (and a corresponding adjustment to the cmake sources.txt file) so as to use the native softfloat version of that file/function instead. (I compared the resulting file from your commit and noticed it no longer contained any special Hercules IBM_IEEE code, so I compared it to the native softfloat version in the primary 'source' directory and saw that they were now identical. Thus the one in 'hercsource' is now no longer needed.)

Our new (latest and greatest) test results based on sf3e-test commit 740edee now looks like this:

Did 22 tests.  2 failed; 20 OK.

Test bfp-016-add         947 OK compares  70 failures (AEBR, AEB, ADBR, ADB, AXBR)
Test bfp-018-subtract    944 OK compares  73 failures (SEBR, SEB, SDBR, SDB, SXBR)

Test bfp-001-divtoint   1585 OK compares  All pass    (DIEBR, DIDBR)
Test bfp-002-loadr       395 OK compares  All pass    (LEDBR, LEDB, LEXBR, LEXB, LDXBR, LDXB)
Test bfp-003-loadfpi     362 OK compares  All pass    (CFEBR, CFEBRA, CFDBR, CFDBRA, CFXBR, CFXBRA)
Test bfp-004-cvttolog    211 OK compares  All pass    (CLFEBR, CLFDBR, CLFXBR)
Test bfp-005-cvttolog64  271 OK compares  All pass    (CLGEBR, CLGDBR, CLGXBR)
Test bfp-006-cvttofix    295 OK compares  All pass    (CFEBR, CFEBRA, CFDBR, CFDBRA, CFXBR, CFXBRA)
Test bfp-007-cvttofix64  387 OK compares  All pass    (CGEBR, CGBRA, CGDBR, CGDBRA, CGXBR, CGXBRA)
Test bfp-008-cvtfrlog     49 OK compares  All pass    (CELFBR, CDLFBR, CXLFBR)
Test bfp-009-cvtfrlog64   73 OK compares  All pass    (CELGBR, CDLGBR, CXLGBR)
Test bfp-010-cvtfrfix     56 OK compares  All pass    (CEFBR, CEFBRA, CDFBR, CDFBRA, CXFBR, CXFBRA)
Test bfp-011-cvtfrfix64  119 OK compares  All pass    (CEGBR, CEGBRA, CDGBR, CDGBRA, CXGBR, CXGBRA)
Test bfp-012-loadtest     79 OK compares  All pass    (LTEBR, LTDBR, LTXBR, TCEB, TCDB, TCXB)
Test bfp-013-comps       385 OK compares  All pass    (CEBR, CEB, CDBR, CDB, CXBR, KEBR, KEB, KDBR, KDB, KXBR)
Test bfp-014-divide      645 OK compares  All pass    (DEBR, DEB, DDBR, DDB, DXBR)
Test bfp-015-sqrt        131 OK compares  All pass    (SQEBR, SQEB, SQDBR, SQDB, SQXBR)
Test bfp-017-loadl       170 OK compares  All pass    (LDEBR, LDEB, LXEBR, LXEB, LXDBR, LXDB)
Test bfp-019-multiply    729 OK compares  All pass    (MEEBR, MEEB, MDBR, MDB, MXBR)
Test bfp-020-multlonger  513 OK compares  All pass    (MDEBR, MDEB, MXDBR, MXDB)
Test bfp-021-multadd    2708 OK compares  All pass    (MAEBR, MAEB, MADBR, MADB)
Test bfp-022-multsub    2708 OK compares  All pass    (MSEBR, MSEB, MSDBR, MSDB)

which I hope you'll agree now looks a lot better than before! :)    (Well done, Steve!)

All that's left are the add/subtract tests, which I have every confidence you will be able to quickly fix.

Thanks again for all your great help on this, Steve.

I am very grateful.

(and impressed as hell too!)

srorso commented 6 years ago

Hi Fish,

The first was just cosmetic to remove some trailing blanks

Not a problem; I try to do the same.

and the other was to delete f64_to_ui64.c from the 'hercsource' directory

Also not a problem. And f32_to_UI64.c and f128_to_UI64 can get the same treatment. Leverage of the customizations in specialize.h means the Hercules-specific stuff can just be removed, and the base sf3e routines can be used.

All that's left are the add/subtract tests, which I have every confidence you will be able to quickly fix.

One can only hope, but I shall not share your optimism until it's fixed. I have said in the past "this should be easy," only to have a sorta-kinda solution sorta-kinda working three weeks later. But going from 1670 failing compares to 143 failing compares is good progress.

Thanks for the quick turn around the updated runtest results.

Best Regards, Steve Orso

Fish-Git commented 6 years ago

And f32_to_UI64.c and f128_to_UI64 can get the same treatment.

Huh?   Are you sure?

Because both of them contain Herc modifications!

Or is that maybe what's causing our remaining failures??

Please take another look at both of those files and confirm for me whether or not you're absolutely sure both of them should fall back to using the native unmodified sf3e version as-is.

Thanks!

srorso commented 6 years ago

Hi Fish,

Huh? Are you sure?

Well, as sure as I am about many things. Certain, never.

Because both of them contain Herc modifications!

True. But those modifications just ensure that max_unit64 is returned on overflow. And sf3e does that by default now. (You will note that the non-FAST64 code returns zero; this is wrong and vestigial because SoftFloat always compiles with FAST64 when being built for Hercules. So the Hercules mods may be removed and the build changed to use unmodified source.

And if I am wrong, that is what the revert button is for. :-)

Or is that maybe what's causing our remaining failures??

Nope...got that figure out too, but it will take a bit more time to code a correction. Two issues:

1) In sf3a, s_addMagsF32 ( and 64-bit precision too) shifted the operand significands left 6 bits to make room on the right for guard and sticky bits and expected s_roundPackToF32 to shift the result signficand back (which it does). In sf3e, the operands are shifted only when the exponents are different. If they are the same, then the result is shifted before moving on to s_roundPackToF32.

Addition of sigZ <<= 6; following line 119 in s_addMagsF32 addresses this issue. (Verified on 32-bit addition.) But this change may not be needed...but read on.

2) (not detected in test cases, may be an SNO): Code to save raw information for later retrieval of scaled results is now in the wrong place. sf3e reordered the tests for tiny and non-finite at lines 91-99, with the result that raw information is never stored. I need to look at the PO to see if underflow, and therefore a scaled result, is possible with addition or subtraction of tiny inputs (tiny=denormalized floating point nr.)

If not, then the Hercules modifications can be removed and the stock addMags routines can be used instead of those with hercules mods. And I suspect this is the case; I never changed the subMags routines, and they have similar code.

Best Regards, Steve Orso

srorso commented 6 years ago

Hi Fish,

Investigation complete. Results:

  1. PO says that scaled results are generated anytime a tiny result is created and underflow is trappable. So addition (and subtraction) need to populate the softfloat_raw. fields to enable later generation of a scaled result.

  2. s_roundPackToF32.c (and 64 and 128) each save the needed information for scaled result generation. There is no need for modifications in any addMags (or subMags) routine to do this.

So....

s_addMagsF32.c, s_addMagsF64.c, and s_addMagsF128.c may be reverted to their original unmodified state and the hercsource versions may be removed.

Best Regards, Steve Orso

Fish-Git commented 6 years ago

And f32_to_UI64.c and f128_to_UI64 can get the same treatment.

Huh? Are you sure?

Well, as sure as I am about many things. Certain, never.

That's good enough for me! Consider them history.

So....

s_addMagsF32.c, s_addMagsF64.c, and s_addMagsF128.c may be reverted to their original unmodified state and the hercsource versions may be removed.

Consider it done.

(Later...)

Commit 9ff609d contains the changes you requested: complete removal of f32_to_UI64.c and f128_to_UI64.c from 'hercsource' and corresponding adjustment to CMake sources.txt file to move them from the "UNUSED" group back into the "OTHER" group so Herc uses the native sf3e version of those functions instead of its own (just like I did previously per your guidance for f64_to_ui64.c), as well as the very same treatment for s_addMagsF32.c, s_addMagsF64.c and s_addMagsF128.c that you just now mentioned too.

The end result however, is not much better than before. :(

Some fixed, some not, some new:

Did 22 tests.  2 failed; 20 OK.

Test bfp-016-add          943 OK compares  74 failures (AEBR, AEB, ADBR, ADB, AXBR)
Test bfp-018-subtract     945 OK compares  72 failures (SEBR, SEB, SDBR, SDB, SXBR)

Test bfp-001-divtoint    1585 OK compares  All pass    (DIEBR, DIDBR)
Test bfp-002-loadr        395 OK compares  All pass    (LEDBR, LEDB, LEXBR, LEXB, LDXBR, LDXB)
Test bfp-003-loadfpi      362 OK compares  All pass    (CFEBR, CFEBRA, CFDBR, CFDBRA, CFXBR, CFXBRA)
Test bfp-004-cvttolog     211 OK compares  All pass    (CLFEBR, CLFDBR, CLFXBR)
Test bfp-005-cvttolog64   271 OK compares  All pass    (CLGEBR, CLGDBR, CLGXBR)
Test bfp-006-cvttofix     295 OK compares  All pass    (CFEBR, CFEBRA, CFDBR, CFDBRA, CFXBR, CFXBRA)
Test bfp-007-cvttofix64   387 OK compares  All pass    (CGEBR, CGBRA, CGDBR, CGDBRA, CGXBR, CGXBRA)
Test bfp-008-cvtfrlog      49 OK compares  All pass    (CELFBR, CDLFBR, CXLFBR)
Test bfp-009-cvtfrlog64    73 OK compares  All pass    (CELGBR, CDLGBR, CXLGBR)
Test bfp-010-cvtfrfix      56 OK compares  All pass    (CEFBR, CEFBRA, CDFBR, CDFBRA, CXFBR, CXFBRA)
Test bfp-011-cvtfrfix64   119 OK compares  All pass    (CEGBR, CEGBRA, CDGBR, CDGBRA, CXGBR, CXGBRA)
Test bfp-012-loadtest      79 OK compares  All pass    (LTEBR, LTDBR, LTXBR, TCEB, TCDB, TCXB)
Test bfp-013-comps        385 OK compares  All pass    (CEBR, CEB, CDBR, CDB, CXBR, KEBR, KEB, KDBR, KDB, KXBR)
Test bfp-014-divide       645 OK compares  All pass    (DEBR, DEB, DDBR, DDB, DXBR)
Test bfp-015-sqrt         131 OK compares  All pass    (SQEBR, SQEB, SQDBR, SQDB, SQXBR)
Test bfp-017-loadl        170 OK compares  All pass    (LDEBR, LDEB, LXEBR, LXEB, LXDBR, LXDB)
Test bfp-019-multiply     729 OK compares  All pass    (MEEBR, MEEB, MDBR, MDB, MXBR)
Test bfp-020-multlonger   513 OK compares  All pass    (MDEBR, MDEB, MXDBR, MXDB)
Test bfp-021-multadd     2708 OK compares  All pass    (MAEBR, MAEB, MADBR, MADB)
Test bfp-022-multsub     2708 OK compares  All pass    (MSEBR, MSEB, MSDBR, MSDB)

  Here are the runtest files and the (hopefully helpful!) "illustrated version" of the failures:

Any ideas?

srorso commented 6 years ago

Hi Fish,

The end result however, is not much better than before. :(

But it is different and very instructive/helpful. The penutlimate test log showed failures for 32- and 64-bit add and subtract for all cases where the binary exponent of the two values matched. This set fails for all cases of trappable underflow.

Any ideas?

Yup...the short answer is "I did something dumb." (if I had a nickel...)

The longer answer is "goto statements in c code create non-obvious execution paths." So does a deeply-nested if, but there one knows one is wrestling a snake. With gotos, the snake has bitten one on the butt before one is even aware of its existence. (Boy, do I hate gotos in c code.)

When both operands to addition are tiny, they both have less significand bits than the target precision, which means the result of any addition will have at most bits enough to fill the target precision and no more.

When the s_addMagsFxxx routines detect input values that are both tiny, they just add the significands and assemble the result into a floating point format. Rounding is not required because the result cannot have more bits than will fit in the target precision. This is achieved with a goto (line 32 in unmodified s_addMags32.c for example).

In all other cases, the s_addMagsFxxx routines call s_roundPackToFxxx, which does rounding and also saves the raw information needed to later generate a scaled result. This means that when both input values are tiny, no raw information is saved.

Which means the mods, perhaps modified, for the s_addMagsFxxx routines will need to go back in to hercsource. The routines f32_to_UI64.c and f128_to_UI64.c do not need to go back to hercsource.

I will do some testing to confirm the modifications. It is a certainty that the s_addMagsFxxx need to return to hercsource.

With apologies for the inconvenience and the double-effort.

Best Regards, Steve Orso

Fish-Git commented 6 years ago

This is achieved with a goto (line 32 in unmodified s_addMags32.c for example).

Line 32 is a comment line. :)

 

Which means the mods, perhaps modified, for the s_addMagsFxxx routines will need to go back in to hercsource.

I will do some testing to confirm the modifications. It is a certainty that the s_addMagsFxxx need to return to hercsource.

Thanks. Let me know if they need any tweaking.

In the mean time I'll put all the original Herc modified versions of those 3 files that I previously removed (s_addMagsFxxx) back in again and re-test.

Who knows, maybe we'll get lucky?

Fish-Git commented 6 years ago

Who knows, maybe we'll get lucky?

Wishful thinking. :(

It looks like we're back to where we were yesterday. Based on what you previously wrote I suspect some tweaking might be in order. Should I bother posting the results?

If you want me to commit the re-addition of those 3 files to 'hercsource' let me know. In the mean time I'll standby waiting for your analysis of what tweaks to the Herc code for those three files need to be made.

Thanks.

srorso commented 6 years ago

Hi Fish,

Go ahead and re-commit. I have tweaks to make, but I know what’s needed and where.

Best regards, Steve Orso

Fish-Git commented 6 years ago

Go ahead and re-commit.

Done!

I have tweaks to make, but I know what’s needed and where.

Cool.  Go ahead and commit your changes when they're ready and I'll re-test.

In the mean time I'm going to take a nap. Been up all night. My eyes are falling out. Zzzzzz....

srorso commented 6 years ago

Hi Fish,

I knew what I needed to do, but now I know I need to do more....

The s_addMagsFxxx functions have fast-path code to deal with addition of two subnormal numbers. The fast-path code skips rounding and the generalized normalization in favor of a conditional one-bit normalization. Rounding is not needed when adding subnormals. Detection of a subnormal result and saving information needed for a scaled result must be done therefore in the s_addMagsFxxx routines for these cases because s_addMagsFxxx, where detection and saving are normally done, is not called.

The sf3a versions of s_subMagsF32 and s_subMagsF64 did not have fast-paths. They do now and will have to be modified to generate PO-compliant results. Function s_subMagsF128 does not have any fast-path coding for subnormal inputs.

So s_subMagsF32 and s_subMagsF64 need to be added to the files in the hercsource directory. Function s_subMagsF128 does not need to be added.

Would you be willing to copy s_subMagsF32 and s_subMagsF64 to hercsource (unmodified) and change your build to use these versions in preference to those in source? I know I can figure it out if you have much on your plate and will if you wish.

Many thanks!

Best Regards, Steve Orso

srorso commented 6 years ago

Hi Fish,

I have fully commented s_addMagsF32.c (attached). I plan to include the comments in the committed version if you feel there is value to it. Let me know.

s_addMagsF32 .c.txt

Best Regards, Steve Orso

Fish-Git commented 6 years ago

I have fully commented s_addMagsF32.c (attached). I plan to include the comments in the committed version if you feel there is value to it. Let me know.

Comments definitely have value, Steve! :))

While I personally didn't care too much for how you chose to align(*) them, overall I felt they were quite clear and understandable and more importantly, helpful!

So yeah, they definitely have value. :)    (Thank you!)

Attached is my own version which contains minor tweaking to the alignment of your comments (and a fix to a typo or two) as well as a fix to what I believe is a bug:

141c141
<                 softfloat_raw.Sign = signZ;     /* Save result sign     */
---
>                 softfloat_raw.Sign = signF32UI( uiA );    /* Save result sign */

 

  Please review it and tell me what you think.

Thanks!

(p.s. is "iI" supposed to mean "IFF" ("If and ONLY if")?)


(*) I'm wondering if maybe you are using a different tab setting than the rest of us. I am (and I believe the rest of us are) using a tab setting of 4. I'm guessing you're probably using 8. That might explain why your comments appeared misaligned to me.

(**) I sure wish GitHub would allow attaching source code file types! (.c, .h, etc) Having to change them to .txt files before uploading them is quite annoying!  >:-(

Fish-Git commented 6 years ago

@srorso,

Would you be willing to copy s_subMagsF32 and s_subMagsF64 to hercsource (unmodified) and change your build to use these versions in preference to those in source? I know I can figure it out if you have much on your plate and will if you wish.

Not a problem. Give me a few minutes.

What about the s_addMagsFxxx files though? Same treatment? Add just the 32 and 64 but not the 128?

Fish-Git commented 6 years ago

What about the s_addMagsFxxx files though?

(Oops!) They're already in hercsource! I just didn't notice because I forgot to add them to the Visual Studio project filters (cosmetic thing; no big deal).

I'll get that fixed right away and add s_subMagsF32 and s_subMagsF64 too. Give me a few minutes.

Fish-Git commented 6 years ago

@srorso,

Done!     (commits 5f2a0f9 and 3e904f5)

I'll keep an eye out for your eventual commits with the herc modified versions, and then re-test and post the results once I see them.

Thanks again for your help, Steve. Much appreciated.

srorso commented 6 years ago

a typo or two) as well as a fix to what I believe is a bug:

141c141
<                 softfloat_raw.Sign = signZ;     /* Save result sign     */
---
                softfloat_raw.Sign = signF32UI( uiA );    /* Save result sign */

Yup, a bug. Good catch. I am surprised this did not show up in the test cases. This issue likely affects 64 and 128 too. I will review.

(*) I'm wondering if maybe you are using a different tab setting than the rest of us.

I am confused about this. I edit source for this in VS2008 Express edition SP1, tabs set at 4, tabs converted to spaces. When I open your attachment and mine in Notepad++ side by side, I see the same indentation. I guess the real test is to look at what gets committed versus tabs every 4 and convert to spaces upon entry.

I am about to commit addMags and subMags for a test run. I expect to be close. On the odd chance that they test clean, there are still some things to be done, but I'll address that when the tests run clean.

I sure wish GitHub would allow attaching source code file types! (.c, .h, etc)

I agree...but....

Given that Github needs to be language-agnostic, I am not sure how one would do this....which extensions are allowed, and at some point one just gives up limiting extensions to those acceptable. c, c++, python, mainframe assembler, FORTRAN, COBOL, REXX. It would become a repository or project-specific Github setting. Smarter people that me can work on that. :-)

Best Regards, Steve Orso

Fish-Git commented 6 years ago

(*) I'm wondering if maybe you are using a different tab setting than the rest of us.

I am confused about this. I edit source for this in VS2008 Express edition SP1, tabs set at 4, tabs converted to spaces. When I open your attachment and mine in Notepad++ side by side, I see the same indentation. I guess the real test is to look at what gets committed versus tabs every 4 and convert to spaces upon entry.

Well then maybe it's just your unusual comment style then:

#if defined( IBM_IEEE )
        /* If exp zero and sig non-zero, then still tiny.  If addition  */
        /* ...overflowed into exponent, which is OK, then no longer a   */
        /* ...tiny.                                                     */
                /* Is result still a tiny?                              */
        if ( !(expZ) && sigDiff )
        {       /* ..yes, save value for scaled result, set flags,      */
                /* ...indicate subnormal result                         */
            softfloat_exceptionFlags |= softfloat_flag_tiny;
            softfloat_raw.Incre = false;    /* Not incremented          */
            softfloat_raw.Inexact = false;  /* Not inexact              */
            softfloat_raw.Tiny = true;      /* Is tiny (subnormal)      */
            softfloat_raw.Sign = signZ;     /* Save result sign         */
            softfloat_raw.Exp = -1022;      /* Indicate subnormal in exp */
                                            /* save significand         */
            softfloat_raw.Sig64 = ((uint_fast64_t) (sigDiff << shiftDist)) << 10;
            softfloat_raw.Sig0 = 0;         /* Zero bits 64-128         */
        }
#endif

When I see the "Is result still a tiny?", "..yes, save value" and " ...indicate subnormal" comments indented the way they are, it seems (to me!) like something isn't right.

I would expect the " Is result still a tiny?" comment to be on the same line as the if statement itself, and the "..yes" and "...indicate" comments to be left-aligned with the code.

But then maybe that's just me. I'm probably being too picky. Don't worry about it. It's your hard work. However you code it is fine with me! I just honestly thought maybe there was a wrong tab setting going on, that's all (since they seemed to be indented an extra 8 spaces).

Fish-Git commented 6 years ago

I am about to commit addMags and subMags for a test run. I expect to be close.

Oh, so close!

Only a total of 6 failures remaining: 3 for add, and 3 for subtract. All other tests passed 100%.

>>>>> line 20993: Storage compare mismatch.
                  Want: R:0000000000001210 80000000 80000000 80000000 80000000  "AEBR/AEB NF -0/-0"
                  Got:  R:0000000000001210 80000000 D4000000 80000000 D4000000
>>>>> line 21393: Storage compare mismatch.
                  Want: R:0000000000001910 00000000 F8000000 00000000 F8000000  "AEBR/AEB NF -0/-0 FPCR"
                  Got:  R:0000000000001910 00000000 F8001001 00000000 F8001001
>>>>> line 21665: Storage compare mismatch.
                  Want: R:0000000000001E10 007FFFFF 607FFFFE 007FFFFF 607FFFFE  "AEBR/AEB F Ufl 1"
                  Got:  R:0000000000001E10 007FFFFF 607FFFFF 007FFFFF 607FFFFF
Test "bfp-016-add.tst:        AEBR, AEB, ADBR, ADB, AXBR":  1014 OK compares.  3 failures.
>>>>> line 25810: Storage compare mismatch.
                  Want: R:0000000000001220 80000000 80000000 80000000 80000000  "SEBR/SEB NF -0/+0"
                  Got:  R:0000000000001220 80000000 D4000000 80000000 D4000000
>>>>> line 26211: Storage compare mismatch.
                  Want: R:0000000000001920 00000000 F8000000 00000000 F8000000  "SEBR/SEB NF -0/+0 FPCR"
                  Got:  R:0000000000001920 00000000 F8001001 00000000 F8001001
>>>>> line 26480: Storage compare mismatch.
                  Want: R:0000000000001E10 007FFFFF 607FFFFE 007FFFFF 607FFFFE  "SEBR/SEB F Ufl 1"
                  Got:  R:0000000000001E10 007FFFFF 607FFFFF 007FFFFF 607FFFFF
Test "bfp-018-subtract.tst:   SEBR, SEB, SDBR, SDB, SXBR":  1014 OK compares.  3 failures.
srorso commented 6 years ago

Hi Fish,

Commit 00a2b168 corrects the mis-identification of a 32-bit -0 as a subnormal floating point value. This should take care of two each of the AEBx and SEBx issues, leaving the Underflow 1 test for further consideration.

Which might be a challenge: I am unable to duplicate the result with or without the -0 change. All I know for the moment is that the difficulty is in s_subMags32, the routine used for addition of values with differing signs or subtraction of values with the same sign.

I changed the comment style slightly in s_addMagsF32, but that is for later, as is incorporation of your edits. If the code's wrong, the comments at best don't matter and at worst may distract from what the code actually does (as opposed to what the comments say it does).

Best Regards, Steve Orso

srorso commented 6 years ago

Hi Fish,

Should the recent commit not address the UFL 1 test issues, would you run the attached program and post results.

aebr.c.txt

You should get the following:

C:\common\github\sf3e>cl aebr.c lib/sf3eb64.lib /O3 /link /NODEFAULTLIB:MSVCRT
[....]
C:\common\github\sf3e>aebr
Inputs 0x80000000, 0x80000000, output 0x80000000
Raw: +, exp 0000 (0), sig 0000000000000000 0000000000000000, roundbits 0000, incre 0000
Softfloat exception flags 00 - no flags set

Inputs 0x00ffffff, 0x80800000, output 0x007fffff
Raw: +, exp 0xffffff82 (-126), sig 0x3fffff8000000000 0000000000000000, roundbits 0000, incre 0000
Softfloat exception flags 0x40 tiny
Scaled output 0x607fffff

Inputs 0x00800000, 0x80400000, output 0x00400000
Raw: +, exp 0xffffff81 (-127), sig 0x4000000000000000 0000000000000000, roundbits 0000, incre 0000
Softfloat exception flags 0x40 tiny
Scaled output 0x60000000

C:\common\github\sf3e>

The middle test corresponds to either UFL 1 test that is returning incorrect results. I'm using MSVC on the command line from VS 2017 CE.

Many thanks!

Best Regards, Steve Orso

srorso commented 6 years ago

Unable to reproduce the UFL 1 issue on Win 10/MSVC from VS2017 CE, Win 7 VS2008EE, CentOS 7 (gcc 4.8.5), Fedora 28 (gcc 8.?.?). Ugh....

Fish-Git commented 6 years ago

Latest results:

>>>>> line 21665: Storage compare mismatch.
                  Want: R:0000000000001E10 007FFFFF 607FFFFE 007FFFFF 607FFFFE  "AEBR/AEB F Ufl 1"
                  Got:  R:0000000000001E10 007FFFFF 607FFFFF 007FFFFF 607FFFFF
Test "bfp-016-add.tst:        AEBR, AEB, ADBR, ADB, AXBR":  1016 OK compares.  One failure.
>>>>> line 26480: Storage compare mismatch.
                  Want: R:0000000000001E10 007FFFFF 607FFFFE 007FFFFF 607FFFFE  "SEBR/SEB F Ufl 1"
                  Got:  R:0000000000001E10 007FFFFF 607FFFFF 007FFFFF 607FFFFF
Test "bfp-018-subtract.tst:   SEBR, SEB, SDBR, SDB, SXBR":  1016 OK compares.  One failure.

  As for the aebr.c test, I had to first make some changes to get it to compile cleanly on my system:

#if 0
#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>
#include "softfloat.h"
#else

  #ifndef   _WIN32_WINNT                /* see cmake's targetver.txt */
    #define _WIN32_WINNT  0x0601        /* Windows 7 Service Pack 1  */
    #define WINVER        _WIN32_WINNT  /* Windows 7 Service Pack 1  */
    #define _WIN32_IE     0x0800        /* Internet Explorer 8.0     */
  #endif

  /* Need <SDKDDKVer.h> header with later versions of Visual Studio. */

  #if _MSC_VER >= 1600                  /* If VS2010 or greater,     */
    #include <SDKDDKVer.h>              /* then need this header     */
  #endif

  #include <windows.h>                  /* Primary Windows header    */

  /* minimum stdint.h typedefs needed by package */ 
  typedef char                          int8_t; 
  typedef short                         int16_t; 
  typedef int                           int32_t; 
  typedef long long                     int64_t; 

  typedef unsigned char                 uint8_t; 
  typedef unsigned short                uint16_t; 
  typedef unsigned int                  uint32_t; 
  typedef unsigned long long            uint64_t; 

  typedef char                          int_fast8_t;
  typedef int                           int_fast16_t;
  typedef int                           int_fast32_t;
  typedef long long                     int_fast64_t;

  typedef unsigned char                 uint_least8_t;
  typedef unsigned char                 uint_fast8_t;
  typedef unsigned int                  uint_fast16_t;
  typedef unsigned int                  uint_fast32_t;
  typedef unsigned long long            uint_fast64_t;

  #define INT32_MAX                     0x7fffffff
  #define UINT32_MAX                    0xffffffff
  #define INT64_MAX                     0x7fffffffffffffff
  #define UINT64_MAX                    0xffffffffffffffffU

  #define INT32_C(x)                    ((x) + (INT32_MAX - INT32_MAX))
  #define INT64_C(x)                    ((x) + (INT64_MAX - INT64_MAX))
  #define UINT32_C(x)                   ((x) + (UINT32_MAX - UINT32_MAX))
  #define UINT64_C(x)                   ((x) + (UINT64_MAX - UINT64_MAX))

  /* Indicate #include <stdint.h> is now NOT needed by package */
  #define int32_t                       int32_t

  /* minimum stdbool.h #defines needed by package */ 
  #define false                         0
  #define true                          1
  #define __bool_true_false_are_defined 1
  #define _Bool                         int
  #define bool                          _Bool

  #include "softfloat.h"

#endif

  (Basically all I did was inline the code from the platform.h.in file)

With the above mods in place, my results are:

C:\Users\Fish\Downloads> vstools 64

Setting environment for using Microsoft Visual Studio 2008 Beta2 x64 tools.

C:\Users\Fish\Downloads> cl aebr-fish.c -D _Debug -D Debug -I "C:\Users\Fish\Documents\Visual Studio 2008\Projects\Hercules\_Git\_Fish\sf3e-test-0\source\include" "C:\Users\Fish\Documents\Visual Studio 2008\Projects\Hercules\_Git\_Fish\extpkgs-VS2008\lib\SoftFloat64d.lib" -Ox /link /NODEFAULTLIB:MSVCRTD

Microsoft (R) C/C++ Optimizing Compiler Version 15.00.30729.01 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

aebr-fish.c
Microsoft (R) Incremental Linker Version 9.00.30729.01
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:aebr-fish.exe
/NODEFAULTLIB:MSVCRTD
aebr-fish.obj
"C:\Users\Fish\Documents\Visual Studio 2008\Projects\Hercules\_Git\_Fish\extpkgs-VS2008\lib\SoftFloat64d.lib"

C:\Users\Fish\Downloads> aebr-fish.exe

Inputs 0x80000000, 0x80000000, output 0x80000000
Raw: +, exp 0000 (0), sig 0000000000000000 0000000000000000, roundbits 0000, incre 0000
Softfloat exception flags 00 - no flags set

Inputs 0x00ffffff, 0x80800000, output 0x007fffff
Raw: +, exp 0xffffff82 (-126), sig 0x3fffff8000000000 0000000000000000, roundbits 0000, incre 0000
Softfloat exception flags 0x40 tiny
Scaled output 0x607fffff

Inputs 0x00800000, 0x80400000, output 0x00400000
Raw: +, exp 0xffffff81 (-127), sig 0x4000000000000000 0000000000000000, roundbits 0000, incre 0000
Softfloat exception flags 0x40 tiny
Scaled output 0x60000000

C:\Users\Fish\Downloads>

  which looks identical to yours. <shrug>

Are you sure all the "Want:" values in your two tests are correct?

I only ask because I notice the last value in each -- 607FFFFE -- doesn't have the low-order bit on (notice it's "eee", not "eff"), whereas all of the others have the low-order bit on, and our stand-alone aebr.c test is returning the exact same results too: "eff" in the low-order bit, not "eee".

That's why I ask: are you sure your bfp-016-add.tst and bfp-018-subtract.tst tests' "Want:" values are correct for these two tests?

I mean, Dr. Hauser did mention after all:

For 64-bit double-precision, the flaw in the square root function was of very minor impact, causing a 1-ulp error (1 unit in the last place) a few times out of a billion.

Now I realize this isn't the square-root logic we're dealing with here (and we're certainly not executing things billions of times!), but ... his very mention of a "1-ulp error (1 unit in the last place)" makes me wonder if maybe, just maybe, that might be something similar to what we're currently tripping over?

Fish-Git commented 6 years ago

... his very mention of a "1-ulp error (1 unit in the last place)" makes me wonder if maybe, just maybe, that might be something similar to what we're currently tripping over?

Or put another way: maybe there was a 1-ulp error in the previous 3a version of softfloat, and since your tests were all written during that time, maybe you designed your tests to expect 607FFFFE because that's the value that 3a was returning for those tests?

In other words, do you think maybe 3a had a bug in it that was causing incorrect results to be returned for these two tests, and so that's the way you coded your tests? To expect "eee" in the last bit? Whereas in 3e, the bug has now been fixed and is now returning the correct results? ("eff", not "eee")

I'm probably way off base, but I just thought I'd mention it just in case.

srorso commented 6 years ago

Hi Fish,

Got it. s_subMagsF32 did not shift the subnormal result quite enough when populating the structure that provides source data for returning a scaled result. One more bit left was needed to move the MSB into the position that would become the low-order bit of the exponent.

This should address the last two errors from runtest; please let me know.

I still need to do a code review and a comment review. There is also some question about validation on macOS. I found a very interesting problem in BFP results when I ran on a mac. Let me know if you have access to a mac for validation.

Best Regards, Steve Orso