New bc_num_sqrt function needs only one long division.

naggamura commented 9 months ago

I'm sorry I missed many things on last PR. I followed your instructions step by step.

Improved SQRT needs no long division in iteration. Only one long division is needed for compose an initial estimation.

No meaningful difference for small input.

For large input, about 2.4 times faster, no meaningful difference in memory usage.

Here is the result for large input.

( 'CC=clang CFLAGS=-flto ./configure -O3' is used for configuring. ) stat_c1.txt stat_c4.txt

sqrt1_large.txt : by your code. sqrt2_large.txt : by my code.

Elapsed time for large input.

x ../../sqrt1_large.txt

../../sqrt2_large.txt +--------------------------------------------------------------------------------+ | + | | + | | + | | + | | + | | + | | + | | + | | + | | + | | + | | + x | | + x | | + x | | + x | | + x | | + x | | + x | | + x | | + x | | + x | | + x | | + x x| | ++ x xx| |+++ x xx| |+++ x xx| |+++ xxxx| |+++ xxxx| |+++ x xxxx| |+++ x xxxx| |+++ x xxxx| |++++ x xxxx| | A| |_A| | +--------------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 50 87.84 91.81 90.215 90.3572 1.0769137
50 36.96 38.71 37.805 37.8088 0.44755408 Difference at 99.5% confidence -52.5484 +/- 0.523644 -58.1563% +/- 0.315605% (Student's t, pooled s = 0.824636)

Memory for large input.

x ../../sqrt1_large.txt

../../sqrt2_large.txt +--------------------------------------------------------------------------------+ | + | | + x | | + x | | + + x x x | | + + + x x x | | + + ++ ++ xx x x x x x | |+ ++++ +++++ + +++ x xx x x x xx xxx xx| |+ + +++++++++++++++++ + x xxxxxxxxx xxxx xxxx xx| | |AM| |__M_A____| | +--------------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 50 5376 5596 5494 5508.96 51.80535
50 5184 5412 5302 5301.12 44.115174 Difference at 99.5% confidence -207.84 +/- 30.5525 -3.77276% +/- 0.542566% (Student's t, pooled s = 48.1141)
END -

gavinhoward commented 9 months ago

Unfortunately, you still did not check everything. That's probably my fault because I didn't tell you everything.

Nevertheless, this is what I found this time:

CC=clang ./configure -pDBG; make, my standard debug build, failed to build.
CC=clang ./configure -gO0; make; make test, a less strict debug build, failed the test suite.
CC=clang ./configure -pGDH; make, my standard release build for myself, failed to build.
CC=clang ./configure -pGNU; make; make test, the standard release build for Linux, failed the test suite.
CC=gcc ./configure -pGNU; make; make test, the same standard Linux build but with gcc, also failed the test suite.
CC=clang ./configure -pBSD; make; make test, the standard release build for FreeBSD and friends, failed the test suite.
CC=gcc ./configure -pBSD; make; make test, the same standard FreeBSD build but with gcc, also failed the test suite.

In addition, what failed on the test suite was off by a lot; the result I got for all cases was:

Running bc sqrt...FAIL!!!
12c12
< .0000000000000035071355833500363
---
> .8915580480000035071355833500363
bc failed test sqrt

As you can see, the numbers are the same, except at the beginning, which means that answer was about as far off as it could be.

In addition, when I ran this:

$ ./configure -gO0 -v
$ make
$ make test

which is a Valgrind-enabled build, I got this:

Running bc sqrt...==47836== Memcheck, a memory error detector
==47836== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==47836== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==47836== Command: bin/bc -lqc ./tests/bc/sqrt.txt
==47836==
==47836== Invalid write of size 8
==47836==    at 0x484C96E: memset (vg_replace_strmem.c:1386)
==47836==    by 0x12A28F: bc_int_elementary_mul (num.c:4274)
==47836==    by 0x1253AA: bc_int_k_mul (num.c:4327)
==47836==    by 0x124972: bc_num_sqrt (num.c:4636)
==47836==    by 0x13156F: bc_program_builtin (program.c:1990)
==47836==    by 0x12DB0C: bc_program_exec (program.c:3332)
==47836==    by 0x13D623: bc_vm_process (vm.c:1046)
==47836==    by 0x13CCA5: bc_vm_file (vm.c:1103)
==47836==    by 0x13C117: bc_vm_exec (vm.c:1498)
==47836==    by 0x13B4E1: bc_vm_boot (vm.c:1718)
==47836==    by 0x10C0CC: bc_main (bc.c:62)
==47836==    by 0x119995: main (main.c:108)
==47836==  Address 0x4aa2270 is 32 bytes inside a block of size 36 alloc'd
==47836==    at 0x4840784: malloc (vg_replace_malloc.c:442)
==47836==    by 0x13A4C3: bc_vm_malloc (vm.c:810)
==47836==    by 0x11BA76: bc_num_init (num.c:3368)
==47836==    by 0x1244B1: bc_num_sqrt (num.c:4567)
==47836==    by 0x13156F: bc_program_builtin (program.c:1990)
==47836==    by 0x12DB0C: bc_program_exec (program.c:3332)
==47836==    by 0x13D623: bc_vm_process (vm.c:1046)
==47836==    by 0x13CCA5: bc_vm_file (vm.c:1103)
==47836==    by 0x13C117: bc_vm_exec (vm.c:1498)
==47836==    by 0x13B4E1: bc_vm_boot (vm.c:1718)
==47836==    by 0x10C0CC: bc_main (bc.c:62)
==47836==    by 0x119995: main (main.c:108)
==47836==
==47836==
==47836== HEAP SUMMARY:
==47836==     in use at exit: 0 bytes in 0 blocks
==47836==   total heap usage: 1,633 allocs, 1,633 frees, 331,419 bytes allocated
==47836==
==47836== All heap blocks were freed -- no leaks are possible
==47836==
==47836== For lists of detected and suppressed errors, rerun with: -s
==47836== ERROR SUMMARY: 8 errors from 1 contexts (suppressed: 0 from 0)
FAIL!!!
bc failed test 'sqrt' with error code 100

That needs to be fixed, both the out-of-bounds write and the unaligned write.

If you're wondering how much testing I do, do this:

Create two clones of your branch in separate directories, which I will call bc1/ and bc2/.
In bc1/, run scripts/release.sh 1 1 1 1 0 1 0 0 1 0 1 0 1 0 0 1 1.
In bc2/, run scripts/release.sh 1 1 1 0 1 0 1 0 1 0 1 0 0 1 1 1 1.
Run them both in separate terminals and at the same time.
Leave it overnight.

Yep, you heard that last bit right: my full test run is an overnight thing. And I run both because one does clang, and one does gcc.

Moving on to benchmarks, I ran

$ CC=clang CFLAGS=-flto ./configure -O3
$ make

in both branches. Then I ran

$ ./scripts/benchmark.sh -p1 -n50 bc newton_raphson_sqrt_small > ../sqrt2_small.txt

in your branch, copied benchmarks/bc/newton_raphson_sqrt_small.txt to my branch and ran

$ ./scripts/benchmark.sh -p1 -n50 bc newton_raphson_sqrt_small > ../sqrt1_small.txt

These are my results:

$ ./scripts/ministat -w 80 -c99.5 -C1 ../sqrt1_small.txt ../sqrt2_small.txt
x ../sqrt1_small.txt
+ ../sqrt2_small.txt
+--------------------------------------------------------------------------------+
|                       +                           x                            |
|                       +                           x    x                       |
|                       +                           x    x                       |
|                 +     +    +                      x    x                       |
|                 +     +    +                      x    x     x                 |
|           +     +     +    +                      x    x     x                 |
|           +     +     +    +                      x    x     x                 |
|           +     +     +    +                      x    x     x                 |
|           +     +     +    +                      x    x     x                 |
|           +     +     +    +     +          x     x    x     x     x    x      |
|      +    +     +     +    +     +     x    x     x    x     x     x    x      |
|      +    +     +     +    +     +     x    x     x    x     x     x    x      |
|+     +    +     +     +    +     +     *    x     x    x     x     x    x     x|
|            |________A_M_____|                 |________MA________|             |
+--------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  50          1.45          1.52          1.48        1.4806   0.016463937
+  50          1.38          1.45          1.42        1.4164   0.015220824
Difference at 99.5% confidence
        -0.0642 +/- 0.0100677
        -4.33608% +/- 0.664237%
        (Student's t, pooled s = 0.0158546)
$ ./scripts/ministat -w 80 -c99.5 -C4 ../sqrt1_small.txt ../sqrt2_small.txt
x ../sqrt1_small.txt
+ ../sqrt2_small.txt
+--------------------------------------------------------------------------------+
|                                                              +                 |
|                                                              +                 |
|                                        x                     +                 |
|                          x             x                     +                 |
|                          x             x         +           +                 |
|                          x             x      +x +     +     +                 |
|                          x         +  xx      +x +  x  +     +                 |
|                          x         *+ xx      *x++  xx +  +  +                 |
|                          xx       +*+xxxx    x**++  xx +  +  +                 |
|x                      x  xx       +*+xxxx   x***++xx*x+++++  +               ++|
|                             |__________A_|_______M|A_________|                 |
+--------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  50         18956         19244         19168      19168.16     57.880722
+  50         19144         19376         19224      19232.88      54.73373
Difference at 99.5% confidence
        64.72 +/- 35.769
        0.337643% +/- 0.186939%
        (Student's t, pooled s = 56.3292)

Yours was slightly faster and used slightly more memory.

I ran the same benchmark for large, but only for one sample each, and my numbers say that yours are accurate for that benchmark.

However, I cannot accept code that fails tests or has memory bugs. In addition, because of the test failures, you can expect that I will run a lot more tests on your code before I accept.

So I won't close this PR, but you need to fix the problems above (by running that release.sh script both ways and fixing every problem) before I'll look at it again.

Also, I can't understand sqrt_2.png. I can follow sqrt_1.png, but I need you to redo sqrt_2.png because I also won't accept the code if I don't understand it.

naggamura commented 9 months ago

BcNum.len is managed as tight as possible. So clearing upper digits is needed for like sqrt(0.000...000xyz...)
Passed 'make test'

naggamura commented 9 months ago

Last push was not the latest version. Pushed again.

Anyway, test again and passed all tests.

... ==5965== ==5965== HEAP SUMMARY: ==5965== in use at exit: 0 bytes in 0 blocks ==5965== total heap usage: 1,520 allocs, 1,520 frees, 205,685 bytes allocated ==5965== ==5965== All heap blocks were freed -- no leaks are possible ==5965== ==5965== For counts of detected and suppressed errors, rerun with: -v ==5965== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) pass Running bc directory test...pass Running bc binary file test...pass Running bc binary stdin test...pass Running bc limits tests...pass

All bc tests passed.

... ...

==6927== ==6927== HEAP SUMMARY: ==6927== in use at exit: 0 bytes in 0 blocks ==6927== total heap usage: 73 allocs, 73 frees, 37,886 bytes allocated ==6927== ==6927== All heap blocks were freed -- no leaks are possible ==6927== ==6927== For counts of detected and suppressed errors, rerun with: -v ==6927== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) pass Running dc directory test...pass Running dc binary file test...pass Running dc binary stdin test...pass

All dc tests passed.

naggamura commented 9 months ago

Doing scripts/release.sh test...

naggamura commented 9 months ago

Passed the tests.

Benchmarks

stat_small_c1.txt stat_small_c4.txt stat_large_c1.txt stat_large_c4.txt

naggamura commented 9 months ago

sqrt_2.png updated.

gavinhoward commented 9 months ago

I have spent two full days on this since the last time I sent a message. I cannot spend any more time on this; I have a project to finish and a hard deadline.

So I have to make a final decision, which is to reject this. Thank you for your contribution, but I guess I don't want it.

There are several reasons why:

Your code still contains debug code.
You created your own functions to do things when functions to do those things already exist, which shows lack of research again. And duplication.
Your proof looks wrong to me, or I still can't understand it.
Your code passes the test suite, but it did not pass another set of tests I threw at it.
This is purely a performance PR, which is not really important to me right now.

But most of all, and this is nothing against you, I just can't read other people's code. I am unable to build a theory of mind, and this extends to reading code written by other people.

There is a reason I tend to do things by myself; I can't work on the code otherwise.

So I'm afraid I have run out of time and desire. Sorry.

gavinhoward / bc