gavinhoward / bc

An implementation of the POSIX bc calculator with GNU extensions and dc, moved away from GitHub. Finished, but well-maintained.
https://git.gavinhoward.com/gavin/bc
Other
162 stars 32 forks source link

New bc_num_sqrt function needs only one long division. #73

Closed naggamura closed 9 months ago

naggamura commented 9 months ago

I'm sorry I missed many things on last PR. I followed your instructions step by step.

Improved SQRT needs no long division in iteration. Only one long division is needed for compose an initial estimation.

No meaningful difference for small input.

For large input, about 2.4 times faster, no meaningful difference in memory usage.

Here is the result for large input.

( 'CC=clang CFLAGS=-flto ./configure -O3' is used for configuring. ) stat_c1.txt stat_c4.txt

sqrt1_large.txt : by your code. sqrt2_large.txt : by my code.


Elapsed time for large input.

x ../../sqrt1_large.txt


Memory for large input.

x ../../sqrt1_large.txt

gavinhoward commented 9 months ago

Unfortunately, you still did not check everything. That's probably my fault because I didn't tell you everything.

Nevertheless, this is what I found this time:

In addition, what failed on the test suite was off by a lot; the result I got for all cases was:

Running bc sqrt...FAIL!!!
12c12
< .0000000000000035071355833500363
---
> .8915580480000035071355833500363
bc failed test sqrt

As you can see, the numbers are the same, except at the beginning, which means that answer was about as far off as it could be.

In addition, when I ran this:

$ ./configure -gO0 -v
$ make
$ make test

which is a Valgrind-enabled build, I got this:

Running bc sqrt...==47836== Memcheck, a memory error detector
==47836== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==47836== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==47836== Command: bin/bc -lqc ./tests/bc/sqrt.txt
==47836==
==47836== Invalid write of size 8
==47836==    at 0x484C96E: memset (vg_replace_strmem.c:1386)
==47836==    by 0x12A28F: bc_int_elementary_mul (num.c:4274)
==47836==    by 0x1253AA: bc_int_k_mul (num.c:4327)
==47836==    by 0x124972: bc_num_sqrt (num.c:4636)
==47836==    by 0x13156F: bc_program_builtin (program.c:1990)
==47836==    by 0x12DB0C: bc_program_exec (program.c:3332)
==47836==    by 0x13D623: bc_vm_process (vm.c:1046)
==47836==    by 0x13CCA5: bc_vm_file (vm.c:1103)
==47836==    by 0x13C117: bc_vm_exec (vm.c:1498)
==47836==    by 0x13B4E1: bc_vm_boot (vm.c:1718)
==47836==    by 0x10C0CC: bc_main (bc.c:62)
==47836==    by 0x119995: main (main.c:108)
==47836==  Address 0x4aa2270 is 32 bytes inside a block of size 36 alloc'd
==47836==    at 0x4840784: malloc (vg_replace_malloc.c:442)
==47836==    by 0x13A4C3: bc_vm_malloc (vm.c:810)
==47836==    by 0x11BA76: bc_num_init (num.c:3368)
==47836==    by 0x1244B1: bc_num_sqrt (num.c:4567)
==47836==    by 0x13156F: bc_program_builtin (program.c:1990)
==47836==    by 0x12DB0C: bc_program_exec (program.c:3332)
==47836==    by 0x13D623: bc_vm_process (vm.c:1046)
==47836==    by 0x13CCA5: bc_vm_file (vm.c:1103)
==47836==    by 0x13C117: bc_vm_exec (vm.c:1498)
==47836==    by 0x13B4E1: bc_vm_boot (vm.c:1718)
==47836==    by 0x10C0CC: bc_main (bc.c:62)
==47836==    by 0x119995: main (main.c:108)
==47836==
==47836==
==47836== HEAP SUMMARY:
==47836==     in use at exit: 0 bytes in 0 blocks
==47836==   total heap usage: 1,633 allocs, 1,633 frees, 331,419 bytes allocated
==47836==
==47836== All heap blocks were freed -- no leaks are possible
==47836==
==47836== For lists of detected and suppressed errors, rerun with: -s
==47836== ERROR SUMMARY: 8 errors from 1 contexts (suppressed: 0 from 0)
FAIL!!!
bc failed test 'sqrt' with error code 100

That needs to be fixed, both the out-of-bounds write and the unaligned write.

If you're wondering how much testing I do, do this:

Yep, you heard that last bit right: my full test run is an overnight thing. And I run both because one does clang, and one does gcc.

Moving on to benchmarks, I ran

$ CC=clang CFLAGS=-flto ./configure -O3
$ make

in both branches. Then I ran

$ ./scripts/benchmark.sh -p1 -n50 bc newton_raphson_sqrt_small > ../sqrt2_small.txt

in your branch, copied benchmarks/bc/newton_raphson_sqrt_small.txt to my branch and ran

$ ./scripts/benchmark.sh -p1 -n50 bc newton_raphson_sqrt_small > ../sqrt1_small.txt

These are my results:

$ ./scripts/ministat -w 80 -c99.5 -C1 ../sqrt1_small.txt ../sqrt2_small.txt
x ../sqrt1_small.txt
+ ../sqrt2_small.txt
+--------------------------------------------------------------------------------+
|                       +                           x                            |
|                       +                           x    x                       |
|                       +                           x    x                       |
|                 +     +    +                      x    x                       |
|                 +     +    +                      x    x     x                 |
|           +     +     +    +                      x    x     x                 |
|           +     +     +    +                      x    x     x                 |
|           +     +     +    +                      x    x     x                 |
|           +     +     +    +                      x    x     x                 |
|           +     +     +    +     +          x     x    x     x     x    x      |
|      +    +     +     +    +     +     x    x     x    x     x     x    x      |
|      +    +     +     +    +     +     x    x     x    x     x     x    x      |
|+     +    +     +     +    +     +     *    x     x    x     x     x    x     x|
|            |________A_M_____|                 |________MA________|             |
+--------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  50          1.45          1.52          1.48        1.4806   0.016463937
+  50          1.38          1.45          1.42        1.4164   0.015220824
Difference at 99.5% confidence
        -0.0642 +/- 0.0100677
        -4.33608% +/- 0.664237%
        (Student's t, pooled s = 0.0158546)
$ ./scripts/ministat -w 80 -c99.5 -C4 ../sqrt1_small.txt ../sqrt2_small.txt
x ../sqrt1_small.txt
+ ../sqrt2_small.txt
+--------------------------------------------------------------------------------+
|                                                              +                 |
|                                                              +                 |
|                                        x                     +                 |
|                          x             x                     +                 |
|                          x             x         +           +                 |
|                          x             x      +x +     +     +                 |
|                          x         +  xx      +x +  x  +     +                 |
|                          x         *+ xx      *x++  xx +  +  +                 |
|                          xx       +*+xxxx    x**++  xx +  +  +                 |
|x                      x  xx       +*+xxxx   x***++xx*x+++++  +               ++|
|                             |__________A_|_______M|A_________|                 |
+--------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  50         18956         19244         19168      19168.16     57.880722
+  50         19144         19376         19224      19232.88      54.73373
Difference at 99.5% confidence
        64.72 +/- 35.769
        0.337643% +/- 0.186939%
        (Student's t, pooled s = 56.3292)

Yours was slightly faster and used slightly more memory.

I ran the same benchmark for large, but only for one sample each, and my numbers say that yours are accurate for that benchmark.

However, I cannot accept code that fails tests or has memory bugs. In addition, because of the test failures, you can expect that I will run a lot more tests on your code before I accept.

So I won't close this PR, but you need to fix the problems above (by running that release.sh script both ways and fixing every problem) before I'll look at it again.

Also, I can't understand sqrt_2.png. I can follow sqrt_1.png, but I need you to redo sqrt_2.png because I also won't accept the code if I don't understand it.

naggamura commented 9 months ago

BcNum.len is managed as tight as possible. So clearing upper digits is needed for like sqrt(0.000...000xyz...)
Passed 'make test'

naggamura commented 9 months ago

Last push was not the latest version. Pushed again.

Anyway, test again and passed all tests.

... ==5965== ==5965== HEAP SUMMARY: ==5965== in use at exit: 0 bytes in 0 blocks ==5965== total heap usage: 1,520 allocs, 1,520 frees, 205,685 bytes allocated ==5965== ==5965== All heap blocks were freed -- no leaks are possible ==5965== ==5965== For counts of detected and suppressed errors, rerun with: -v ==5965== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) pass Running bc directory test...pass Running bc binary file test...pass Running bc binary stdin test...pass Running bc limits tests...pass

All bc tests passed.

... ...

==6927== ==6927== HEAP SUMMARY: ==6927== in use at exit: 0 bytes in 0 blocks ==6927== total heap usage: 73 allocs, 73 frees, 37,886 bytes allocated ==6927== ==6927== All heap blocks were freed -- no leaks are possible ==6927== ==6927== For counts of detected and suppressed errors, rerun with: -v ==6927== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) pass Running dc directory test...pass Running dc binary file test...pass Running dc binary stdin test...pass

All dc tests passed.



naggamura commented 9 months ago

Doing scripts/release.sh test...

naggamura commented 9 months ago

Passed the tests.

Benchmarks

stat_small_c1.txt stat_small_c4.txt stat_large_c1.txt stat_large_c4.txt

naggamura commented 9 months ago

sqrt_2.png updated.

gavinhoward commented 9 months ago

I have spent two full days on this since the last time I sent a message. I cannot spend any more time on this; I have a project to finish and a hard deadline.

So I have to make a final decision, which is to reject this. Thank you for your contribution, but I guess I don't want it.

There are several reasons why:

But most of all, and this is nothing against you, I just can't read other people's code. I am unable to build a theory of mind, and this extends to reading code written by other people.

There is a reason I tend to do things by myself; I can't work on the code otherwise.

So I'm afraid I have run out of time and desire. Sorry.