gavinhoward / bc

An implementation of the POSIX bc calculator with GNU extensions and dc, moved away from GitHub. Finished, but well-maintained.
https://git.gavinhoward.com/gavin/bc
Other
145 stars 29 forks source link

Incoherent rambling followed by a suggestion for an improvement. #69

Closed TediusTimmy closed 11 months ago

TediusTimmy commented 11 months ago

Greetings,

I am going to start with some pleasantries and BS and work from there.

Hello. I also like bc. I have some bc-related repos: https://github.com/TediusTimmy/GNU_bc_with_GMP and https://github.com/TediusTimmy/OpenBSD_bc_with_GMP . The first one is an example of how fast bc can be, if we just use GMP. I also fixed as many of the crashes as I was aware of (admittedly, by fixing the back-end to be more robust rather than fixing the garbage generated by the front-end; if you find a crash, please let me know). In the second repo, I have a comparison of the speed of several implementations of bc (based on my own benchmark), including yours. I found it to be roughly comparable to, but still slightly slower than, OpenBSD's bc (except on one specific task, where it even beat GMP). I also wrote a spreadsheet program to use bc-like numbers: https://github.com/TediusTimmy/BC-DeciCalc (which I really need to work on the documentation for).

As for those bug-fixes in GNU bc: I really haven't been able to get ahold of Phil, or Ken. Thankfully, there were some really helpful people at Debian, and maybe one day, either https://salsa.debian.org/debian/bc/-/merge_requests/4 will be merged in, or they will just use a better bc implementation, like yours.

Finally, are you aware of this: https://www.php.net/manual/en/book.bc.php ? They have forked the GNU bc code for number.c and include it in their builds. Maybe they could use the library version of your code? (My only concern would be https://bugs.php.net/bug.php?id=66364 )

On to the issue: I'm not a big fan of your p(x,y) function. Let me explain by way of a contrived example: p(1024,32.1). Now 1024 is 2^10, so this SHOULD give us 2^10^32.1 which is 2^(10*32.1) or 2^321. And that's an integer, so it should be exact. But, when we do this we find that only the first 18 digits are correct with the default scale of 20. As you manipulate the scale variable, you find a correlation between the scale variable and the number of correct digits. So, in order to compute this result to 20 digits of scale, you would need to compute the log and exponential to around 117 digits of scale. I don't think that I'm out of line to expect that when you increase the scale of a computation that maybe the last three digits change and you then get extra new good digits. With this implementation of p(x,y), a good chunk of the number changes.

As I was thinking about this, four cases came up: 1) If we are raising a number greater than one to a positive power, then we want to bump the scale by the length of the integer part. 2) If we are raising a number greater than one to a negative power, then we can probably use e(y*l(x)) as that result goes to zero. 3) Conversely, if we are raising a number less than one to a positive power, then we can probably use e(y*l(x)) as the result goes to zero. 4) Finally, if if we are raising a number less than one to a negative power, then we need to be extra careful. We need to use the reciprocal to find the integer part, but want the reciprocal to the increased scale to have an accurate result (in my testing, it removed a problem in the unit in the last place).

So, my final function to improve p(x,y), then commentary:

define pow(x,y){
    auto a,i,s,z
    if(0==y)return 1@scale
    if(0==x){
        if(y>0)return 0
        return 1/0
    }
    a=y$
    if(y==a)return(x^a)@scale
    z=0
    if(x<1){
        y=-y
        a=-a
        z=x
        x=1/x
    }
    if(y<0){
        return e(y*l(x))
    }
    i=x^a
    s=scale
    scale+=length(i)
    if(z){
        x=1/z
        i=x^a
    }
    i*=e((y-a)*l(x))
    scale=s
    return i@scale
}

I started with your p(x,y) function and then added code. We begin with some special case handling for zeros, because people will complain (I didn't realize that bc already defines 0^0==1). Next is the detection for the easy case of an integer exponent that you already had. We then handle if x is between zero and one: we save x and proceed with the reciprocal, while negating the exponent. Remember that l(x)==-l(1/x). I specifically ignore if x is negative, because we will eventually call l(x), which will return an erroneous value if x is negative. If the exponent is now negative, we fall back on e(y*l(x)): that number is going to zero anyway. Next, we compute the integral portion of the exponent and use it to get the working scale for the fractional part of the exponent. After that, if we took the reciprocal of x, recompute the reciprocal and the integral part of the exponent at this higher scale to ensure that both are accurate. Penultimately, compute the fractional part of the exponent and multiply it by the integral part to get the complete exponent. Finally, return a result at the desired scale.

gavinhoward commented 11 months ago

Hello.

As for those bug-fixes in GNU bc: I really haven't been able to get ahold of Phil, or Ken. Thankfully, there were some really helpful people at Debian, and maybe one day, either https://salsa.debian.org/debian/bc/-/merge_requests/4 will be merged in, or they will just use a better bc implementation, like yours.

Maybe I'm selfish here, but I think they should just use mine. I tested my bc against all of the items at https://lists.debian.org/debian-devel/2022/08/msg00035.html and https://bugs.launchpad.net/ubuntu/+source/bc/+bug/1775776, and not only does my bc not crash, ASan, UBSan, and Valgrind report no errors, not even memory leaks. (Yay for fuzzing!) If you know how I might submit a request for them to do so, please let me know. I'll even build the Debian package myself!

Finally, are you aware of this: https://www.php.net/manual/en/book.bc.php ? They have forked the GNU bc code for number.c and include it in their builds. Maybe they could use the library version of your code?

I am aware, actually. This was in the back of my mind when I accepted a request from my FreeBSD contact to implement the bcl library. But I don't know how to contact them and convince them to use my library.

(My only concern would be https://bugs.php.net/bug.php?id=66364 )

Unfortunately, the PHP authors are wrong; that behavior is well-documented in the POSIX standard for bc, which says about multiplication:

The result shall be the product of the two expressions. If a and b are the scales of the two expressions, then the scale of the result shall be:

min(a+b,max(scale,a,b))

So their fix for that bug report is wrong, in my opinion. Thus, I don't think they'll adopt my library because I won't patch my library for their "bug."

On to the issue: I'm not a big fan of your p(x,y) function...

Your criticisms are absolutely fair. Fair enough that I took your code (with style fixes), documented it, and pushed it as the better_pow branch in this repo.

Can you take a look to check if the code is correct, that it works for you, and that the documentation I added in manuals/algorithms.md is correct? I documented it in my own words, to make sure I understood the code.

Finally, last item: do you think that you'll open more issues on bc in the future? If so, I'll add an account for you on my website; I just need an email address.

gavinhoward commented 11 months ago

Forgot to say this:

In the second repo, I have a comparison of the speed of several implementations of bc (based on my own benchmark), including yours. I found it to be roughly comparable to, but still slightly slower than, OpenBSD's bc (except on one specific task, where it even beat GMP).

Those benchmarks make sense to me, except plain OpenBSD bc being faster than mine; it uses plain char for digits, so mine should be faster. My guess is that it is parsing without many error checks.

Also, I store constants as strings, like GNU. This is because someone could change the ibase at any time, and if they do, that constant needs to be interpreted according to the new ibase. But I know for a fact that the OpenBSD bc has a different behavior: it will interpret the number according to the digits, no matter if the digits are invalid for the ibase. This allows it to precalculate, where GNU and I cannot.

TediusTimmy commented 11 months ago

Many apologies. I posted that improvement and went straight to bed last night, and then life got in the way of me responding. And I am going to go straight to bed tonight, also.

If you know how I might submit a request for them to do so, please let me know.

Honestly, when it comes to Debian, Philip Hands was a godsend (but, he isn't the maintainer). I tried emailing Phil Nelson and Ken Pizzini and Ryan Kavanagh, but those probably went into spam filters. Eventually, I emailed the debian-devel mailing list and someone who helps first-time contributors (Hands) picked it up and ran with it. Like any large open-source project, it has an inscrutable political organization to outsiders like us. I was lucky that they even have a merge request for the changes (which has been open for over a year).

The real problem is that GNU bc is "good enough". It has vulnerabilities that no one is known to have exploited (if they even are exploitable). People complain, but not loud enough to get things done. It works for its purpose, and doesn't seem to have a large user base.

But I don't know how to contact them and convince them to use my library.

Again: inscrutable political organization. I don't actually program with PHP, I also am just aware of this extension. Looking at this bug report ( https://github.com/php/php-src/issues/10967 ), maybe the internal mailing list or the RFC process is what you want.

the PHP authors are wrong

Yeah. Thought, looking at it again, they could have fixed the issue here ( https://github.com/php/php-src/blob/master/ext/bcmath/bcmath.c#L258 ) instead of where they did. You could convince them that this is the better place for a bodge between their code and the bc library (as the behavior is standards-driven). Though, even Debian has a bug for this: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=610988

with style fixes

What did you change? Oh, that. I get that from a former workplace.

it uses plain char for digits

It uses the OpenSSL BIGNUM library, which is binary words ( https://github.com/openbsd/src/blob/master/usr.bin/dc/bcode.h#L24 and https://github.com/openbsd/src/blob/master/lib/libcrypto/bn/bn_local.h#L121 ). For every 64 bits of a number, they are operating on all 64 bits, and you are operating on a little over 63 bits of it and have a normalization step. The trade-off rears its ugly head when numbers get converted to/from decimal.

do you think that you'll open more issues on bc in the future

I don't have any plans. This was a spur-of-the-moment thing. I had "finished" a pow function for the spreadsheet, and decided to rewrite it in bc. (And then, I spent a bunch of time rewriting it again, as I realized that I had missed some corner cases). I don't plan to make a habit of this.

As for reviewing things:

being non-zero is a flag for later and a value to be used

I did that, didn't I? Bad me.

I don't know if the algorithm needs this much detail. It is probably good to give the impression that someone put thought and care into making sure the results were accurate. Though, really it was just some rando who kept doing really large exponents and wanted some numerical stability as he cranked up the precision while looking at numbers like https://www.youtube.com/watch?v=BdHFLfv-ThQ.

TediusTimmy commented 11 months ago

And I forgot to say something: I understand that the frustrating thing about the bcmath extension to PHP is that you have something that is just better. And you have no idea how to make things better.

gavinhoward commented 11 months ago

Eventually, I emailed the debian-devel mailing list and someone who helps first-time contributors (Hands) picked it up and ran with it. Like any large open-source project, it has an inscrutable political organization to outsiders like us.

Yeah, that sounds exhausting. Not worth it to me at the moment.

The real problem is that GNU bc is "good enough". It has vulnerabilities that no one is known to have exploited (if they even are exploitable). People complain, but not loud enough to get things done. It works for its purpose, and doesn't seem to have a large user base.

This is true.

Yeah. Thought, looking at it again, they could have fixed the issue here ( https://github.com/php/php-src/blob/master/ext/bcmath/bcmath.c#L258 ) instead of where they did. You could convince them that this is the better place for a bodge between their code and the bc library (as the behavior is standards-driven).

Eh, probably not worth it.

Though, even Debian has a bug for this: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=610988

Yikes. People don't read, do they? Okay, I really don't care about getting into Debian.

It uses the OpenSSL BIGNUM library, which is binary words

Oh, that was not what I understood. Oops. Those benchmarks make sense then.

I don't have any plans. This was a spur-of-the-moment thing.

Okay.

As for reviewing things...I did that, didn't I? Bad me.

Well, that bc code goes straight into the binary as a string. Using z as a flag and a value is useful to reduce the number of bytes used. I kept it for a reason.

I don't know if the algorithm needs this much detail. It is probably good to give the impression that someone put thought and care into making sure the results were accurate.

I need that much detail. I have to understand every bit of code in this repo.

Take, for example, the comment here. That was me explaining, to myself, an algorithm that someone else coded up. If there was a bug in that algorithm when I got it (and there was), there was no way for me to debug it unless I knew what it was supposed to be doing.

If there is a bug in your code, I need to understand your code to debug it.

That's why there is a wealth of information for developers; it's for me.

So I'm glad to know that the explanation is correct, but I'll keep all of it. :)

I may wait a while to release a new version (I have a new PR about adding CMake support), but I will release a new version with your code within a reasonable time.

Thank you.

gavinhoward commented 10 months ago

There is now a version (6.6.1) with your p() implementation. Thank you very much!

TediusTimmy commented 10 months ago

You are welcome. I hope it is as useful for your other users as it is to me and not a maintenance burden on you.

gavinhoward commented 10 months ago

None at all now that I understand it, thanks to you!