brick / math

Arbitrary-precision arithmetic library for PHP
MIT License
1.78k stars 75 forks source link

Bitwise NOT #33

Closed luiz-brandao closed 3 years ago

luiz-brandao commented 4 years ago

Hi there, first of all, thanks for this great library!

This library has support for bitwise operations and, or and xor, but I found myself needing not. What would be the suggested way of achieving it? Java's BigInteger has a not method. Are we able to have not functionality using any of the existing methods or it would require implementing it?

I know we could do xor with all-1s, or invert the digits of the binary string.

Please advise, thanks!

BenMorel commented 4 years ago

Hi, good point, not was omitted in #16.

After a quick check, I can see that GMP does not have a gmp_not() function either. I guess the problem is that not may not make a lot of sense on an arbitrary-length integer.

Say for example you're inverting 00010101, you'll get 11101010. But if you invert 00000000 00010101 (which is the same number as far as BigInteger is concerned), you'll get 11111111 11101010, which is a totally different number.

As such, this may not be a desirable addition to BigInteger?

BenMorel commented 4 years ago

Note that Java's BigInteger does have a not() method:

https://docs.oracle.com/javase/8/docs/api/java/math/BigInteger.html#not--

I'd be interested to dig into what it returns. I would suspect this performs the operation on a whole number of bytes, this number being the smallest number that fits the value.

luiz-brandao commented 4 years ago

I have found this implementation in another library: https://github.com/pear/Math_BigInteger/blob/trunk/Math/BigInteger.php#L2890

luiz-brandao commented 4 years ago

I'm actually trying to replicate RPG's %BITNOT, so not sure it's the same as Java's. I will see if I can do some tests.

BenMorel commented 4 years ago

%BITNOT seems to operate on fixed size numbers, so with no ambiguity.

Doing a NOT on an arbitrary-size integer will always lead to ambiguity: there is no unique answer.

If anything, I'd prefer to be compatible with Java's BigInteger. Hopefully this would fulfil your needs; but I'm not sure I want to go this route.

Otherwise, the solution would maybe be to create a fixed-size integer class, like FixedBigInteger, that would have a predefined length in bytes.

luiz-brandao commented 4 years ago

Thanks for the feedback so far, I will have to take some time to analyse it with the team now. Merci.

BenMorel commented 4 years ago

Any update here?

luiz-brandao commented 4 years ago

Hey there, good timing! We put the %BITNOT implementation on the side for now but you were right. We are having to deal with fixed length numbers in our project, since we are porting many algorithms from RPG.

For example, right now we are working on porting algorithms that work with the concept of RPG/Cobol "data structures", which are structs that can share memory space and remap those spaces to different types; evidently the size in bytes is something we have to take into account.

We are planning to apply the concept of byte streams to work around that. Fixed types, like FixedBigInteger could be an interesting tool for us and I may be able to help developing them if you feel they could be a good addition to the library.

BenMorel commented 4 years ago

Thanks for the feedback!

Meanwhile I checked Java's implementation, and it looks like they're doing bit arithmetic on the minimum number of bits that fit the number. The result is a negative number. I don't know if there's any use case for this, and I prefer not to venture into an implementation until someone proves to me that this use case exists.

That being said, I'd be happy with a FixedBigInteger, with an arbitrary size of $n bits, and implement bitwise operations, including NOT, there!

BenMorel commented 4 years ago

Something just came up to my mind, we could also pass the number of desired bits in not:

function not(int $bits) : BigInteger

But I still consider it more or less a hack. Also, that doesn't tell us if the result should be a negative or positive number, so I guess it makes more sense to create explicit classes like FixedSignedBigInteger and FixedUnsignedBigInteger, that will leave no ambiguity as to whether the sign of the result should change!

luiz-brandao commented 4 years ago

Yes, I agree fixed types and explicit signed/unsigned ones make more sense.

Regarding our memory mapping problem, I have been playing with the nelexa/buffer package and the concept of buffers has been working well so far during prototyping. If we could introduce fixed types, it would allow us to get their byte representation nicely into the buffer.

BenMorel commented 3 years ago

I just checked Java's BigInteger.not() implementation in more detail, and the not() operation always returns a number that fits in the same number of bytes as the original number, i.e. if the number fits in 10 bytes, the result will be the two's complement on 10 bytes.

This can be implemented as:

public function not() : BigInteger
{
    return self::fromBytes(~ $this->toBytes());
}

But this is actually equal to the negated number, minus 1. As such, the implementation can be simplified as:

public function not() : BigInteger
{
    return $this->negated()->minus(1);
}

Both of these implementations return the same results as Java's BigInteger does. @luiz-brandao-jr could you please tell me if this implementation would be of any use for your use case?

I would be tempted to implement this not() method right now, to be consistent with Java, but I'd be curious to know if there are any real-life use cases for this one!

BenMorel commented 3 years ago

Say for example you're inverting 00010101, you'll get 11101010. But if you invert 00000000 00010101 (which is the same number as far as BigInteger is concerned), you'll get 11111111 11101010, which is a totally different number.

Actually, wrapping my head around that, I'm starting to realize my mistake. In two's complement representation, 11111111 11101010 is the same number as 11101010 (extra leading 1s are to negative numbers what leading 0s are to positive numbers).

So the number of bits on which you're working does not matter: whatever the number of bits, the bitwise NOT of a number will always be equal to (- number - 1), and Java's implementation makes perfect sense.

BenMorel commented 3 years ago

Implemented as BigInteger::not() and released as 0.9.1! 🎉

luiz-brandao commented 3 years ago

Hi @BenMorel that's great news. I think we will definitely have a use case for this one.

We have a constraint right now where we are running PHP compiled at 32-bit, so some numeric values coming from the database could overflow an integer. Although rare, we could also potentially face the same issue with PHP 64-bit since we have some database fields that could store up to 8-byte unsigned integers. This is the reason we started using BigIntegers. And so it happens that some of the algorithms we are porting use bitwise functions!

luiz-brandao commented 3 years ago

@BenMorel one thing that might be an issue for us is that sometimes we use a BigInteger to represent an unsigned integer. This implementation will always return a signed value. I guess we will need to do some manipulation on our end to go around that.

BenMorel commented 3 years ago

@luiz-brandao-jr How do you get your value out of the database? As a numeric string? Can you please give an example of where using a signed BigInteger is problematic?

luiz-brandao commented 3 years ago

@BenMorel yes, we get the value as a numeric string from the database but we can also get data from byte streams/buffers.

Example: Let's say we get a 2-byte unsigned integer from a stream, let's say 7 (this is to make it simple, but in our case it could go up to 8-byte numbers). If we use BigInteger::of('7')->not() we get -8, but with a 2-byte unsigned it should be 65528 (11111111 11111000). But fundamentally it's because we are dealing with fixed sized values, and it's probably up to us to manipulate it.

BenMorel commented 3 years ago

it's probably up to us to manipulate it.

It probably is indeed. But it shouldn't be that hard: just add 2^(number of bits) if you get a negative result:

-8 + (2 ^ 16) = 65528
BenMorel commented 3 years ago

but we can also get data from byte streams/buffers.

By the way, in the latest version you have a fromBytes() factory method that can read binary data out of a string, either signed or unsigned!

luiz-brandao commented 3 years ago

Great, I will take a look at fromBytes. Merci.